Re: Hangul and IDN (was Re: [idn] reordering strawpoll)
----- Original Message -----
From: "Kent Karlsson" <kentk@md.chalmers.se>
>
> The precomposed Hangul syllable U+AE4C (GGA) is canonically equivalent
> with <U+1101, U+1161> (GG, A), through algorithmic decomposition. That is fine
> so far. But <U+1101, U+1161> is in turn equivalent to <U+1100, U+1100, U+1161>
> (G, G, A), but this equivalence is neither a canonical equivalence, as it
> should have been, nor a compatibility equivalence. Still, the latter letter
> sequence represents EXACTLY the same syllable as the two earlier character
> sequences, and a proper rendering engine (of which there are already some,
> I'm told) would correctly render the three sequences in the same way.
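The equivalences described in the quoted paragraph can be checked with a minimal Python sketch. It assumes the standard library's unicodedata module as the normalizer; the variable names are only illustrative and do not come from the original posting.

```python
import unicodedata

# The three spellings discussed in the quoted text.
precomposed = "\uAE4C"          # Hangul syllable GGA
cluster = "\u1101\u1161"        # GG (cluster jamo), A
letters = "\u1100\u1100\u1161"  # G, G, A

# Canonical equivalence: algorithmic decomposition and composition
# map the first two spellings onto each other.
nfd_of_precomposed = unicodedata.normalize("NFD", precomposed)
nfc_of_cluster = unicodedata.normalize("NFC", cluster)

# The third spelling is not folded in, even by NFKC: only the second
# G composes with A, and the first G is left standing alone.
nfkc_of_letters = unicodedata.normalize("NFKC", letters)

print(nfd_of_precomposed == cluster)   # True
print(nfc_of_cluster == precomposed)   # True
print(nfkc_of_letters == precomposed)  # False
```

This shows only what the normalization forms do; it says nothing about how a rendering engine displays the third sequence.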
Are you raising this possibility: U+1101 <---> U+1100 U+1100 (GG <-> G G)?
The design of conjoining (cluster) jamos treats the two-choseong sequence
U+1100 U+1100 as an illegal sequence (a syllable break condition). This is
described in Unicode 3.0, chapter 3, section 11, which should help you.
If some rendering engine displays the two as the same syllable,
I suspect the product is buggy or goes beyond the standard. :-)
Would you tell me the version of the rendering engine?
> But for historical reasons, there is now neither a canonical, nor a compatibility
> equivalence there. Just an equivalence, in the same script, in syllabic meaning
> and (when properly implemented) in display. (Yes, G and GG are pronounced
> differently, but this is about spelling.)
>
> This is something that 'nameprep' should handle, since it is unfortunately not
> handled by NFKC. The logical steps would be to 1) algorithmically decompose
> Hangul syllables, 2) map cluster Jamos to the basic letter sequences each
> represent. Then either (design decision) invoke NFKC or NFKC augmented
> to also compose "modern" cluster Jamos before the part of NFKC formation
> that does algorithmic composition of Hangul syllables (the historic cluster
> Jamos can (design decision) stay decomposed). Or, indeed, do the
> decomposition into basic (i.e. non-cluster) Hangul Jamo letters, after
> conversion to NFKC form, leaving Hangul "subnames" as sequences of
> letter characters, just like for other alphabetic scripts (I don't know how this
> would affect the length of ACE-encoded IDN names). (Some thought
> needs to go into how ((Halfwidth)) Compatibility Hangul Letters are to be
> handled. The compatibility mappings are, ahem, not fully appropriate...
> The Hangul "filler" characters are also a problem, which needs to be
> considered.)
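Steps 1) and 2) proposed in the quoted text can be sketched in Python as follows. This is only a sketch under stated assumptions: NFD stands in for the algorithmic decomposition of Hangul syllables (it also decomposes other characters), and CLUSTER_TO_LETTERS is a hypothetical, partial table covering just the five modern doubled initial consonants. Neither the table nor the function name split_clusters comes from nameprep or from any standard.

```python
import unicodedata

# Hypothetical, partial mapping from modern doubled initial consonants
# (cluster jamos) to the basic letters they are spelled with.
# A real table would also need medial and final clusters.
CLUSTER_TO_LETTERS = {
    "\u1101": "\u1100\u1100",  # GG -> G G
    "\u1104": "\u1103\u1103",  # DD -> D D
    "\u1108": "\u1107\u1107",  # BB -> B B
    "\u110A": "\u1109\u1109",  # SS -> S S
    "\u110D": "\u110C\u110C",  # JJ -> J J
}

def split_clusters(name: str) -> str:
    """Decompose Hangul syllables, then map cluster jamos to basic letters."""
    # Step 1: decompose precomposed syllables into conjoining jamos.
    decomposed = unicodedata.normalize("NFD", name)
    # Step 2: replace each cluster jamo with its basic letter sequence.
    return "".join(CLUSTER_TO_LETTERS.get(ch, ch) for ch in decomposed)

# All three spellings of the syllable collapse to the same letter sequence.
spellings = ["\uAE4C", "\u1101\u1161", "\u1100\u1100\u1161"]
folded = {split_clusters(s) for s in spellings}
print(len(folded))  # 1
```

Whether the result should then be recomposed, or left as a sequence of letters, is the design decision the quoted text leaves open.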
After reading section 3.11 of Unicode 3.0, would you rephrase this question
on this list? You may then have a different idea about real Hangul
jamo processing.
Sorry if I misread your posting. I have been in a hurry... :-)
Cheers,
Thanks.
Soobok lee
>
> Kind regards
> /kent k