
Re: Hangul and IDN (was Re: [idn] reordering strawpoll)



Hi!

>> The precomposed Hangul syllable U+AE4C (GGA) is canonically equivalent
>> with <U+1101, U+1161> (GG, A), through algorithmic decomposition. That is fine
>> so far. But <U+1101, U+1161> is in turn equivalent to <U+1100, U+1100, U+1161>
>> (G, G, A), but this equivalence is neither a canonical equivalence, as it
>> should have been, nor a compatibility equivalence. Still, the latter letter
>> sequence represents EXACTLY the same syllable as the two earlier character
>> sequences, and a proper rendering engine (of which there are already some,
>> I'm told) would correctly render the three sequences in the same way.

>Are you raising this possibility:    U+1101 <---> U+1100 U+1000  (GG <-> G G) ?
>The design of conjoining (cluster) jamos treats the two-choseong sequence
>U+1100 U+1000 as an illegal sequence (syllable break condition).

(You mean <U+1100, U+1100>.) No, that is not a syllable break condition. 
See tables 3-4 and 3-5 on page 53 in TUS 3.0 (link below).
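As an illustration, here is a minimal Python sketch of why <U+1100, U+1100> is not a break. It encodes the Hangul no-break rules GB6 to GB8 from today's UAX #29 (Unicode Text Segmentation), on the assumption that they agree with the TUS 3.0 tables on this point. Only the basic Hangul Jamo block (U+1100..U+11FF) and the precomposed syllables are classified; the later Jamo Extended blocks are ignored. L, V and T here are the UAX #29 names (leading consonant, vowel, trailing consonant), and the function names are hypothetical.

```python
from typing import Optional

S_BASE, S_COUNT, T_COUNT = 0xAC00, 11172, 28


def hangul_type(ch: str) -> Optional[str]:
    """Hangul syllable type of one character (basic blocks only)."""
    cp = ord(ch)
    if S_BASE <= cp < S_BASE + S_COUNT:
        # Precomposed syllables without a trailing consonant are LV, the rest LVT.
        return "LV" if (cp - S_BASE) % T_COUNT == 0 else "LVT"
    if 0x1100 <= cp <= 0x115F:
        return "L"  # choseong, including the filler U+115F
    if 0x1160 <= cp <= 0x11A7:
        return "V"  # jungseong, including the filler U+1160
    if 0x11A8 <= cp <= 0x11FF:
        return "T"  # jongseong
    return None


def hangul_rules_forbid_break(a: str, b: str) -> bool:
    """True if rules GB6 to GB8 forbid a boundary between a and b."""
    ta, tb = hangul_type(a), hangul_type(b)
    if ta == "L":              # GB6: L x (L | V | LV | LVT)
        return tb in ("L", "V", "LV", "LVT")
    if ta in ("LV", "V"):      # GB7: (LV | V) x (V | T)
        return tb in ("V", "T")
    if ta in ("LVT", "T"):     # GB8: (LVT | T) x T
        return tb == "T"
    return False


print(hangul_rules_forbid_break("\u1100", "\u1100"))  # True: G G stays in one syllable
print(hangul_rules_forbid_break("\u1161", "\u1100"))  # False: a new syllable starts at G
```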

(By the way, apart from the algorithmic decomposition itself of Hangul syllable
characters, the cluster Jamos are not needed and should ideally not be used.
TUS 3.0 does not spell that out, though...)
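For reference, a minimal Python sketch of the algorithmic decomposition, using the standard constants from the Hangul section of chapter 3 (SBase = U+AC00, LBase = U+1100, VBase = U+1161, TBase = U+11A7). It checks the result against Python's unicodedata module and shows that <U+1100, U+1100, U+1161> is not normalized to the same string, i.e. the standard does not treat it as canonically equivalent to U+AE4C. The function name is hypothetical.

```python
import unicodedata

# Constants of the Hangul syllable decomposition algorithm.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT   # 11172 precomposed syllables


def decompose_hangul(ch: str) -> str:
    """Canonical decomposition of one precomposed Hangul syllable into jamos."""
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return ch  # not a precomposed Hangul syllable
    lead = chr(L_BASE + s_index // N_COUNT)
    vowel = chr(V_BASE + (s_index % N_COUNT) // T_COUNT)
    t_index = s_index % T_COUNT
    trail = chr(T_BASE + t_index) if t_index else ""
    return lead + vowel + trail


gga = "\uAE4C"
print([hex(ord(c)) for c in decompose_hangul(gga)])  # ['0x1101', '0x1161']
print(decompose_hangul(gga) == unicodedata.normalize("NFD", gga))  # True
# <G, G, A> spelled with two single G jamos is NOT normalized to the same string:
print(unicodedata.normalize("NFD", "\u1100\u1100\u1161")
      == unicodedata.normalize("NFD", gga))  # False
```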

> That is described somewhere in Unicode 3, chapter 3, section 11. That will help you.

Please reread that section carefully! (Link below.) In particular, page 53.

>If some rendering engine displays the two as the same syllable,
>I suspect the product is buggy or beyond the standard. :-)

No, it's not beyond the Unicode standard at all.  See page 53 of TUS 3.0,
http://www.unicode.org/unicode/uni2book/ch03.pdf which says (my emphasis):
"A standard syllable block is composed of a *sequence* of choseong followed
by a *sequence* of jungseong and optionally a *sequence* of jongseong."

TUS 3.0 does not spell out that this description applies only to the NFD
form, nor that combining characters, in particular a Hangul tone mark, may
follow (logically, they apply to the entire syllable!).  Whether to count
the combining characters as part of the syllable is, I think, a matter of
taste. If we also take combining characters into account, but still in NFD,
the syntax for a Hangul syllable is:

        Hangul-syllable-NFD  ::=   C+  V+  F*  T*

where C is a choseong (leading consonant), V is a jungseong (vowel), F is a
jongseong (trailing consonant), and T is a combining character (like a Hangul
tone mark).  (I'll ignore the FILLER issues for the moment, including their
automatic insertion.)

This generality is needed to spell historic (and future) Hangul texts that may
use consonant or vowel clusters that have no character of their own.
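A minimal Python sketch of a recognizer for the NFD syntax C+ V+ F* T*. It maps each character to a class letter and matches the class string with a regular expression. Assumptions: only the basic Hangul Jamo block is classified (choseong U+1100..U+115F, jungseong U+1160..U+11A7, jongseong U+11A8..U+11FF), the fillers count as ordinary members (the FILLER issues are set aside), and "combining character" is approximated as any character whose general category starts with M. The function names are hypothetical.

```python
import re
import unicodedata


def jamo_class(ch: str) -> str:
    """Class letter of one character in the syntax C+ V+ F* T*."""
    cp = ord(ch)
    if 0x1100 <= cp <= 0x115F:
        return "C"  # choseong (leading consonant), incl. filler U+115F
    if 0x1160 <= cp <= 0x11A7:
        return "V"  # jungseong (vowel), incl. filler U+1160
    if 0x11A8 <= cp <= 0x11FF:
        return "F"  # jongseong (trailing consonant)
    if unicodedata.category(ch).startswith("M"):
        return "T"  # combining mark, e.g. Hangul tone mark U+302E
    return "?"      # anything else, including precomposed syllables


NFD_SYLLABLE = re.compile(r"C+V+F*T*")


def is_nfd_syllable(s: str) -> bool:
    """True if the whole string is one Hangul syllable in NFD."""
    classes = "".join(jamo_class(ch) for ch in s)
    return NFD_SYLLABLE.fullmatch(classes) is not None


print(is_nfd_syllable("\u1100\u1100\u1161"))  # True: G G A
print(is_nfd_syllable("\u1101\u1161"))        # True: GG A
print(is_nfd_syllable("\uAE4C"))              # False: precomposed, not NFD
```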

Taking into account the precomposed syllable characters that NFC may produce,
the full Hangul syllable syntax is:

Hangul-syllable  ::= 
                 C*  CVsyllable  V*  F*  T* 
            |    C*  CVFsyllable  F*  T*  
            |    C+  V+  F*  T*

where CVsyllable is a precomposed consonants-vowels syllable character, and
CVFsyllable is a precomposed consonants-vowels-consonants syllable character.
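Extending the same idea, a minimal Python sketch of the full syntax, with S standing for CVsyllable and X for CVFsyllable (a precomposed syllable counts as S when it has no trailing consonant). The same assumptions apply: only the basic Hangul Jamo block, fillers as ordinary members, combining marks approximated by general category M; the names are hypothetical. Here Python's unicodedata.normalize serves as the NFC implementation.

```python
import re
import unicodedata

S_BASE, S_COUNT, T_COUNT = 0xAC00, 11172, 28


def hangul_class(ch: str) -> str:
    """Class letter: C, V, F, T as before, plus S (CVsyllable), X (CVFsyllable)."""
    cp = ord(ch)
    if S_BASE <= cp < S_BASE + S_COUNT:
        # Precomposed syllable: S if it has no trailing consonant, else X.
        return "S" if (cp - S_BASE) % T_COUNT == 0 else "X"
    if 0x1100 <= cp <= 0x115F:
        return "C"
    if 0x1160 <= cp <= 0x11A7:
        return "V"
    if 0x11A8 <= cp <= 0x11FF:
        return "F"
    if unicodedata.category(ch).startswith("M"):
        return "T"
    return "?"


# C* CVsyllable V* F* T*  |  C* CVFsyllable F* T*  |  C+ V+ F* T*
HANGUL_SYLLABLE = re.compile(r"C*SV*F*T*|C*XF*T*|C+V+F*T*")


def is_hangul_syllable(s: str) -> bool:
    classes = "".join(hangul_class(ch) for ch in s)
    return HANGUL_SYLLABLE.fullmatch(classes) is not None


g_g_a = "\u1100\u1100\u1161"  # G, G, A spelled with two single G jamos
nfc = unicodedata.normalize("NFC", g_g_a)
print([hex(ord(c)) for c in nfc])   # ['0x1100', '0xac00']
print(is_hangul_syllable(nfc))      # True: matches C* CVsyllable
print(is_hangul_syllable("\uAE4C")) # True: a lone CVsyllable
```

Note how NFC composes only the last G with the vowel, which is exactly why the leading C* is needed in the first two alternatives.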

>Would you tell me the version of the rendering engine ?

I'm told(!) this is (properly!) implemented at least in Windows XP and
IE 6 (maybe also IE 5.5).  Perhaps also elsewhere (I'm not keeping track).

        Kind regards
        /kent k