
RE: [idn] Comments on IDNA/stringprep/nameprep





> -----Original Message-----
...
> The Unicode Consortium debated making the canonical decomposition
> from <gg> to <g><g> for a long time. The deciding feedback was from
> the Korean national body at the Seoul SC2/WG2 meeting, where they said
> it should not be done; that it was akin to canonically decomposing "w"
> to "vv". 

With all due respect, I don't think that is the correct analysis.  The
original alphabet consisted of 17 consonants and 11 vowels, none of them
"double" per se (it was later complemented with a few variants that are
no longer in use). Using doubled consonants for particular sounds is very
much like using doubled consonants in the Latin script, even though the
phonemic differences are of a different nature.  Yes, the doubled
consonants in Hangul collate as separate entities, but that is a
consequence of the general approach of collating by consonant cluster
(and vowel cluster, irrespective of how that is achieved technically;
Mark and I have had a long debate about the latter...).

If [g][g] and [gg] really are different, how would [g][g] and [gg] be
differentiated within properly typeset Hangul syllable blocks?


> They also objected to combinations like <gs> being
> canonically decomposed, principally so that modern syllables could
> always be decomposed into 3 pieces. 

And why would that be essential?  Especially since some of them still
decompose into two pieces...  Hangul is a very elegant writing system,
but its computer encoding is far from elegant, I'm sorry to say, even
though a subset remains very elegant.  However, that subset interacts
with the rest when normalising, and the normalisation is not complete:
the same (logical) sequence of letters can have different normal forms.
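
To make both points concrete, here is a minimal sketch using Python's
standard unicodedata module. The variable names and the choice of
sample characters are mine, purely for illustration. It shows a modern
syllable that decomposes into two pieces rather than three, and two
spellings of the same logical letters, [g][g] + [a], that end up with
different normal forms.

```python
import unicodedata

G = "\u1100"    # HANGUL CHOSEONG KIYEOK, the lead consonant [g]
GG = "\u1101"   # HANGUL CHOSEONG SSANGKIYEOK, the lead consonant [gg]
A = "\u1161"    # HANGUL JUNGSEONG A, the vowel [a]

GA = "\uAC00"   # precomposed syllable without a trailing consonant
GAG = "\uAC01"  # precomposed syllable with a trailing consonant


def nfc(text):
    return unicodedata.normalize("NFC", text)


def nfd(text):
    return unicodedata.normalize("NFD", text)


def codepoints(text):
    """Render a string as a list of U+XXXX labels for easy reading."""
    return ["U+%04X" % ord(ch) for ch in text]


# Some modern syllables decompose into two pieces, others into three.
two_pieces = codepoints(nfd(GA))     # ['U+1100', 'U+1161']
three_pieces = codepoints(nfd(GAG))  # ['U+1100', 'U+1161', 'U+11A8']

# Two spellings of the same logical letters, with different normal forms.
with_double_letter = codepoints(nfc(GG + A))    # ['U+AE4C']
with_two_letters = codepoints(nfc(G + G + A))   # ['U+1100', 'U+AC00']

print(two_pieces, three_pieces)
print(with_double_letter, with_two_letters)
```

The two spellings stay distinct in NFD as well, since [gg] has no
canonical decomposition.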


> The (weaker) compatibility
> decompositions in Unicode until the time that NFC was formed; those
> were removed because they would have prevented the formation of Hangul
> Syllables in NFKC.

No, but their being compatibility instead of canonical decompositions
prevented the *preservation* of Hangul syllable characters in NFKC.
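
A rough way to see the preservation problem, again with Python's
unicodedata module. The mapping HYPOTHETICAL_COMPAT and the function
hypothetical_nfkc are my own inventions for illustration; they are not
part of Unicode, and the real historical data may well have differed.
The sketch only assumes that [gg] had a compatibility decomposition to
[g][g] and then recomposes as NFC would.

```python
import unicodedata

# Hypothetical mapping, NOT part of Unicode: pretend the double lead
# consonant [gg] (U+1101) decomposes by compatibility to [g][g].
HYPOTHETICAL_COMPAT = {"\u1101": "\u1100\u1100"}


def hypothetical_nfkc(text):
    """Approximate NFKC under the hypothetical mapping above."""
    decomposed = unicodedata.normalize("NFKD", text)
    expanded = "".join(HYPOTHETICAL_COMPAT.get(ch, ch) for ch in decomposed)
    return unicodedata.normalize("NFC", expanded)


SYLLABLE = "\uAE4C"  # precomposed syllable [gg][a]

# Real Unicode data: the syllable character survives NFKC unchanged.
preserved = unicodedata.normalize("NFKC", SYLLABLE) == SYLLABLE

# Under the hypothetical mapping it comes back as the lead consonant
# [g] followed by the syllable [g][a]; the original is not preserved.
hypothetical = hypothetical_nfkc(SYLLABLE)

print(preserved)
print(["U+%04X" % ord(ch) for ch in hypothetical])
```

Note that a Hangul syllable is still formed in the hypothetical case;
what is lost is the original syllable character.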

It is not the first time we hear that there would be a syllable
break between two consecutive lead consonants, or between a lead
consonant and a Hangul syllable character, even though that
has never, AFAICT, been the case in Unicode.  It is not unlikely
that the misconception of spurious syllable breaks has led people astray.


		Kind regards
		/kent k