Please distinguish NFKC and NFC. They are not the same, and have
different properties. In several places below you have confused them,
and how they operate.
Your first problem just shows that NFKD and NFKC should be
used with caution. They cannot be applied blindly, especially not
to text in general. However, I don't see that this problem
(which is quite apparent to anyone reading UAX 15) hinders the use
of NFKC for the particular application of IDNs. Your first
problem does not affect NFD or NFC at all.
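
To make the first problem concrete, here is a small sketch in Python,
assuming the standard unicodedata module and a sample string of my own
choosing (compatibility jamo KIYEOK, A, NIEUN). It is only an
illustration of the behaviour described above, not a proposal for how
IDN processing should be implemented.

```python
import unicodedata


def codepoints(s):
    """Format a string as a list of U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


# Hypothetical sample: three compatibility jamo, as often typed in
# informal text (HANGUL LETTER KIYEOK, A, NIEUN).
compat = "\u3131\u314f\u3134"

nfc = unicodedata.normalize("NFC", compat)
nfkc = unicodedata.normalize("NFKC", compat)

# NFC applies no compatibility mappings, so the text is untouched.
print(codepoints(nfc))   # U+3131 U+314F U+3134

# NFKC maps each compatibility jamo to a conjoining jamo and then
# composes. The first two become a syllable; the third maps to a
# leading (choseong) jamo and is left over, i.e. a partial combination.
print(codepoints(nfkc))  # U+AC00 U+1102
```

The same input is stable under NFC and NFD, which is why the first
problem is specific to the compatibility forms.
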
Your second problem does affect NFC (and thus also NFKC), but does
not affect NFD (fully decomposed). This problem could, in principle,
have been avoided if there had been no Hangul syllable characters...
So for historic texts (using historic Jamo, or historic (or uncommon)
consonant (or vowel) clusters of otherwise modern Jamo) NFD would probably
be the cleanest approach, especially for rendering such substrings. As
I have read this, it just means that the conjoining Jamo can interact
with the Hangul syllables too (I'm not sure if they actually do in
any (rendering) implementation, nor if that was intended). However,
again I don't see that this would be a problem for the particular
application of IDNs, though I do see a problem for rendering engines.
(Sorting, by the way, when Jamo are supported, should support Hangul
syllables through canonical decomposition.)
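
The second problem can be sketched the same way, again assuming
Python's unicodedata module. The archaic trailing jamo U+11F0
(HANGUL JONGSEONG YESIENG) is my own choice of example; any trailing
jamo outside the modern range U+11A8..U+11C2 should behave alike.

```python
import unicodedata


def codepoints(s):
    """Format a string as a list of U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


# Leading KIYEOK + vowel A + modern trailing NIEUN.
modern = "\u1100\u1161\u11ab"
# Leading KIYEOK + vowel A + archaic trailing YESIENG (example choice).
archaic = "\u1100\u1161\u11f0"

nfc_modern = unicodedata.normalize("NFC", modern)
nfc_archaic = unicodedata.normalize("NFC", archaic)

# The modern sequence composes into one precomposed syllable.
print(codepoints(nfc_modern))   # U+AC04

# The archaic sequence composes only its first two jamo; the trailing
# jamo stays behind as a separate conjoining character.
print(codepoints(nfc_archaic))  # U+AC00 U+11F0

# NFD keeps (or restores) the fully decomposed sequence.
nfd_archaic = unicodedata.normalize("NFD", nfc_archaic)
print(codepoints(nfd_archaic))  # U+1100 U+1161 U+11F0
```

Nothing is lost in either form, since NFD of the NFC result gives back
the original jamo sequence; what changes is that the syllable is split
between a precomposed character and a conjoining jamo.
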
The entire intent, AFAIK, with the Hangul syllables was as an
expedient for implementations that do not support Hangul Jamo at all.
For those that do support Hangul Jamo, Hangul syllable characters are
more of a pain than a help; such implementations should decompose them
for rendering and sorting (at least), but need not do so for storage,
or for IDNs...
Sorry for not following the IDN debate for a while, but I kind of
gave up when the focus turned solely onto "ACE"s, making the entire
exercise into something that may well result in something that I
don't wish to see deployed, having had very negative experiences with
"Quoted-Printable". And now there is a suggestion for 'reordering' too!
This does not look good. We don't need to have convoluted
'solutions' that are applicable only for IDNs.
Kind regards
/kent k
> -----Original Message-----
> From: Soobok Lee [mailto:lsb@postel.co.kr]
...
> two problems:
>
> 1) NFKC maps compatibility hangul jamo (U+32??) into
> conjoining hangul jamo without fillers.
> That converts compat hangul jamo sequences into conjoining
> jamo sequences and they are combined partially in some cases
> by the NFC following compatibility decomposition.
> Modern hangul informal texts contain compatibility jamos
> very often.
> If the texts contains choseong + jungseong sequence in
> compat jamo,
> they will be affected by this fault.
> This fault was found by Soobok Lee, too lately.
>
> 2) NFC does not combine archaic(old) hangul trailing jamo into the
> preceding half-combined syllable (from choseong jamo +
> jungseong jamo),
> but leave it alone as an isolated conjoining jamo.
> That means archaic hangul syllable which has old hangul
> trailing jamo
> cannot be preserved through out NFKC.
> NFKC cannot be used for scholarly texts which contain
> archaic hangul
> syllables. This fault was found by Martin, too lately.
>
> Mark and Martin knew and told me why this happened.
> Martin, would you comment on this?
>
> Soobok Lee
>
>
> >
> > > I hope UTC cures this BUGGY NFC/NFKC jamo handling as an
> > > exceptional case while it is in strict adherence to UTR's
> > > backward compatibility policy & promise.
> >
> > The normal forms for strings of already allocated characters
> > will not be changed any more.
> >
> > Kind regards
> > /Kent K
> >
>