
Re: [idn] call for comments for REORDERING



At 03:40 01/10/23 +0900, Soobok Lee wrote:

>----- Original Message -----
>From: "Karlsson Kent - keka" <keka@im.se>
> >
> > > I know current NFC/NFKC has *severely flawed* hangul jamo
> >
> > Now that's a new complaint.  At least to me.  Not even the
> > delegations from DPRK complain about that. (And they have
> > all sorts of complaints.)  Do you care to say a little bit
> > more about what you mean?  Or are we just to take your word
> > for it?  There was a problem in earlier drafts of UAX 15
> > concerning Hangul and NFKC (Hangul syllables were not preserved),
> > but that problem was removed before that UTR became a UAX.
>
>two problems:
>
>  1) NFKC maps   compatibility hangul jamo (U+32??) into
>                 conjoining hangul jamo without fillers.

Well, NFKC is just a combination of many kinds of mappings,
and in most cases should be applied piece-by-piece, not
wholesale.
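
For readers who want to see the mapping from point 1 in isolation, here
is a minimal sketch, assuming Python's standard unicodedata module. The
choice of U+3131 (HANGUL LETTER KIYEOK, from the compatibility jamo
block) and the variable names are mine, purely for illustration.

```python
import unicodedata

# U+3131 HANGUL LETTER KIYEOK, a compatibility jamo (illustrative choice).
compat_jamo = "\u3131"

# The character database records a compatibility mapping to a conjoining jamo.
mapping = unicodedata.decomposition(compat_jamo)

# NFKC applies that mapping; NFC leaves the character alone.
after_nfkc = unicodedata.normalize("NFKC", compat_jamo)
after_nfc = unicodedata.normalize("NFC", compat_jamo)

print(mapping)                      # tagged <compat>, target U+1100
print(f"U+{ord(after_nfkc):04X}")   # conjoining choseong kiyeok
print(f"U+{ord(after_nfc):04X}")    # unchanged
```

The <compat> tag is what distinguishes this mapping from a canonical
one, so an application that wants only some of the NFKC mappings could
filter on such tags instead of applying NFKC wholesale.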


>  2) NFC  does not combine  archaic(old) hangul trailing jamo into the
>      preceding half-combined syllable (from choseong jamo + jungseong jamo),
>       but leave it alone as an isolated conjoining jamo.

Yes. To give a bit more background to those not too familiar
with the topic, Hangul Jamo can be categorized along two axes:

Position:
- Initial consonants (or consonant clusters)
- Vowels (or vowel clusters)
- Final consonants (or consonant clusters)

History:
- Used in modern times
- Historical use only

This gives six categories: Im, Ih, Vm, Vh, Fm, Fh

The huge block of 11172 precomposed Hangul syllables is the result
of forming all of the following combinations:

- Im Vm      (for later reference, let's denote that by IV)
- Im Vm Fm   (for later reference, let's denote that by IVF)

NFC (and therefore NFKC, since composition in NFKC is the same as
in NFC) uses the precomposed IV or IVF whenever possible, rather
than the individual Im, Vm, and Fm.
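
As a rough check of the count and of the composition behaviour, here is
a small sketch, assuming Python's standard unicodedata module. The
counts of 19 modern initials, 21 modern vowels and 27 modern finals
come from the Unicode Hangul composition algorithm; the constant names
and the sample jamo are my own illustrative choices.

```python
import unicodedata

# Counts of modern jamo used by the precomposed syllable block.
INITIALS, VOWELS, FINALS = 19, 21, 27

iv_count = INITIALS * VOWELS             # Im Vm syllables
ivf_count = INITIALS * VOWELS * FINALS   # Im Vm Fm syllables
total = iv_count + ivf_count

# Sample modern jamo: choseong kiyeok, jungseong a, jongseong kiyeok.
im, vm, fm = "\u1100", "\u1161", "\u11A8"

# NFC replaces the jamo sequences with single precomposed syllables.
iv = unicodedata.normalize("NFC", im + vm)
ivf = unicodedata.normalize("NFC", im + vm + fm)

print(iv_count, ivf_count, total)
print(f"U+{ord(iv):04X}", f"U+{ord(ivf):04X}")
```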

There was a design choice to make for the cases where historical
components (jamo) are involved. In all of the following cases:

- Im Vh
- Im Vh Fm
- Im Vh Fh
- Ih Vm
- Ih Vm Fm
- Ih Vm Fh
- Ih Vh
- Ih Vh Fm
- Ih Vh Fh

it is not possible to do any composition, because the combinations
are not available as precomposed characters. However, in the case of

- Im Vm Fh

i.e. a modern initial, a modern vowel (cluster), followed by a
historical final, the question was whether this should result in

a) IV Fh
b) Im Vm Fh

The arguments for the first were to precompose as much as possible
and to avoid having to do lookahead. There are two possible arguments
for the second. One is that historical texts would be treated more
uniformly, but that doesn't apply, because syllables such as
Im Vm Fm -> IVF occur in historical texts and are definitely
recomposed by NFC. The other is that historical syllables would be
treated more uniformly (a syllable would be either completely
decomposed or completely precomposed). This second argument has some
point, but neither linguistic analysis (for which NFD is best) nor
rendering (which has to deal with lots of combinations anyway) is
severely affected.
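
To make the outcome concrete, here is a minimal sketch, assuming
Python's standard unicodedata module. The sample jamo are my own
choices: U+11F9 stands in for a historical final and U+1113 for a
historical initial. The helper names nfc and codepoints are
hypothetical.

```python
import unicodedata

def nfc(text: str) -> str:
    """Normalize text to NFC."""
    return unicodedata.normalize("NFC", text)

def codepoints(text: str) -> str:
    """Render a string as a space-separated list of code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)

IM = "\u1100"  # HANGUL CHOSEONG KIYEOK (modern initial)
VM = "\u1161"  # HANGUL JUNGSEONG A (modern vowel)
FH = "\u11F9"  # HANGUL JONGSEONG YEORINHIEUH (historical final)
IH = "\u1113"  # HANGUL CHOSEONG NIEUN-KIYEOK (historical initial)

# Im Vm Fh: the IV part is precomposed, the historical final stays separate.
im_vm_fh = nfc(IM + VM + FH)

# Ih Vm: no precomposed form exists, so the sequence is left unchanged.
ih_vm = nfc(IH + VM)

print(codepoints(im_vm_fh))
print(codepoints(ih_vm))
```

This corresponds to choice a) above: composition proceeds left to right
and never needs to look ahead at the final before combining Im and Vm.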


>     That means archail hangul syllable which has old hangul trailing jamo
>       cannot be preserved through out NFKC.

First, NFKC isn't an issue here. It's the same for NFC. Second, the actual
syllable is always preserved, whether as IV Fh or as Im Vm Fh.
These represent one and the same thing (and that's why we need some kind
of normalization in the first place).
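
The equivalence of the two spellings can be checked with a short
sketch, assuming Python's standard unicodedata module. The sample
syllable (U+AC00 followed by the historical final U+11F9) is an
illustrative choice of mine.

```python
import unicodedata

partly_composed = "\uAC00\u11F9"        # IV Fh
fully_decomposed = "\u1100\u1161\u11F9"  # Im Vm Fh

# Both spellings reduce to the same string under NFD ...
nfd_a = unicodedata.normalize("NFD", partly_composed)
nfd_b = unicodedata.normalize("NFD", fully_decomposed)

# ... and to the same string under NFC.
nfc_a = unicodedata.normalize("NFC", partly_composed)
nfc_b = unicodedata.normalize("NFC", fully_decomposed)

print(nfd_a == nfd_b, nfc_a == nfc_b)
```

Nothing is lost in either direction, since each normalization form
brings both spellings to the same sequence.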


>     NFKC cannot be used for scholarly texts which contains archail hangul
>       syllables.

Again, it's an issue that applies to both NFKC and NFC.
And both can be used for archaic hangul syllables without
any problems.


>This fault was found by Martin, too lately.

It's not a fault, it's by design. And it wasn't discovered only recently,
either. Actually, I remember that it was explicitly discussed at a Unicode
Technical Committee meeting in which I participated by phone.


>Mark and Martin knew and told me why this happened.
>Martin, would you comment on this?

As done above. Hope this helps.


Regards,   Martin.