RE: [idn] stringprep comment 5: hangul conjoining sequence
> -----Original Message-----
> From: Soobok Lee
...
> 1. When old trailing hangul jamos are included in
> conjoining jamo sequences, UAX15 (NFC)
> performs partial combinations to produce "a modern
> hangul syllable (LV) + a standalone
> old hangul jamo (oT)", and that form satisfies a ridiculous
> syllable break condition (X.T).
There is no syllable break there. There may be a sequence of L before
a Hangul syllable character, and a sequence of T after it, without any syllable
break in between. For some of the Hangul syllable characters, there may even
be both Vs and Ts after, without there being a syllable break.
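A minimal Python illustration, using the standard unicodedata module (which implements UAX #15 normalization): NFC composes a modern L+V pair into a precomposed syllable but leaves an archaic trailing jamo alone, giving <LV, T>. That is still one syllable, not a break. U+11C3 is simply an example of an archaic T chosen for this sketch.

```python
import unicodedata

# Modern trailing consonants that NFC can fold into an LV syllable are
# U+11A8..U+11C2; U+11C3 lies outside that range (an archaic jongseong).
seq = "\u1100\u1161\u11C3"   # L (KIYEOK) + V (A) + archaic T
nfc = unicodedata.normalize("NFC", seq)

# NFC composes L+V into the precomposed syllable U+AC00 and keeps the
# archaic T as a separate conjoining jamo: <LV, T>.
print([f"U+{ord(c):04X}" for c in nfc])  # ['U+AC00', 'U+11C3']
```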
> 2. UAX15 also tries ridiculous partial combinations when it
> meets a combining sequence in which
> two or more leading hangul jamos are followed by hangul
> vowel jamos, and produces
> a syllable break condition (L.LV)
There is no syllable break there. [Skipping syllable characters for the moment]
Note that a Hangul syllable consists of a non-empty **sequence** of L, followed
by a non-empty **sequence** of V, followed by a (possibly empty) **sequence** of T.
Note that in many cases compatibility equivalents with regard to these were
(erroneously) made non-equivalent between Unicode 2.1 and Unicode 3.0.
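The syllable structure described above (L+ V+ T*) can be checked mechanically. A sketch in Python, assuming the Unicode 3.0 conjoining jamo ranges (later Unicode versions add more jamo, which this sketch does not cover) and using NFD to expand precomposed syllables back into L V (T). The helper name is_single_syllable is hypothetical. Both <LV, T> and <L, LV> come out as a single syllable.

```python
import re
import unicodedata

# Conjoining jamo ranges as of Unicode 3.0 (an assumption of this sketch;
# later versions add Hangul Jamo Extended-A/B, not covered here).
L = "\u1100-\u1159\u115F"  # leading consonants, plus CHOSEONG FILLER
V = "\u1160-\u11A2"        # vowels, starting with JUNGSEONG FILLER
T = "\u11A8-\u11F9"        # trailing consonants

# One syllable: a non-empty run of L, a non-empty run of V, then any T.
SYLLABLE = re.compile(f"[{L}]+[{V}]+[{T}]*")

def is_single_syllable(s: str) -> bool:
    """True if s, after canonical decomposition, is exactly one L+ V+ T* block."""
    # NFD splits precomposed syllables (U+AC00..U+D7A3) into L V (T).
    return SYLLABLE.fullmatch(unicodedata.normalize("NFD", s)) is not None

print(is_single_syllable("\uAC00\u11C3"))  # LV + T  -> True
print(is_single_syllable("\u1100\uAC00"))  # L + LV  -> True
print(is_single_syllable("\uAC00\uAC00"))  # LV + LV -> False
```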
> 3. Compatibility hangul jamos are mapped into conjoining
> jamos without any fillers.
Compatibility (non-conjoining) Hangul letters are best prohibited. The
correct mapping cannot be expressed via nameprep without adding
a new, Hangul-specific mechanism. Would there be any major problems
just prohibiting them? (Allowing only the conjoining jamo and syllable characters.)
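A short Python check of why a filler-less mapping misbehaves: NFKC maps two standalone compatibility letters to conjoining jamo, and canonical composition then fuses them into a single syllable.

```python
import unicodedata

# Two standalone compatibility letters: U+3131 HANGUL LETTER KIYEOK and
# U+314F HANGUL LETTER A (non-conjoining; each is its own letter).
compat = "\u3131\u314F"

# NFKC maps them to conjoining jamo U+1100 and U+1161 with no filler in
# between, so canonical composition then produces the syllable U+AC00.
nfkc = unicodedata.normalize("NFKC", compat)
print([f"U+{ord(c):04X}" for c in nfkc])  # ['U+AC00']
```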
...
> Now I propose the UTC make a new normalization (call it NFN) to
> correct such errors and faults, and
> let stringprep include it after casefolding, that is:
> NFKC(NFN(casefold(x))).
I have suggested a solution that involves only additions to the tables in
"nameprep" (these include prohibiting non-conjoining compatibility Jamo
as well as the Hangul filler characters). Table available upon request.
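A sketch of such a prohibition check in Python. The ranges below are illustrative assumptions, not the table mentioned above, and the helper first_prohibited is hypothetical. It runs on the raw input, before NFKC, because NFKC would otherwise already have rewritten the compatibility letters into conjoining jamo.

```python
# Hypothetical prohibition ranges (illustrative only, NOT the actual table).
PROHIBITED_HANGUL = [
    (0x115F, 0x1160),  # HANGUL CHOSEONG FILLER, HANGUL JUNGSEONG FILLER
    (0x3131, 0x318E),  # Hangul Compatibility Jamo, incl. U+3164 HANGUL FILLER
    (0xFFA0, 0xFFDC),  # Halfwidth Hangul letters and HALFWIDTH HANGUL FILLER
                       # (this range also spans a few unassigned code points)
]

def first_prohibited(s: str):
    """Return the index of the first prohibited code point in s, or None."""
    for i, ch in enumerate(s):
        cp = ord(ch)
        if any(lo <= cp <= hi for lo, hi in PROHIBITED_HANGUL):
            return i
    return None

print(first_prohibited("\uAC00\u11C3"))  # syllable + conjoining jamo -> None
print(first_prohibited("\uAC00\u3131"))  # compatibility letter       -> 1
```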
I agree that it is very unfortunate that "letter sequence" equivalent strings,
like [gg] (SSANGKIYEOK) and [g][g] (<KIYEOK, KIYEOK>) are not
formally equivalent in Unicode; indeed these should have been canonically
equivalent; but the normal forms are by now frozen, and I don't think anyone
wants to have yet another Unicode formal equivalence or normal form.
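Python's unicodedata confirms this: the syllable spelled with SSANGKIYEOK (U+AE4C) and its letter-sequence equivalent spelled with two KIYEOK jamo are unequal under all four normal forms.

```python
import unicodedata

gga = "\uAE4C"                # precomposed syllable: SSANGKIYEOK + A
g_g_a = "\u1100\u1100\u1161"  # <KIYEOK, KIYEOK> + A as conjoining jamo

# Canonical decomposition keeps SSANGKIYEOK (U+1101) as one jamo.
print(unicodedata.normalize("NFD", gga) == "\u1101\u1161")  # True

# No normal form relates it to the two-KIYEOK spelling.
same = {form: unicodedata.normalize(form, gga) == unicodedata.normalize(form, g_g_a)
        for form in ("NFC", "NFD", "NFKC", "NFKD")}
print(same)  # {'NFC': False, 'NFD': False, 'NFKC': False, 'NFKD': False}
```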
Kind regards
/kent k