
RE: [idn] stringprep comment 5: hangul conjoining sequence

To: "'Soobok Lee'" <lsb@postel.co.kr>, <idn@ops.ietf.org>
Subject: RE: [idn] stringprep comment 5: hangul conjoining sequence
From: "Kent Karlsson" <kentk@md.chalmers.se>
Date: Mon, 11 Feb 2002 18:32:15 +0100
Cc: "'Martin Duerst'" <duerst@w3.org>, "'Mark Davis'" <mark.davis@macchiato.com>
In-reply-to: <0c0601c1b316$dfc16350$2b19fea9@temp>


> -----Original Message-----
> From: Soobok Lee
...
>  1. When  old trailing hangul jamos are included in 
> conjoining jamo sequences, UAX15(NFC)
>      performs  partial combinations to produce  "a modern 
> hangul syllable(LV) + a standalone
>       old hangul jamo(oT)" and that form satify a ridiculous 
> syllable break condition (X.T) .

There is no syllable break there.  There may be a sequence of L before
a Hangul syllable character, and a sequence of T after it, without any syllable
break in between.  For some of the Hangul syllable characters, there may even
be both Vs and Ts after, without there being a syllable break.
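
To make the sequence under discussion concrete, here is a minimal sketch using
Python's standard unicodedata module.  The choice of U+11EB (JONGSEONG PANSIOS)
as the old trailing jamo is only an example, not something taken from the
original report.

```python
import unicodedata

# <L, V, old T>: CHOSEONG KIYEOK, JUNGSEONG A, JONGSEONG PANSIOS.
# U+11EB is an old trailing jamo, picked here purely for illustration.
s = "\u1100\u1161\u11EB"
nfc = unicodedata.normalize("NFC", s)

# NFC composes L+V into the syllable character U+AC00 and leaves the old T
# uncomposed, since no precomposed syllable contains it.
print([f"U+{ord(c):04X}" for c in nfc])
```

The result is a syllable character followed by a T, which is the "LV + oT"
form; read as a syllable character followed by a sequence of T, it is still
one syllable.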

>  2. UAX15 also tries ridiculous partial combinations when it 
> met a combining sequence if
>       two or more leading hangul jamos followed by hangul 
> vowel jamos, and produce
>       a syllable break condition (L.LV)

There is no syllable break there.  [Skipping syllable characters for the moment]
Note that a Hangul syllable consists of a non-empty **sequence** of L, followed
by a non-empty **sequence** of V, followed by a (possibly empty) **sequence** of T.
Note that in many cases compatibility equivalents with regard to these were
(erroneously) made non-equivalent between Unicode 2.1 and Unicode 3.0. 
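
The L+ V+ T* structure can be sketched as a regular expression.  The code
point ranges below are assumptions (they simply cover the conjoining jamo
block, fillers included), and the helper name jamo_syllables is hypothetical.
Syllable characters are handled by decomposing them first.

```python
import re
import unicodedata

# Assumed ranges within the conjoining jamo block (fillers included).
L = "[\u1100-\u115F]"   # leading consonants
V = "[\u1160-\u11A7]"   # vowels
T = "[\u11A8-\u11FF]"   # trailing consonants

# Non-empty run of L, non-empty run of V, possibly empty run of T.
SYLLABLE = re.compile(f"{L}+{V}+{T}*")

def jamo_syllables(text):
    """Return the L+ V+ T* runs found after decomposing syllable characters."""
    return SYLLABLE.findall(unicodedata.normalize("NFD", text))

s = "\u1100\u1100\u1161"                 # <KIYEOK, KIYEOK, A>
nfc = unicodedata.normalize("NFC", s)    # partial composition: <KIYEOK, U+AC00>
print(len(jamo_syllables(nfc)))
```

Under this pattern the partially composed <L, LV> string is still a single
syllable, not two.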

>  3. Compatibility hangul jamos are mapped into conjoining 
> jamos without any fillers.

Compatibility (non-conjoining) Hangul letters are best prohibited.  Doing
the correct mapping is not expressible via nameprep without adding
a new, Hangul-specific mechanism.  Would there be any major problems
with just prohibiting them? (Allowing only the conjoining jamo and syllable characters.)
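
A rough sketch of both points, again with Python's unicodedata.  The function
has_prohibited_hangul and the exact set of code points it rejects are
assumptions for illustration; they are not the table mentioned elsewhere in
this message.

```python
import unicodedata

# Two non-conjoining compatibility letters: HANGUL LETTER KIYEOK, HANGUL LETTER A.
compat = "\u3131\u314F"

# NFKC maps them to conjoining jamo with no fillers, which then compose
# into a single syllable character.
nfkc = unicodedata.normalize("NFKC", compat)
print([f"U+{ord(c):04X}" for c in nfkc])

def has_prohibited_hangul(label):
    """Hypothetical check, to be run BEFORE normalization.

    Rejects the Hangul Compatibility Jamo block U+3130..U+318F (which
    contains U+3164 HANGUL FILLER), the conjoining fillers U+115F and
    U+1160, and U+FFA0 HALFWIDTH HANGUL FILLER.
    """
    return any("\u3130" <= c <= "\u318F" or c in "\u115F\u1160\uFFA0"
               for c in label)

print(has_prohibited_hangul(compat))
```

The check has to look at the input before NFKC is applied, since afterwards
the compatibility letters are no longer distinguishable from conjoining jamo.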

...
>  Now I propose UTC make new normalizations (call it NFN)to 
> correct such errors and faults and
>     let stringprep include it after casefolding : that is  
> NFKC(NFN(casefold(x)).

I have suggested a solution that involves only additions to the tables in
"nameprep" (these include prohibiting non-conjoining compatibility Jamo
as well as the Hangul filler characters).  Table available upon request.

I agree that it is very unfortunate that "letter sequence" equivalent strings,
like [gg] (SSANGKIYEOK) and [g][g] (<KIYEOK, KIYEOK>) are not
formally equivalent in Unicode; indeed these should have been canonically
equivalent; but the normal forms are by now frozen, and I don't think anyone
wants to have yet another Unicode formal equivalence or normal form.
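
The non-equivalence is easy to check.  A minimal sketch, assuming a Python
unicodedata built on current Unicode data:

```python
import unicodedata

ssang = "\u1101"          # HANGUL CHOSEONG SSANGKIYEOK, [gg]
double = "\u1100\u1100"   # <KIYEOK, KIYEOK>, [g][g]

# Compare the two strings under each of the four normal forms.
results = {
    form: unicodedata.normalize(form, ssang) == unicodedata.normalize(form, double)
    for form in ("NFC", "NFD", "NFKC", "NFKD")
}
print(results)
```

None of the four forms brings the two strings together, since SSANGKIYEOK
has neither a canonical nor a compatibility decomposition.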

		Kind regards
		/kent k

