
Re: [idn] call for comments for REORDERING




----- Original Message ----- 
From: "Martin Duerst" <duerst@w3.org>
To: "Soobok Lee" <lsb@postel.co.kr>; <DougEwell2@cs.com>; <idn@ops.ietf.org>
Cc: <jseng@pobox.org.sg>
Sent: Monday, October 22, 2001 12:23 PM
Subject: Re: [idn] call for comments for REORDERING


> At 21:27 01/10/19 +0900, Soobok Lee wrote:
> 
> >han/hangeul have huge lists of characters (20992/11172).
> >To encode them, we need more digits and bits than latin letters.
> 
> It may be interesting to give some background information on
> Hangul encoding in Unicode. In Unicode 1.0 and 1.1, only a
> subset of the 11172 modern hangul syllables, 2000-3000
> (taken from KS5601) were encoded, in the place where now
> CJK Extension A is located. These represented the subset
> of Hangul in actual practical use. However, at the request
> of the Korean (ROK) national delegation to ISO/IEC JTC1
> SC2/WG2, and after long and painful discussions, these
> codepoints were moved to the current location, and completed
> in a regular fashion to the current 11172. A lot of reasons
> were given at the time for having to make this change.
> It was the only time at all that codepoints were moved,
> and it resulted in very strong criticism (very much including
> people from the IETF). The main benefit was that it made
> the bodies involved understand how important stability is
> to the users.

Thanks for the detailed history of Hangul in Unicode.
But including all 11172 hangul made Unicode more useful and more
widely acceptable to the Korean community, because the existing
KS5601's 2350 or more hangul had been criticized for not being
able to serve many scholarly purposes or book publishing.

The pain of including 11172 hangul was REWARDED!
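
To illustrate where the figure 11172 comes from, here is a minimal sketch
in Python. It assumes the usual arithmetic layout of the precomposed
syllable block (19 leading consonants, 21 vowels, 28 trailing slots
including "no trailing consonant", starting at U+AC00). The names S_BASE,
L_COUNT, V_COUNT, T_COUNT and compose() are my own, chosen for illustration.

```python
# Sketch: arithmetic layout of the precomposed Hangul syllable block.
# Constant and function names are illustrative, not taken from any draft.
S_BASE = 0xAC00   # first precomposed syllable
L_COUNT = 19      # leading consonants
V_COUNT = 21      # vowels
T_COUNT = 28      # trailing consonants, index 0 meaning "none"

def compose(l_index, v_index, t_index=0):
    """Return the syllable for the given leading/vowel/trailing indices."""
    if not (0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT
            and 0 <= t_index < T_COUNT):
        raise ValueError("jamo index out of range")
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)

total_syllables = L_COUNT * V_COUNT * T_COUNT
first_syllable = compose(0, 0)        # lowest code point in the block
last_syllable = compose(18, 20, 27)   # highest code point in the block

print(total_syllables)
print(hex(ord(first_syllable)), hex(ord(last_syllable)))
```

The product of the three counts is the 11172 mentioned above, and the
block runs contiguously from U+AC00 to U+D7A3.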

I know that the current NFC/NFKC has *severely flawed* hangul jamo handling,
which got into UTR15 for lack of careful review by both the
Korean representatives and UTC members.
This pain came not from including the 11172, but from careless reviewers.

I hope the UTC fixes this BUGGY NFC/NFKC jamo handling as an exceptional case,
while otherwise strictly adhering to UTR's backward compatibility policy & promise.

Soobok Lee

> 
> Anyway, if this change hadn't been done, all the frequent
> Hangul would now be quite a bit closer together.
> 
> Regards,   Martin.
> 
> 
> >That ACE encoding overhead is so big that it cannot be compensated for by
> >the greater information capacity of a han letter compared to a latin
> >letter. The sizes of Latin alphabets and their variants do not exceed 30.
> >
> >In UCS-2, 2*N octets are needed for latin/han labels of length N.
> >But ACE/UTF8, which favor latin script, require roughly 3.0*N octets for
> >long han labels. REORDERING reduces that requirement to 2.2*N, close to
> >that of UCS-2.
> >
> >Soobok Lee
> 
>
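
As a rough check of the octet counts quoted above (2*N for UCS-2, roughly
3*N for UTF-8 on han labels), here is a small Python sketch. The sample
labels and the helper octets_per_char() are hypothetical, picked only for
illustration. It measures UTF-8 and UCS-2 (as big-endian UTF-16, which is
identical for these BMP characters). It does not try to reproduce the ACE
or REORDERING figures, which depend on the drafts under discussion.

```python
# Sketch: octets per character for a label in UCS-2 and UTF-8.
# The labels below are made-up samples, not taken from the thread.

def octets_per_char(label):
    """Return (ucs2, utf8) octets per character for a BMP-only label."""
    n = len(label)
    ucs2 = len(label.encode("utf-16-be")) / n   # 2 octets per BMP character
    utf8 = len(label.encode("utf-8")) / n
    return ucs2, utf8

samples = {
    "latin": "example",
    "han": "\u4e2d\u6587\u57df\u540d",
    "hangul": "\ud55c\uae00",
}

results = {name: octets_per_char(label) for name, label in samples.items()}
for name, (ucs2, utf8) in results.items():
    print(name, ucs2, utf8)
```

For the latin sample UTF-8 needs 1 octet per character against 2 for
UCS-2. For the han and hangul samples it needs 3 against 2, which is the
imbalance described in the quoted text.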