Re: [idn] call for comments for REORDERING
At 21:27 01/10/19 +0900, Soobok Lee wrote:
>Han/Hangeul have huge character repertoires (20992/11172).
>To encode them, we need more digits and bits than for Latin letters.
It may be interesting to give some background information on
Hangul encoding in Unicode. In Unicode 1.0 and 1.1, only a
subset of the 11172 modern Hangul syllables, about 2000-3000
of them (taken from KS5601), was encoded, in the place where
CJK Extension A is now located. This subset covered the Hangul
syllables in actual practical use. However, at the request
of the Korean (ROK) national delegation to ISO/IEC JTC1
SC2/WG2, and after long and painful discussions, these
codepoints were moved to their current location and completed
in a regular fashion to the current 11172. Many reasons
were given at the time for why this change had to be made.
It is the only time codepoints have ever been moved,
and it drew very strong criticism (not least from people
in the IETF). The main benefit was that it made
the bodies involved understand how important stability is
to users.
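The "regular fashion" is the arithmetic layout the Unicode Standard defines for precomposed Hangul syllables: 19 leading consonants x 21 vowels x 28 trailing options (including "none") = 11172 syllables, starting at U+AC00. The sketch below illustrates that layout; the constant names follow the Unicode Standard's description, while the functions `compose` and `decompose` are hypothetical helpers written only for illustration.

```python
# Hangul syllable layout as described in the Unicode Standard
# (Conjoining Jamo Behavior). Function names are illustrative only.
S_BASE = 0xAC00   # first precomposed syllable
L_COUNT = 19      # leading consonants
V_COUNT = 21      # vowels
T_COUNT = 28      # trailing consonants, index 0 meaning "no final consonant"
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT   # 11172 syllables in total


def compose(l_index: int, v_index: int, t_index: int = 0) -> str:
    """Return the precomposed syllable for the given jamo indices."""
    if not (0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT
            and 0 <= t_index < T_COUNT):
        raise ValueError("jamo index out of range")
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)


def decompose(syllable: str) -> tuple[int, int, int]:
    """Return the (lead, vowel, trail) indices of a precomposed syllable."""
    s_index = ord(syllable) - S_BASE
    if not 0 <= s_index < S_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    return (s_index // N_COUNT,
            (s_index % N_COUNT) // T_COUNT,
            s_index % T_COUNT)


print(S_COUNT)                        # 11172
print(hex(S_BASE + S_COUNT - 1))      # 0xd7a3, the last syllable
print(decompose(chr(0xD55C)))         # U+D55C (han): (18, 0, 4)
```

Because every combination gets a slot, rare and frequent syllables are interleaved across the whole U+AC00-U+D7A3 range rather than clustered together.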
Anyway, had this change not been made, all the frequently used
Hangul syllables would now be much closer together in the code space.
Regards, Martin.
>That ACE encoding overhead is so large that it cannot be offset by the
>greater information capacity of a Han letter compared to a Latin
>letter. Latin alphabets and their variants have no more than 30 letters.
>
>In UCS-2, 2*N octets are needed for Latin/Han labels of length N.
>But ACE/UTF-8, which favor Latin script, require roughly 3.0*N octets
>for long Han labels. REORDERING reduces that requirement to 2.2*N,
>close to that of UCS-2.
>
>Soobok Lee
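The 2*N and 3*N figures quoted above can be checked directly with standard encoders. This is a minimal sketch assuming BMP-only labels, for which UTF-16 without a byte-order mark has the same length as UCS-2; the helper name `octets` and the sample labels are hypothetical, and neither ACE nor REORDERING is modeled here.

```python
def octets(label: str) -> tuple[int, int]:
    """Return (UCS-2 octets, UTF-8 octets) for a BMP-only label."""
    # utf-16-be emits no BOM, so for BMP text its length equals UCS-2.
    return len(label.encode("utf-16-be")), len(label.encode("utf-8"))


samples = {
    "latin": "example",
    "hangul": "\ud55c\uae00",   # U+D55C U+AE00 (Hangeul)
    "han": "\u4e2d\u6587",      # U+4E2D U+6587
}

for name, label in samples.items():
    n = len(label)
    ucs2, utf8 = octets(label)
    print(f"{name}: N={n} UCS-2={ucs2} ({ucs2 / n:.1f}*N) "
          f"UTF-8={utf8} ({utf8 / n:.1f}*N)")
```

Latin labels cost 1*N in UTF-8 but 2*N in UCS-2, while Hangul and Han labels cost 2*N in UCS-2 and 3*N in UTF-8, which is the asymmetry the REORDERING proposal targets.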