Re: [idn] Chinese Domain Name Consortium (CDNC) Declaration
L.M. Tseng wrote:
> From owner-idn@ops.ietf.org Tue Feb 5 02:36:26 2002
> To: "Erin Chen" <erin@twnic.net.tw>, "Dave Crocker" <dhc@dcrocker.net>
> Cc: "IESG" <iesg@ietf.org>, "IAB" <iab@isi.edu>,
> "IETF IDN WG" <idn@ops.ietf.org>
> Subject: Re: [idn] Chinese Domain Name Consortium (CDNC) Declaration
> Dear Dave Crocker:
> A friend gave me an example involving CJK Unicode. It is
> unclear to me how to tell which of these are the correct Chinese
> characters. In our handwriting, both forms of each pair are used and
> mixed.
>
> 淸眞敎 U+6DF8 U+771E U+654E
> 淸眞教 U+6DF8 U+771E U+6559
> 淸真敎 U+6DF8 U+771F U+654E
> 淸真教 U+6DF8 U+771F U+6559
> 清眞敎 U+6E05 U+771E U+654E
> 清眞教 U+6E05 U+771E U+6559
> 清真敎 U+6E05 U+771F U+654E
> 清真教 U+6E05 U+771F U+6559
Huh? How is this contributing to closure on Last Call on
the IDNA documents? And why is it cc'd to IESG and IAB?
For those who may be mystified, this is the Chinese word for
"Islam", qing1zhen1jiao4.
The ordinary way this would appear in a PRC dictionary is:
U+6E05 U+771F U+6559
and not any of the other 7 permutations.
In a more traditional dictionary as might be seen in Taiwan
or Hong Kong, it might be printed:
U+6DF8 U+771E U+6559
and not any of the other 7 permutations.
However, if you were using a Big-5 computer in Taiwan,
you would use the same characters as for the PRC for
this:
U+6E05 U+771F U+6559
and not any of the other 7 permutations (though fonts
might vary in which glyph they actually show).
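For anyone who wants to reproduce the quoted list, here is a minimal
sketch. It assumes nothing beyond the code points given in the quoted
message; the labels and the function name are hypothetical, chosen only
to mark the two forms singled out above.

```python
from itertools import product

# Two variant code points per syllable, taken from the quoted list.
VARIANTS = [
    (0x6DF8, 0x6E05),  # qing1
    (0x771E, 0x771F),  # zhen1
    (0x654E, 0x6559),  # jiao4
]

# Hypothetical labels for the two sequences described in the text.
KNOWN_FORMS = {
    (0x6E05, 0x771F, 0x6559): "PRC dictionary / Big-5 usage",
    (0x6DF8, 0x771E, 0x6559): "traditional dictionary",
}


def enumerate_forms():
    """Return (text, code point string, label) for all 2 x 2 x 2 sequences."""
    rows = []
    for combo in product(*VARIANTS):
        text = "".join(chr(cp) for cp in combo)
        codes = " ".join(f"U+{cp:04X}" for cp in combo)
        rows.append((text, codes, KNOWN_FORMS.get(combo, "")))
    return rows


if __name__ == "__main__":
    for _text, codes, label in enumerate_forms():
        print(codes, label)
```

The sequences come out in the same order as the quoted list, since
itertools.product varies the last position fastest.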
U+6E05 and U+771F, by the way, are examples of "traditional
simplifications" reflecting handwritten forms, that
predate the PRC systematic simplifications. The same two
forms are also used in Japan.
U+654E is another handwriting alternative for U+6559, but
it is seldom seen in printed material. U+654E is used in
the PRC, Taiwan, and in Japan alike.
All 6 characters have G, T, and K sources in ISO/IEC 10646, and
4 of them have J sources as well. So for this kind of
overlap of forms, any suggestion to delete G-source-only
characters from the allowed set would do nothing at all,
since none of these characters is G-source-only.
And lest this example be taken at face value
as indicating a problem in "CJK UNICODE", it should be noted
that the presence of these alternate forms of the "same character"
in Unicode is due to the same distinctions being made in
legacy CJK character encodings in Asia. In particular,
note the following mappings:
For "GBK", Code Page 936 Simplified Chinese:
0x9C5B 0x6DF8 #CJK UNIFIED IDEOGRAPH
0xC7E5 0x6E05 #CJK UNIFIED IDEOGRAPH
0xB177 0x771E #CJK UNIFIED IDEOGRAPH
0xD5E6 0x771F #CJK UNIFIED IDEOGRAPH
0x949C 0x654E #CJK UNIFIED IDEOGRAPH
0xBDCC 0x6559 #CJK UNIFIED IDEOGRAPH
And for "Shift-JIS", Code Page 932 Japanese:
0xEDE4 0x6DF8 #CJK UNIFIED IDEOGRAPH
0xFB43 0x6DF8 #CJK UNIFIED IDEOGRAPH
0x90B4 0x6E05 #CJK UNIFIED IDEOGRAPH
0xE1C1 0x771E #CJK UNIFIED IDEOGRAPH
0x905E 0x771F #CJK UNIFIED IDEOGRAPH
0xEDB1 0x654E #CJK UNIFIED IDEOGRAPH
0xFACD 0x654E #CJK UNIFIED IDEOGRAPH
0x8BB3 0x6559 #CJK UNIFIED IDEOGRAPH
So if you are working on a Windows system in either of
these legacy code pages, in China or Japan, you
already have the same options for representational
ambiguity, without invoking Unicode at all.
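The mappings above can be spot-checked with a short script. This is
only a sketch: it assumes that Python's "cp936" and "cp932" codecs
follow the same Microsoft code page tables quoted above, and the
helper names are made up for illustration.

```python
# Byte sequences and code points copied from the two tables above.
CP936 = {
    b"\x9c\x5b": 0x6DF8,
    b"\xc7\xe5": 0x6E05,
    b"\xb1\x77": 0x771E,
    b"\xd5\xe6": 0x771F,
    b"\x94\x9c": 0x654E,
    b"\xbd\xcc": 0x6559,
}

CP932 = {
    b"\xed\xe4": 0x6DF8,
    b"\xfb\x43": 0x6DF8,
    b"\x90\xb4": 0x6E05,
    b"\xe1\xc1": 0x771E,
    b"\x90\x5e": 0x771F,
    b"\xed\xb1": 0x654E,
    b"\xfa\xcd": 0x654E,
    b"\x8b\xb3": 0x6559,
}


def mismatches(codec, table):
    """Return table entries whose decoded character differs from the table."""
    bad = []
    for raw, expected in table.items():
        decoded = raw.decode(codec)
        if decoded != chr(expected):
            bad.append((raw, expected, decoded))
    return bad


def duplicate_encodings(table):
    """Return code points that are reachable from more than one byte sequence."""
    by_cp = {}
    for raw, cp in table.items():
        by_cp.setdefault(cp, []).append(raw)
    return {cp: raws for cp, raws in by_cp.items() if len(raws) > 1}


if __name__ == "__main__":
    for name, table in (("cp936", CP936), ("cp932", CP932)):
        print(name, "mismatches:", mismatches(name, table))
        dups = duplicate_encodings(table)
        print(name, "duplicates:", sorted(f"U+{cp:04X}" for cp in dups))
```

An empty mismatch list means the codec agrees with the table. The
duplicate check also brings out that, in the code page 932 table,
U+6DF8 and U+654E each have two distinct byte sequences, so the legacy
encoding carries some redundancy of its own.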
--Ken
>
> L.M.Tseng
>