Re: [idn] An ignorant question about TC<-> SC
Hi, Mr. Patrik Fältström:
The TC/SC problems in IDN come from the large number of Chinese SC
characters, derived from TC, that coexist with their TC originals in the
Unicode table.
1. The SC announced by the PRC on 7 March 1964 and re-announced on 24 June
1986 are all quick-written forms derived from the original TC. Most of them
have the same meaning as their TC originals and are used only in the PRC.
These SC/TC characters are listed in the "Total Table of Simplified
Characters" in the announced documents. In Taiwan, the Ministry of
Education publishes a dictionary of variants covering Chinese, Japanese,
and Korean on its web site:
http://140.111.1.40/fulu/fu5/fu6.htm
CJK has about 4000 most-frequently-used characters, and many TC/SC
characters fall in this category. If we remove the SC that are also
commonly used in Japan (about 123 characters in the PRC's announced total
table) and all 1-n mapping characters, about 2000 pairs of PRC-only SC/TC
characters with a 1-1 mapping remain. The probability that a name mixes TC
and SC of the same meaning is therefore very high. Because an IDN is an
identifier in the DNS, the confusion caused when users substitute an SC for
a TC of the same meaning will cause trouble for Chinese IDNs.
2. SC and TC coexist for printing and display; they cannot be unified or
one-way mapped (normalized) to a single representative code point, because
Japan, Taiwan, and Korea use the TC forms in normal writing. China, on the
other hand, uses the SC quick-written forms, while HK and Mo use a mix of
TC and SC. TC and SC have different code points, so a name containing N
such characters leads to 2^N registration records and to problems keeping
NS delegation consistent.
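The 2^N growth can be illustrated with a short sketch. The three TC/SC pairs
below are a hypothetical stand-in for the full 1-1 table, and the function
name variants is made up for this example; it is not taken from any IDN
draft.

```python
from itertools import product

# Hypothetical 1-1 TC -> SC pairs, for illustration only.
TC_TO_SC = {"電": "电", "腦": "脑", "網": "网"}
SC_TO_TC = {sc: tc for tc, sc in TC_TO_SC.items()}


def variants(label):
    """Return every TC/SC spelling of label under the 1-1 table above."""
    choices = []
    for ch in label:
        tc = SC_TO_TC.get(ch, ch)   # fold an SC back to its TC, if any
        sc = TC_TO_SC.get(tc)       # the SC partner, or None
        choices.append((tc, sc) if sc else (ch,))
    return {"".join(combo) for combo in product(*choices)}
```

A label with three mapped characters has 2^3 = 8 spellings, each of which
would need its own registration record and a consistent delegation.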
3. TC/SC MUST be coded bi-directionally, so that the original form may be
recovered for display and so that an LDH DNS server can compare names
without regard to 1-1 mapped TC/SC. The problem MUST NOT be solved by
forcibly translating names to one of TC or SC.
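One possible reading of point 3, sketched under assumptions: each label is
split into a comparison key plus a mask recording which positions were
written in SC. The server compares keys only, and the mask lets the original
spelling be recovered for display. The pair table and the names encode,
decode, and same_name are hypothetical, and the actual LDH encoding is left
out.

```python
# Hypothetical 1-1 TC -> SC pairs, for illustration only.
TC_TO_SC = {"電": "电", "腦": "脑", "網": "网"}
SC_TO_TC = {sc: tc for tc, sc in TC_TO_SC.items()}


def encode(label):
    """Split a label into a comparison key and a mask of SC positions."""
    key = "".join(SC_TO_TC.get(ch, ch) for ch in label)
    mask = tuple(ch in SC_TO_TC for ch in label)
    return key, mask


def decode(key, mask):
    """Recover the spelling that was originally entered."""
    return "".join(
        TC_TO_SC[ch] if is_sc else ch for ch, is_sc in zip(key, mask)
    )


def same_name(a, b):
    """Compare two labels without regard to 1-1 mapped TC/SC."""
    return encode(a)[0] == encode(b)[0]
```

Because the mask travels with the key, nothing is lost by the comparison: a
mixed label such as "电腦网" matches "電腦網" and still decodes back to
exactly what was typed.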
L.M.Tseng
----- Original Message -----
From: "Patrik Fältström" <paf@cisco.com>
> --On 2001-10-26 12.48 -0400 ben <ben@cc-www.com> wrote:
>
> Instead of asking the server to do all of this work, one can force the
> client to translate the local charset to one and only one. The server then
> only needs to know how to match within that charset. And the client only
> needs to know how to map from the local charset to the unified one.
>
> We then took one more step and asked the clients to also do normalisation,
> so the servers can do matching based on direct comparison of the bits
> that represent the domain name itself.
>
> paf
>
>
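The client-side normalisation described in the quoted message can be
sketched as follows. The choice of NFKC and UTF-8 is an assumption made for
this example (the message does not name a normalisation form), and to_wire
is a hypothetical helper name.

```python
import unicodedata


def to_wire(name):
    """Normalise on the client, then encode to the bytes a server compares."""
    return unicodedata.normalize("NFKC", name).encode("utf-8")


composed = "F\u00e4ltstr\u00f6m"        # precomposed a-umlaut and o-umlaut
decomposed = "Fa\u0308ltstro\u0308m"    # base letters + combining diaeresis
```

After normalisation the two spellings yield identical bytes, so the server
can compare them directly. Unicode normalisation does not fold SC into TC,
however: "电" and "電" remain distinct, which is the gap the points above
are about.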