Re: [idn] Zone rules (was: wg milestones update)

----- Original Message -----

From: Maynard Kang <maynard@pobox.org.sg>

To: <sun@cnnic.net.cn>; Eric Brunner-Williams in Portland Maine <brunner@nic-naa.net>

Sent: Monday, April 30, 2001 7:54 PM

Subject: Re: [idn] Zone rules (was: wg milestones update)

>Hi Guonian,

>> For Chinese users, the TC-SC equivalence rule is just like the
>> case-folding rule.
>> I'd like to show an example using the case-folding rule.
>>
>> Under zone .COM, if the manager defines upper-case characters as equal to
>> lower-case characters, users can access IBM's domain with ibm.com,
>> IBm.com, ... in any case.
...
>> So I think the TC-SC equivalence rule SHOULD be consistent anywhere.

>Since we're on this topic of TC-SC equivalence yet again, I'd like to point
>out two issues that I think we should all consider before we go further, if
>these have not been reiterated enough in previous postings made by more
>qualified individuals than myself in the past:

>1) The ruleset for TC-SC equivalence is not a 1-n or n-1 mapping of
>abstract characters, as pointed out by Harald Alvestrand. It can
>be hideously complex (see draft-ietf-idn-cjk-01.txt for details), with
>numerous lexical and contextual considerations. I am sure that you would
>know about the "头发" = "頭髮" but "发财" != "髮財" problem.

>2) Case-folding is a simple canonical process, and the folding rules are
>the same, I believe (someone please correct me if I'm wrong), for most
>scripts that can be represented in ASCII (e.g. English, Swahili, Hawaiian,
>Malay).

>For example, there is no debate as to whether "CAT" or "cat" or "cAt"
>refers to the same thing (in English), or whether "KUCHING" or "kuching" or
>"kUchiNG" refers to the same thing ("cat" in Malay).

There is also no debate that "日產" and "日产" refer to the same thing in Chinese.
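
To make the contrast between points 1) and 2) concrete, here is a minimal Python sketch. It assumes a toy per-character table and a toy word-level table; both are made up for illustration (only the "头发" / "发财" example comes from this thread) and are nowhere near a real TC-SC ruleset. The function names are hypothetical as well.

```python
from typing import List, Optional

# Toy data for illustration only; NOT a real TC-SC table.
# One simplified character may have several traditional candidates.
SC_TO_TC_CANDIDATES = {
    "头": ["頭"],
    "发": ["髮", "發"],  # "hair" vs. "to issue / to prosper"
    "财": ["財"],
}

# Hypothetical word-level table that picks the right candidate in context.
WORD_LEVEL_TC = {
    "头发": "頭髮",
    "发财": "發財",
}


def case_equivalent(a: str, b: str) -> bool:
    """Case folding is context-free: fold each string on its own."""
    return a.casefold() == b.casefold()


def tc_candidates(sc_word: str) -> List[str]:
    """All traditional spellings a per-character table would allow."""
    results = [""]
    for ch in sc_word:
        options = SC_TO_TC_CANDIDATES.get(ch, [ch])
        results = [prefix + opt for prefix in results for opt in options]
    return results


def tc_in_context(sc_word: str) -> Optional[str]:
    """The spelling that is valid for this word, if the toy table knows it."""
    return WORD_LEVEL_TC.get(sc_word)


if __name__ == "__main__":
    print(case_equivalent("IBM", "IBm"))  # True
    print(tc_candidates("发财"))           # ['髮財', '發財']
    print(tc_in_context("发财"))           # 發財
```

A per-character table alone would accept "髮財" as a match for "发财"; only word-level context rules it out. Case folding needs no such context.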

>However, Han character canonicalization does not follow the same rules. Han
>characters used in Chinese, Japanese and Korean have different equivalence
>rules. This is also pointed out in draft-ietf-idn-cjk-01.txt.

>For example, "日產" = "日产" in Chinese, but "日產" != "日产" in Japanese.
>In both cases, the exact same code points in Unicode are used. It has also
>been pointed out that Unicode does not recognize the difference between
>languages, only the difference between scripts. Hence, it will
>be difficult for any Unicode-based system to function on language-based
>equivalence rules.

"日產" = "日产" in Chinese, and "日產" != "日产" in Japanese. If we set aside the Chinese character mapping problem and talk about the Unicode character mapping problem instead, the mapping between the code points "產" and "产" is a 1-n problem, not a 1-1 mapping.

>I do agree with you that the problem that you have pointed out is indeed a
>real problem and there is a need for TC-SC equivalence for locating
>resources on the Internet.

>However, it is a problem that cannot be solved by a Unicode-based DNS
>solution within the parameters of the IDN WG and hence this may not be the
>right place to address these issues.
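
The language dependence discussed above ("日產" = "日产" in Chinese but not in Japanese) can be sketched as follows. This is only an illustration under stated assumptions: the per-language tables, the language tags, and the function names are hypothetical, and the single table entry is taken from the example in this thread. The point is that the comparison needs a language argument that the code points themselves do not carry.

```python
# Hypothetical per-language variant tables; toy data for illustration only.
VARIANT_FOLDS = {
    "zh": {"產": "产"},  # Chinese: TC and SC forms treated as equivalent
    "ja": {},            # Japanese: no such folding assumed here
}


def fold(label: str, lang: str) -> str:
    """Map each character through the table for the given language."""
    table = VARIANT_FOLDS.get(lang, {})
    return "".join(table.get(ch, ch) for ch in label)


def equivalent(a: str, b: str, lang: str) -> bool:
    """Equivalence can only be decided once a language is supplied."""
    return fold(a, lang) == fold(b, lang)


if __name__ == "__main__":
    print(equivalent("日產", "日产", "zh"))  # True
    print(equivalent("日產", "日产", "ja"))  # False
```

The two input strings are identical in both calls; only the `lang` argument changes the answer, which is the information a bare Unicode string does not provide.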

>maynard