
Re: [idn] Zone rules (was: wg milestones update)





----- Original Message -----
From: Maynard Kang <maynard@pobox.org.sg>
To: <sun@cnnic.net.cn>; Eric Brunner-Williams in Portland Maine <brunner@nic-naa.net>
Cc: <idn@ops.ietf.org>
Sent: Monday, April 30, 2001 7:54 PM
Subject: Re: [idn] Zone rules (was: wg milestones update)
 
Mr. Maynard:
Thank you for your frankness. On the road of science we do not differ; there is only the problem that is waiting to be solved.
>Hi Guonian,

>> for Chinese users, the TC-SC equivalence rule is just like the case-folding rule.
>> I'd like to show an example using the case-folding rule.
>>
>> under zone .COM, if the manager defines upper case characters as equal to lower case
>> characters, users could access IBM's domain with ibm.com, IBm.com, ... in any case.
...
>> So I think the TC-SC equivalence rule SHOULD be consistent everywhere.

>Since we're on this topic of TC-SC equivalence yet again, I'd like to point
>out two issues that I think we should all consider before we go further, if
>these have not been reiterated enough in previous postings made by more
>qualified individuals than myself in the past:

>1) The ruleset for TC-SC equivalence is not a 1-n or n-1 mapping of
>abstract characters, as pointed out by Harald Alvestrand. It can
>be hideously complex (see draft-ietf-idn-cjk-01.txt for details), with
>numerous lexical and contextual considerations. I am sure that you would
>know about the "头发" = "頭髮" but "发财" != "髮財" problem.
TC-SC equivalence is NOT a lexical or contextual problem; it is a code mapping problem.
1-n and n-1 mappings are the minority; 1-1 mappings are the majority. We can use a different method to solve each of these two parts.
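
To illustrate the split between 1-1 and 1-n mappings, here is a minimal Python sketch. The table SC_TO_TC and the helpers split_by_arity and tc_candidates are hypothetical names, and the table holds only characters quoted in this thread; it is an assumption for illustration, not a real conversion table.

```python
# Hypothetical, tiny SC -> TC table; only characters quoted in this thread.
SC_TO_TC = {
    "头": ["頭"],        # 1-1
    "财": ["財"],        # 1-1
    "发": ["發", "髮"],  # 1-n
}


def split_by_arity(table):
    """Separate the 1-1 entries from the 1-n entries of a mapping table."""
    one_to_one = {sc: tcs[0] for sc, tcs in table.items() if len(tcs) == 1}
    one_to_many = {sc: tcs for sc, tcs in table.items() if len(tcs) > 1}
    return one_to_one, one_to_many


def tc_candidates(label, table=SC_TO_TC):
    """Expand a label into every TC string the table allows.

    Characters missing from the table are passed through unchanged.
    """
    results = [""]
    for ch in label:
        options = table.get(ch, [ch])
        results = [prefix + opt for prefix in results for opt in options]
    return results


if __name__ == "__main__":
    print(split_by_arity(SC_TO_TC))
    print(tc_candidates("头发"))
    print(tc_candidates("发财"))
```

With this table, a label made only of 1-1 characters has exactly one candidate, while each 1-n character multiplies the candidate count. The table alone cannot say which candidate is the intended one.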

>2) Case-folding is a simple canonical process, and the folding rules are
>the same, I believe (someone please correct me if I'm wrong), for most
>scripts which are able to be represented in ASCII (e.g. English, Swahili,
>Hawaiian, Malay, etc).
Certainly, ASCII includes just 26 letters and is easy to deal with. But that is the existing DNS, and now we are discussing IDN. If IDN decides not to solve the multilingual problem,...
 
Sure, the TC-SC mapping is difficult. But giving up just because it is difficult is not a scientific attitude.
 
>For example, there is no debate as to whether "CAT" or "cat" or "cAt"
>refers to the same thing (in English), or whether "KUCHING" or "kuching" or
>"kUchiNG" refers to the same thing ("cat" in Malay).

There is also no debate as to whether "日產" = "日产" in Chinese refers to the same thing.
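
The case-folding comparison discussed here can be sketched in Python. The helper name same_label is hypothetical; str.casefold() is the standard library's caseless-matching method. This is only an illustration of case folding, not of how any DNS server compares names.

```python
def same_label(a: str, b: str) -> bool:
    """Compare two labels after case folding (hypothetical helper)."""
    return a.casefold() == b.casefold()


if __name__ == "__main__":
    print(same_label("CAT", "cAt"))
    print(same_label("KUCHING", "kUchiNG"))
    # Han characters have no case, so case folding leaves them unchanged
    # and these two labels still compare as different strings.
    print(same_label("日產", "日产"))
```

Case folding alone treats the cased Latin labels as equal but does not relate "日產" to "日产"; any TC-SC equivalence would need a separate rule on top of it.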
 
>However, Han character canonicalization does not follow the same rules. Han
>characters used in Chinese, Japanese and Korean have different equivalence
>rules. This is also pointed out in draft-ietf-idn-cjk-01.txt.
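
The point that equivalence depends on the language while the code points do not can be sketched as follows. The VARIANTS table, the language tags "zh" and "ja", and the helpers fold, equivalent and code_points are all hypothetical and cover only the pair of labels discussed in this thread.

```python
# Hypothetical per-language folding tables; only the pair from this thread.
VARIANTS = {
    "zh": {"產": "产"},  # assumption: Chinese folds this TC form to its SC form
    "ja": {},            # assumption: Japanese applies no such folding
}


def code_points(label):
    """List the code points of a label; they do not depend on any language."""
    return [f"U+{ord(ch):04X}" for ch in label]


def fold(label, lang):
    """Apply the folding table of one language, character by character."""
    table = VARIANTS.get(lang, {})
    return "".join(table.get(ch, ch) for ch in label)


def equivalent(a, b, lang):
    """Two labels are equivalent in a language if they fold to the same string."""
    return fold(a, lang) == fold(b, lang)


if __name__ == "__main__":
    print(code_points("日產"), code_points("日产"))
    print(equivalent("日產", "日产", "zh"))
    print(equivalent("日產", "日产", "ja"))
```

The function equivalent needs a lang argument that the label itself does not carry, which is the difficulty for a system that only sees code points.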

>For example, "日產" = "日产" in Chinese, but "日產" != "日产" in Japanese.
>In both cases, the exact same code points in Unicode are used. It has also
>been pointed out that Unicode does not recognize the difference
>between languages, only the difference between scripts. Hence, it will
>be difficult for any Unicode-based system to function on language-based
>equivalence rules.
"日產" = "日产" in Chinese and "日產" != "日产" in Japanese. If we stop talking about the Chinese character mapping problem and start talking about the Unicode character mapping problem, then the code point mapping "產" -- "产" is a 1-n problem, not a 1-1 mapping.

>I do agree with you that the problem that you have pointed out is indeed a
>real problem and there is a need for TC-SC equivalence for locating
>resources on the Internet.

>However, it is a problem that cannot be solved by a Unicode-based DNS
>solution within the parameters of the IDN WG and hence this may not be the
>right place to address these issues.

>maynard
Deng Xiang
2001.5.3