Re: [idn] Zone rules (was: wg milestones update)
----- Original Message -----
Sent: Monday, April 30, 2001 7:54 PM
Subject: Re: [idn] Zone rules (was: wg milestones update)
Mr. Maynard:
Thanks for your frankness. On the road of science there are no differences
between us, only problems waiting to be solved.
>Hi Guonian,
>> For Chinese users, the TC-SC equivalence rule is just like the
>> case-folding rule.
>> I'd like to show an example using the case-folding rule.
>>
>> Under zone .COM, if the manager defines upper-case characters as equal to
>> lower-case characters, users could access IBM's domain with ibm.com,
>> IBm.com, ... in any case.
...
>> So I think the TC-SC equivalence rule SHOULD be consistent anywhere.
>Since we're on this topic of TC-SC equivalence yet again, I'd like to point
>out two issues that I think we should all consider before we go further, if
>these have not been reiterated enough in previous postings made by more
>qualified individuals than myself in the past:
>1) The ruleset for TC-SC equivalence is not a 1-n or n-1 mapping of
>abstract characters, as pointed out by Harald Alvestrand. It can
>be hideously complex (see draft-ietf-idn-cjk-01.txt for details), with
>numerous lexical and contextual considerations. I am sure that you would
>know about the "头发" = "頭髮" but "发财" != "髮財" problem.
TC-SC equivalence is NOT a lexical and contextual problem; it is a code
mapping. The 1-n and n-1 mappings are the minority, and the 1-1 mappings are
the majority. We can use a different method to solve each of the two parts.
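To make the two parts concrete, here is a minimal Python sketch. The table
SC_TO_TC and the helpers split_by_arity and tc_variants are hypothetical names
made up for illustration, and the three entries are only a sample, not a real
mapping table. It separates the 1-1 entries from the 1-n entries and shows
that a pure code mapping yields more than one candidate for "发".

```python
from itertools import product

# Hypothetical, tiny SC -> TC candidate table (illustration only).
SC_TO_TC = {
    "头": ["頭"],        # 1-1
    "财": ["財"],        # 1-1
    "发": ["發", "髮"],  # 1-n: which TC form applies depends on the word
}

def split_by_arity(table):
    """Separate 1-1 entries from 1-n entries so each part can be handled differently."""
    one_to_one = {sc: tcs[0] for sc, tcs in table.items() if len(tcs) == 1}
    one_to_many = {sc: tcs for sc, tcs in table.items() if len(tcs) > 1}
    return one_to_one, one_to_many

def tc_variants(label, table):
    """List every TC spelling that a pure per-character mapping allows for an SC label."""
    choices = [table.get(ch, [ch]) for ch in label]
    return ["".join(combo) for combo in product(*choices)]

if __name__ == "__main__":
    simple, ambiguous = split_by_arity(SC_TO_TC)
    print(sorted(simple), sorted(ambiguous))
    print(tc_variants("头发", SC_TO_TC))
    print(tc_variants("发财", SC_TO_TC))
```

In this sketch the 1-1 part can be folded directly, while the 1-n part returns
several candidates per label, which is where the disagreement in this thread
lies.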
>2) Case-folding is a simple canonical process, and the folding rules are
>the same, I believe (someone please correct me if I'm wrong), for most
>scripts which are able to be represented in ASCII (i.e. English, Swahili,
>Hawaiian, Malay, etc).
Certainly, ASCII includes just 26 letters and is easy to deal with. But that
is the existing DNS, and now we are discussing IDN. If IDN decides not to
solve the multilingual problem,...
Sure, the TC-SC mapping is difficult. But giving up just because it is
difficult is not a scientific attitude.
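For comparison, the ASCII case-folding that makes ibm.com and IBm.com the same
name can be sketched in a few lines of Python. This is only an illustration of
the rule, not how any name server implements it; fold_ascii and same_name are
hypothetical helper names.

```python
def fold_ascii(name: str) -> str:
    """Lower-case only the letters A-Z and leave every other character untouched."""
    return "".join(
        chr(ord(ch) + 32) if "A" <= ch <= "Z" else ch
        for ch in name
    )

def same_name(a: str, b: str) -> bool:
    """Two names are treated as equal when their ASCII-folded forms match."""
    return fold_ascii(a) == fold_ascii(b)

if __name__ == "__main__":
    print(same_name("IBm.com", "ibm.com"))
    print(fold_ascii("CAT"))
```

The whole rule fits in one fixed 26-entry relation with no exceptions, which is
why it is the easy case.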
>For example, there is no debate as to whether "CAT" or "cat" or "cAt"
>refers to the same thing (in English), or whether "KUCHING" or "kuching" or
>"kUchiNG" refers to the same thing ("cat" in Malay).
There is also no debate that "日產" and "日产" refer to the same thing in
Chinese.
>However, Han character canonicalization does not follow the same rules. Han
>characters used in Chinese, Japanese and Korean have different equivalence
>rules. This is also pointed out in draft-ietf-idn-cjk-01.txt.
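The language dependence described here can be sketched as follows, using the
"日產" / "日产" pair discussed in this thread. The table FOLD_BY_LANGUAGE, its
"zh" and "ja" keys, and the function equivalent are assumptions made up for
this illustration; a domain label itself carries no such language tag.

```python
# Hypothetical per-language fold tables (illustration only).
FOLD_BY_LANGUAGE = {
    "zh": {"產": "产"},  # in Chinese the two forms are treated as the same
    "ja": {},            # in Japanese no such folding applies
}

def equivalent(a: str, b: str, language: str) -> bool:
    """Compare two labels after applying the fold table of the given language."""
    table = FOLD_BY_LANGUAGE[language]

    def fold(label: str) -> str:
        return "".join(table.get(ch, ch) for ch in label)

    return fold(a) == fold(b)

if __name__ == "__main__":
    print(equivalent("日產", "日产", "zh"))
    print(equivalent("日產", "日产", "ja"))
```

The same two strings compare differently only because of the extra language
argument, which is information the code points alone do not provide.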
>For example, "日產" = "日产" in Chinese, but "日產" != "日产" in Japanese.
>In both cases, the exact same code points in Unicode are used. It has also
>been pointed out that Unicode does not recognize the difference between
>languages, only the difference between scripts. Hence, it will be difficult
>for any Unicode-based system to function on language-based equivalence
>rules.
"日產" = "日产" in Chinese and "日產" != "日产" in Japanese. If we set aside the
Chinese character mapping problem and talk about the Unicode character
mapping problem instead, the code point pair "產" -- "产" is a 1-n problem,
not a 1-1 mapping.
>I do agree with you that the problem that you have pointed out is indeed a
>real problem and there is a need for TC-SC equivalence for locating
>resources on the Internet.
>However, it is a problem that cannot be solved by a Unicode-based DNS
>solution within the parameters of the IDN WG and hence this may not be the
>right place to address these issues.
>maynard
Deng Xiang
2001.5.3