
Re: [idn] opting out of SC/TC equivalence

To: "tsenglm@計網中心.中大.tw" <tsenglm@cc.ncu.edu.tw>, "Kenny Huang" <huangk@alum.sinica.edu>, <liana.ydisg@juno.com>
Subject: Re: [idn] opting out of SC/TC equivalence
From: "James Seng/Personal" <James@Seng.cc>
Date: Fri, 17 Aug 2001 17:05:34 +0800
Cc: <idn@ops.ietf.org>, <dhc@dcrocker.net>

Dear Prof Tseng,

> Hi James,
>     I think you can display these Chinese characters on your system,
> so could you explain them and tell me how they should be treated?
> TC(統), SC(统)
> TC(頻), SC(频)

These examples come from the first level of Chinese simplification,
which simplifies by radical. They are equivalent in most contexts of the
Chinese language, so I think we can both agree on this. And yes, this is
not handled by the current normalization or by nameprep.

How can we solve this? There are many ways, each with its pros and cons.
I will offer some suggestions, but I am sure there are other ways:

1. Do it inside Normalization Form KC (Standards Track)

Speak to the Unicode Consortium, convince them that these two ideographs
are equivalent, and have the mapping put into NFKC. It would then go
directly into Nameprep, as long as the Unicode Consortium agrees, since
Nameprep just uses the code points from the Unicode Consortium.

The people at the Unicode Consortium would probably ask whether these
ideographs are also equivalent in Kanji, Hanja, or older Vietnamese
writing, so we need to be prepared for that.

Pros: It would be part of the Nameprep standard. And if NFKC accepts
this, it would also solve the problem for other I18N efforts in the
future, not just IDN.

Cons: We need to go through the review process at the Unicode Consortium.

2. Do this as "optional folding" pre-Nameprep (Informational)

We would define these mappings within the IETF, but publish them as an
Informational document, as an optional folding for Chinese systems only.

Pros: We do this within the IETF, probably with assistance from other
groups for review. It also opens Nameprep up to localized foldings for
other sets.

Cons: It may be difficult to determine which optional folding rules
should apply to a name. A Japanese (or Cyrillic) name could be entered
using GBK, for example, so which rules do we apply? And who gets to
decide which folding mechanism is used: the registrant of the name or
the user of the name? Is 中国.com a Chinese domain name "zhongguo.com"
referring to China.com, or is it a Japanese domain name "chugoku.com"
referring to a place in Japan?
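
As a rough illustration of why the input encoding cannot choose the
folding rules, here is a small Python sketch using only the standard
codecs. The sample strings and the helper name `encodable` are
assumptions made for this example. It only checks that GBK can carry
kana and Cyrillic text, and that the same two ideographs are valid in
both a Chinese and a Japanese charset.

```python
# Sketch: the charset used for input does not identify the language,
# so it cannot select the folding rules. Sample strings are illustrative.

samples = {
    "Chinese": "\u4e2d\u56fd",                            # 中国
    "Japanese kana": "\u306b\u307b\u3093",                # にほん
    "Cyrillic": "\u041c\u043e\u0441\u043a\u0432\u0430",   # Москва
}

def encodable(text, charset):
    """Return True if text can be represented in the given charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False

# GBK carries kana and Cyrillic as well as Han ideographs.
gbk_ok = {name: encodable(text, "gbk") for name, text in samples.items()}

# The same two ideographs are also valid Japanese text in Shift_JIS.
label = samples["Chinese"]
in_both = encodable(label, "gbk") and encodable(label, "shift_jis")

for name, ok in gbk_ok.items():
    print(name, "fits in GBK:", ok)
print("same label fits in GBK and Shift_JIS:", in_both)
```

Since every check succeeds, neither the charset nor the characters
themselves say which language's folding should apply.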

3. Do this in the zonefile (Best Current Practice?)

We would define these mappings in the zonefile for DNS, so that
regardless of how the user types the name in, they end up with the same
resource records.

Pros: It is an operational issue for Chinese domain names. Registrants
of names would control what is equivalent and what is not, and that may
be defined as a policy on a per-zone basis.

Cons: There would be multiple entries in the zonefiles, but this can be
solved by software that generates these entries on loading.
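
A minimal sketch of what such generation on loading might look like, in
Python. Everything here is hypothetical: the `VARIANTS` table holds only
the two pairs quoted in this thread, the function names are invented,
the address is a documentation address, and labels are shown as raw
Unicode rather than in whatever encoded form the zone would really hold.

```python
from itertools import product

# Hypothetical per-zone equivalence table; only the two pairs quoted in
# this thread are listed. A real table would be set by zone policy.
VARIANTS = [
    ("\u7d71", "\u7edf"),  # 統 / 统
    ("\u983b", "\u9891"),  # 頻 / 频
]

# Map every character to the full group of characters it is equivalent to.
GROUPS = {ch: group for group in VARIANTS for ch in group}

def expand_label(label):
    """Return all TC/SC variants of a label, in a deterministic order."""
    choices = [GROUPS.get(ch, (ch,)) for ch in label]
    return sorted({"".join(combo) for combo in product(*choices)})

def zone_entries(label, rdata):
    """Generate one record line per variant, all pointing at the same data."""
    return [f"{variant} IN A {rdata}" for variant in expand_label(label)]

entries = zone_entries("\u7d71\u983b", "192.0.2.1")
for line in entries:
    print(line)
```

With a two-way table like this one, a label containing n characters that
have variants expands to 2^n entries, which is the cost noted under Cons.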

So there are many solutions to the TC/SC problem. Which path to take
depends on the tsconv author's decision and the WG consensus. No
solution is perfect; it is all engineering trade-offs.

Speaking for myself, I would love to see this done via (1), because it
would solve the problem for other protocols in the future, not just
domain names. But I am not sure how to address the Unicode Consortium's
concerns. I am strongly against approach (2) because it solves the
problem by creating other problems. Implementation experience has shown
that maintaining optional foldings, and 'guessing' which ones to apply,
is a real headache. I believe (3) is a reasonable approach, although not
a perfect solution either.

> They are the same Chinese characters in pairs, but they are coded with
> different Unicode code points.
> Are they like the problem of "fi"?
> And tell me why ａ Ａ should be mapped to ASCII "a" or "A"?

Problems like "fi" and "a" vs "A" are handled in Nameprep not because
the IETF decided so, but because the Unicode Consortium's data for these
code points already defines these mappings and normalizations.
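
The difference can be seen with Python's standard `unicodedata` module,
which applies the Unicode Consortium's normalization data. This is only
a sketch: the helper `nfkc` is named for this example, and it shows NFKC
alone, not the full Nameprep profile.

```python
import unicodedata

def nfkc(text):
    """Apply Unicode Normalization Form KC to text."""
    return unicodedata.normalize("NFKC", text)

# Compatibility characters that Unicode's own data maps to plain letters.
ligature_fi = "\ufb01"   # LATIN SMALL LIGATURE FI
fullwidth_a = "\uff41"   # FULLWIDTH LATIN SMALL LETTER A
fullwidth_A = "\uff21"   # FULLWIDTH LATIN CAPITAL LETTER A

# A TC/SC pair from this thread.
tc, sc = "\u7d71", "\u7edf"   # 統 / 统

print("ligature maps to 'fi':", nfkc(ligature_fi) == "fi")
print("fullwidth a maps to 'a':", nfkc(fullwidth_a) == "a")
# NFKC alone keeps the case; folding "A" to "a" is a separate mapping.
print("fullwidth A maps to 'A':", nfkc(fullwidth_A) == "A")
print("TC and SC normalize to the same string:", nfkc(tc) == nfkc(sc))
```

The first three comparisons hold because the mappings are in the Unicode
data; the last does not, because no such mapping exists for the TC/SC
pair.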

The IETF is not in the business of defining code points, because we are
not script or language experts. We leave that to other groups with more
expertise and reference their work. Thus, this question is most
appropriately asked of the Unicode Consortium, not this WG.

> I don't expect this WG to solve all of the TC/SC equivalences. I just
> want to know what the guideline is for reducing confusion in nameprep.
> Why is such a small set of PRC simplified quick-written scripts not a
> case folding problem?

God knows I agree with you. :-)

But this is a question this WG has no answer for, since it takes its
code points from elsewhere.

-James Seng
