Re: [idn] opting out of SC/TC equivalence
Dear Co-chair James:
I am only pointing out the PRC simplified quick-written (草書) characters.
These characters are exactly the same as the traditional formal-written (楷書)
characters, but they have different written forms, so they are coded
differently in Unicode. They can be quickly recognized by the small difference
in the components (糸) and (頁). In Taiwan, when we write them quickly, we
produce the forms SC(统) and SC(频), but the Big5 character set, like Japanese
Kanji, uses the normally written traditional forms. So Japan, Korea, and
Taiwan do not use these quick-written SC forms.
If you use Windows Outlook mail and you trust its correctness, you
can verify this by changing the encoding to any of these: SJIS, Korean,
Big5, GB2312 (that is, GBK). If a character is not in the selected character
set, SC(统) and SC(频) will be displayed as blank marks. So people in Japan
and Korea cannot get a feeling for why some (not all) TC/SC equivalence
is needed.
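
The same check can be made without a mail client. The sketch below, in Python, tries to encode each character from this thread in several legacy codecs. The helper names `can_encode` and `coverage` are made up for illustration, and the result depends on the codec tables shipped with Python, so treat it as a rough check rather than an authoritative statement about each character set.

```python
# Rough check: which legacy CJK codecs can represent a given character?
# The codec names are Python's standard codec names.
CODECS = ["gb2312", "gbk", "big5", "shift_jis", "euc_kr"]


def can_encode(ch, codec):
    """Return True if `ch` can be encoded in `codec`, False otherwise."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def coverage(ch):
    """Map each codec name to whether it can represent `ch`."""
    return {codec: can_encode(ch, codec) for codec in CODECS}


if __name__ == "__main__":
    # The SC forms and their TC counterparts discussed in this thread.
    for ch in "统频統頻":
        print(ch, coverage(ch))
```

A character that a codec cannot represent is what a mail client typically shows as a blank or replacement mark after the encoding is switched.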
1. The main assumption is a well-normalized Unicode, but that does not
hold in the CJK area, because people use different code systems (GB, Big5,
SJIS, EUC-KR).
2. You cannot expect everyone to switch to Unicode overnight.
3. A global TLD such as ML.com is premature. ML.jp, ML.tw, and ML.cn
can each work in their own way, but ML.com, being based on Unicode, mixes
them together and produces confusion.
4. IE has helped VeriSign/NSI to resolve ML.com: the external
auto.search.msn.com server and language-tagged HTTP parameters can resolve
this complexity easily. My feeling is that the server auto.search.msn.com
will become a special super root name server in the future. This is already
happening.
We discuss many things in this WG because we all support DNS, but
what if the conflicts cannot be solved and more harm is done again and
again in the proposed IDN system?
I can understand the difficulties of the IDN WG, but the major
problems stem from some false assumptions, and many nonsensical
prohibitions have been imposed in advance. These will make DNS too outdated
to serve as a future name system.
L.M.Tseng
> > I think you can display these Chinese characters in your
> > system, so you can give
> > the explanation and tell me how to treat them?
> > TC(統) , SC(统)
> > TC(頻), SC(频)
>
> These examples are first-level simplifications of Chinese by radical.
> They are equivalent in most contexts of the Chinese language, so I think
> we can both agree on this. And yes, this is not handled in the current
> normalization or Nameprep.
>
> How can we solve this? Many ways, each one with its pros and cons. I
> will provide some suggestions but I am sure there are other ways:
>
> 1. Do it inside Normalization Form KC (Standard Track)
>
> Speak to the Unicode Consortium, convince them that these two ideographs
> are equivalent, and have the mapping put into NFKC. This would go directly
> into Nameprep as long as the Unicode Consortium agrees with it, since
> Nameprep just uses the code points from the Unicode Consortium.
>
> The people at the Unicode Consortium would probably ask whether these
> ideographs are equivalent in Kanji, Hanja, or old Vietnamese, so we
> need to prepare for that.
>
> Pros: It would be part of the Nameprep standard. And if NFKC accepts this,
> it would also solve the problem for other I18N efforts in future, not just
> IDN.
>
> Cons: We need to go through the review process in the Unicode Consortium.
>
> 2. Do this "optional folding" pre-Nameprep (Informational based)
>
> We would define these mappings within the IETF, but publish them as
> Informational, as an optional folding for Chinese systems only.
>
> Pros: We do this within the IETF, probably with assistance from other
> groups for review. It also opens Nameprep to localized foldings for
> other sets.
>
> Cons: It may be difficult to determine which optional folding rules
> should apply to a name. A Japanese (or Cyrillic) name could be entered
> using GBK, for example, and which rules do we apply then? And who has
> priority in deciding the folding mechanism, the registrant of the name or
> the user of the name? Is 中国.com a Chinese domain name "zhongguo.com"
> referring to China.com, or is it a Japanese domain name "chugoku.com"
> referring to a place in Japan?
>
> 3. Do this in the zonefile (Best Current Practice?)
>
> We would define these mappings in the zonefile for DNS, and hence
> regardless of how the user types the name in, they will end up with the
> same resource records.
>
> Pros: It is an operational issue for Chinese domain names. Registrants of
> names would control what is equivalent and what is not, and that may be
> defined as a policy on a per-zone basis.
>
> Cons: There would be multiple entries in the zonefiles, but this can be
> solved by a software implementation that generates these entries on loading.
>
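
Generating the zonefile entries on loading, as option (3) suggests, could look roughly like the Python sketch below. The variant table holds only the two TC/SC pairs named in this thread, the function names `variant_labels` and `zone_entries` are hypothetical, and the address is a documentation placeholder. A real zone would also need the ASCII-compatible encoding of each label, which is not shown here.

```python
from itertools import product

# Hypothetical per-zone variant table: only the two pairs from this thread.
# Each character maps to the tuple of forms the zone treats as equivalent.
VARIANTS = {
    "統": ("統", "统"),
    "统": ("統", "统"),
    "頻": ("頻", "频"),
    "频": ("頻", "频"),
}


def variant_labels(label):
    """Return every spelling of `label` under the variant table."""
    choices = [VARIANTS.get(ch, (ch,)) for ch in label]
    return ["".join(combo) for combo in product(*choices)]


def zone_entries(label, rdata):
    """Pair each variant spelling with the same record data."""
    return [(variant, rdata) for variant in variant_labels(label)]


if __name__ == "__main__":
    # 192.0.2.1 is a documentation address, used here as a placeholder.
    for name, rdata in zone_entries("統一", "192.0.2.1"):
        print(name, "IN A", rdata)
```

The number of generated entries multiplies with each character that has a variant, which is the "multiple entries" cost listed under Cons.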
> Therefore, there are many solutions to the TC/SC problem. Which path to
> take depends on the tsconv author's decision and the WG consensus. No
> solution is perfect; it is all engineering trade-offs.
>
> Speaking for myself, I would love to see this done in (1) because it
> would solve the problem for other protocols in future, not just domain
> names. But I am not sure how to address the Unicode Consortium's concerns.
> I am strongly against approach (2) because it would solve the problem by
> creating other problems. Implementation experience has shown that
> maintaining optional foldings, and 'guessing' which ones to apply, is a
> real headache. I believe (3) is a reasonable approach, although not a
> perfect solution either.
>
> > They are the same Chinese characters in pairs, but they are coded with
> > different Unicode code points.
> > Are they like the problem of "fi"?
> > And tell me why a A should be mapped to ASCII "a" or "A" ?
>
> Problems like "fi" and "a" vs "A" are handled in Nameprep not because
> the IETF decided so, but because the code points from the Unicode
> Consortium have these mappings/normalizations.
>
> The IETF is not in the business of defining code points because we are
> not script or language experts. We leave that to other groups who have
> more expertise, and we reference their work. Thus, this question is most
> appropriately put to the Unicode Consortium, not to this WG.
>
> > I don't expect this WG to solve all the equivalence of TC/SC. I just
> > want to know what is the guideline to reduce the confusing troubles in
> > nameprep ?
> > Why so small set of PRC simplified quick-written scripts are not case
> > folding problem ?
>
> God knows I agree with you. :-)
>
> But this is a question which this WG has no answer for, since it
> takes its code points from elsewhere.
>
> -James Seng
>
>
>
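
The distinction drawn in the quoted reply, that the "fi" ligature is folded by Unicode normalization while the TC/SC pairs are not, can be checked with Python's standard `unicodedata` module. This is only a sketch of the NFKC step: the helper names `nfkc` and `same_after_nfkc` are made up for illustration, and the case folding of "A" to "a" is a separate Nameprep mapping that is not shown here.

```python
import unicodedata


def nfkc(s):
    """Apply Unicode Normalization Form KC to `s`."""
    return unicodedata.normalize("NFKC", s)


def same_after_nfkc(a, b):
    """Return True if `a` and `b` become identical under NFKC."""
    return nfkc(a) == nfkc(b)


if __name__ == "__main__":
    # Compatibility characters: the fi ligature and a fullwidth A.
    print("fi ligature vs 'fi':", same_after_nfkc("\ufb01", "fi"))
    print("fullwidth A vs 'A':", same_after_nfkc("\uff21", "A"))
    # The TC/SC pairs from this thread have no such mapping in NFKC.
    print("統 vs 统:", same_after_nfkc("統", "统"))
    print("頻 vs 频:", same_after_nfkc("頻", "频"))
```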