Re: [idn] opting out of SC/TC equivalence
> > By this principle, why can't a partial set of CJK characters, selected
> > by a local language tag, be used under different TLDs (cn, jp, tw, ...)?
> > The TLD implies the language tag and makes them distinguishable from
> > other TLDs.
>
> You have to be more precise.
We are talking about versioning, updating, and how to keep backward
compatibility.
And you gave a good example:
Let's assume that on day one the table includes the following characters:
{A,B,C,D,E,F,G,H,c}
We agree that the simple equivalence rules for 1-1 mapping maps A->B.
This means that the following characters are available for domain name
registration:
{B,C,D,E,F}
First, let us assume G and H are reserved because they are not frequently
used. The mapping A->B is not motivated by confusability: A and B have
different glyph shapes but the same meaning. The mapping c->C exists because
c and C are easy to confuse, not only in meaning but also in shape.
Comparing the two sets above, the 1-1 mapping means {A,c} are not allowed in
the registration table, and {G,H} are not in it either. In effect, {A,G,H,c}
are set aside for future extension, so you still have the chance to include A
in a future set. Once A.example.com is allowed into the registration table,
you can first keep it assigned to B.example.com for backward compatibility,
and then make "A" available to others.
But {c,C} is in the easy-to-confuse set, so {c} is permanently
mapped to {C}.
(BC.tw, Bc.cn) and (AD.tw, BD.cn) are all workable and unlikely to be
confused under {A,B}{C,c}. But (BC.com, Bc.com) and (AD.com, BD.com) will be
confused within the same domain.
If you must allow {A,G,H,c} in .com, we suggest dividing them
into three parts: handle {c,C} in nameprep, handle {A,B} in a
language-related keyword system, and keep {G,H} reserved. By keeping the
frequently used character set small, the problems can be fixed easily with
1-1 mapping.
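The policy sketched above can be written out as a small program. This is my
own illustration, not part of any draft: the sets {A..H,c}, the fold map, and
the function names are all toy examples standing in for a real registry table.

```python
# Toy model of the registration policy described above:
# a reserved set, a fixed 1-1 fold map for equivalent characters,
# and a check that a candidate label uses only registrable characters.

FOLD = {"A": "B", "c": "C"}   # A->B (same meaning), c->C (easy to confuse)
RESERVED = {"G", "H"}         # infrequently used, held back for extension
TABLE = set("ABCDEFGH") | {"c"}

# Characters open for registration: in the table, not reserved,
# and not the non-canonical side of a fold.
REGISTRABLE = {ch for ch in TABLE if ch not in RESERVED and ch not in FOLD}

def fold(label: str) -> str:
    """Apply the fixed 1-1 mapping to every character."""
    return "".join(FOLD.get(ch, ch) for ch in label)

def may_register(label: str) -> bool:
    """A label is registrable if its folded form uses only open characters."""
    return all(ch in REGISTRABLE for ch in fold(label))

print(sorted(REGISTRABLE))   # the {B,C,D,E,F} set from the example
print(may_register("AD"))    # A folds to B, so this is effectively "BD"
print(may_register("GD"))    # G is reserved, so this is rejected
```

Note that reserving {G,H} outside the fold map is what leaves room for the
step-by-step extension described above: adding them later changes REGISTRABLE
without invalidating any name already registered.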
>
> (A) In DNS, we can only use one character set. One only. We have picked
> Unicode.
>
> (B) In DNS, the matching algorithm has to be the same in every piece of
> software which uses the DNS. It can NOT differ between different languages
> used by the client. It can NOT differ between different domains. It can NOT
> differ between geographical regions. The matching algorithm we have is
> nameprep.
>
> (C) A TLD CAN have a policy which says that only a subset of the
> characters allowed according to (A) is allowed.
>
>
> You don't specify, when you say "TLD" and "language tag" above, whether you
> imply a restriction according to (C) (which is OK), or a requirement which
> is a violation of (B), or a violation of (A).
>
> Be more specific.
>
Since (C) is policy based, how do you restrict .COM from
producing more confusion? In a ccTLD like .cn, Japanese-language domain names
will not be registered under that domain,
so the confusion can be restricted to easy-to-confuse characters only;
there is no need to worry about the semantically related characters.
> The proposals I have seen maybe look nice from a functional standpoint, but
> they all have completely ignored the technical constraints the DNS system
> has. If some of you who oppose the use of one character set and nameprep
> came up with a solution which makes sense technically, we could talk.
>
Character mapping in nameprep would assign one standard
character per equivalence set, and the CJK area will not be happy with that.
But if you let them discover later that confusion from character mixing
does happen in gTLDs, they will be angrier still.
RealNames-style keywords are fine for language separation, but they also
need to keep names unique.
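To make the nameprep-mapping idea concrete, here is a minimal sketch. The
two SC->TC pairs are illustrative picks of mine, and real TC/SC equivalence
is not a simple 1-1 table in general; the point is only that preparation
folds every label to one standard form before lookup, so the equivalence is
invisible to DNS itself.

```python
# Sketch: a nameprep-style fixed mapping that picks one standard
# code point per equivalence set, applied before any DNS lookup.
# Illustrative pairs only; real TC/SC equivalence is far larger.

SC_TO_TC = {
    "\u56fd": "\u570b",  # 国 -> 國 ("country")
    "\u9f99": "\u9f8d",  # 龙 -> 龍 ("dragon")
}

def prepare(label: str) -> str:
    """Fold each character to the chosen standard form, the way
    nameprep folds case: equivalent labels become identical keys."""
    return "".join(SC_TO_TC.get(ch, ch) for ch in label)

# Both spellings of the same word compare equal after preparation,
# so every resolver sees one canonical name.
assert prepare("\u4e2d\u56fd") == prepare("\u4e2d\u570b")
```

The cost this mail is pointing at is exactly the choice baked into SC_TO_TC:
whichever side of each pair is made canonical, the other community sees its
preferred form demoted to an alias.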
> In this mail, you once again show that you completely have missed the point
> of the comments from the other members of this working group.
>
Sorry, I am not against your comments, and I do not insist on
my draft; if a better solution exists I will accept it.
I just do not like to hear that someone hates our use of an 8-bit-clean,
BIND-compatible DNS server that can do multibyte character lookup and TC/SC
equivalence comparison. Microsoft has shown all the possibilities; I think
there are only a few differences.
> You have to distinguish between comments on flaws in the technical solution
> proposed and flaws in the functional specification you present.
>
> > The problems come from these constraints:
> > 1. The big Unicode table lets domain names be registered with mixed
> > scripts, but they are only viewed in part now.
>
> Yes.
>
> > 2. Many duplicate, easy-to-confuse and
> > un-normalized/non-canonicalized characters in Unicode.
>
> Correct. And your problem is? There is a reason why not more normalization
> has been done in Unicode. UTC and ISO have been working for years on this
> problem. Why do you think that IETF is better at doing the work than ISO
> and UTC?
>
I do not expect that. We know the constraints of our language's
scripts, so we suggest this approach to avoid the big trouble with TC/SC and
hope it can be improved step by step. I also hope this approach will help
other languages.
> > 3. No one can change the encoding overnight; you can only
> > transition step by step.
>
> That is why we propose the ACE encoding.
>
> > 4. ML.com tries to use all code points without considering the
> > troubles that come from mixing.
>
> If you use all code points without mixing them, what is the problem? The
> problem comes when you try to mix characters from different scripts. The
> problem comes when you try to normalize between characters in different
> scripts. This is why UTC and ISO have not done more normalization. It is
> also explained in the Unicode Standard 3.0, if I remember correctly: yes,
> on pages 961 and 962 of the Unicode Standard 3.0.
>
> What you and Liana are doing is claiming that the work of the last 20
> years, the cooperation in ISO between all countries which use CJK
> characters, is wrong, and that IETF should do the job instead.
>
That is impossible for IETF. The TC/SC problems come from the
civil war between the PRC and Taiwan, and the mixing of TC and SC will happen
in Hong Kong and Macau.
Even now, TC/SC is not a comfortable topic in Taiwan.
> I think you should be VERY careful in your statements when you say that the
> work in ISO has flaws.
>
Actually, in the CJK area only library people like to use the
ISO 10646 code set; at least it contains all the scripts found in old books.
You can check this for yourself, and ask why everyone else in other areas
uses native code sets.
> This discussion has for me been a complete waste of time, and I will
> really think twice before I respond to another message in this thread.
>
> I will not respond except to _constructive_ proposals which do work given
> the technical constraints we have in DNS.
>
Thank you for your patience. I will ask Prof. Ho to adjust our
draft and try to merge it with yours, if possible.
L.M.Tseng