Re: [idn] Re: Chinese Domain Name Consortium (CDNC) Declaration
- To: "Elisabeth Porteneuve" <Elisabeth.Porteneuve@cetp.ipsl.fr>, "Dave Crocker" <dhc@dcrocker.net>
- Subject: Re: [idn] Re: Chinese Domain Name Consortium (CDNC) Declaration
- From: "Soobok Lee" <lsb@postel.co.kr>
- Date: Thu, 7 Feb 2002 02:11:02 +0900
- Cc: <deng@cnnic.net.cn>, <erin@twnic.net.tw>, <mclaughlin@pobox.com>, <Elisabeth.Porteneuve@cetp.ipsl.fr>, <Marc.Blanchet@viagenie.qc.ca>, <ajm@icann.org>, <alanysho@hkdnr.net.hk>, <christine.tsang@hkdnr.net.hk>, <fred@cisco.com>, <harald@alvestrand.no>, <hlqian@cnnic.net.cn>, <hoho@iis.sinica.edu.tw>, <htk@eecs.harvard.edu>, <huangk@alum.sinica.edu>, <iab@isi.edu>, <idn@ops.ietf.org>, <iesg@ietf.org>, <jasonho@umac.mo>, <jet-member@nic.ad.jp>, <jseng@pobox.org.sg>, <klensin@jck.com>, <lee@whale.cnnic.net.cn>, <lynn@icann.org>, <mao@cnnic.net.cn>, <mkatoh@mkatoh.net>, <mouhamet@next.sn>, <narten@us.ibm.com>, <nordmark@eng.sun.com>, <paf@cisco.com>, <phoffman@imc.org>, <qhhu@public.bta.net.cn>, <sharil@cmc.gov.my>, <shkyong@kgsm.kaist.ac.kr>, <snw@twnic.net.tw>, <sstseng@twnic.net.tw>, <tsenglm@cc.ncu.edu.tw>, <vcerf@mci.net>, <whzhang@cnnnic.net.cn>, <wschen@twnic.net.tw>, <wuch@gate.sinica.edu.tw>, <yktham@umac.mo>
- References: <5.1.0.14.2.20020206075920.036ed5b0@127.0.0.1>
----- Original Message -----
From: "Dave Crocker" <dhc@dcrocker.net>
> Elisabeth,
>
> This entire topic, and all its proposals, have very much been taken into
> account by the IDN working group. They have been taken into account at the
> cost of many months of delay, although this topic is actually outside the
> scope of the working group.
>
> The topic calls for an algorithm that equates portions of different
> scripts. This goes beyond the model of equating upper/lower case WITHIN a
> script.
No. You may be referring to my half-baked draft, "look-alike normalization + multicase ACE equivalence encoding ACROSS
cyrillic/greek/latin script", rather than to CDNC's TSCONV-02 draft, which attempts to
add TC/SC 1:1 equivalence WITHIN the unified Han script block by borrowing the framework outlined in my pre-draft. That
may have led you to mix up the two. TSCONV-02 has been succeeded by TSCONV-03, which
takes a brand-new validation-based TC/SC filtering approach.
>
> In fact, this topic is an open research question with no generally accepted
> practise. So even if the topic were within scope the solution would, at
> best, be very, very risky.
>
> The risk is exacerbated by the fact that this technical approach does not
> scale well. As soon as an approach like the TC/SC proposal is added, then
> we must find mappings for many, many other multi-script equivalences. That
> effort will probably take years.
True. There is a huge set of "look-similar" equivalences in Unicode! But, fortunately,
we have a much smaller set of "look-identical" equivalences. For example, each
set of equivalent Cyrillic/Greek/Cherokee/Latin characters is relatively small,
and the equivalent pairs are more easily found than "look-similar" ones.
If we restrict the problem space to "look-identical" equivalence,
we will reach the ideal goal faster and can avoid the scalability problem
in the proposed multicase encoding.
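
As a rough illustration of restricting the problem to "look-identical"
equivalence, here is a minimal Python sketch. It is not the algorithm from any
draft mentioned in this thread: the table LOOK_IDENTICAL and the functions
fold_look_identical and same_label are hypothetical names, and the table holds
only a handful of Cyrillic/Greek letters chosen for illustration. A real table
would need careful review.

```python
# Hypothetical, deliberately tiny table of "look-identical" characters.
# Each Cyrillic or Greek letter maps to the Latin letter it is usually
# rendered identically to. This is an illustrative assumption, not a
# reviewed or complete equivalence table.
LOOK_IDENTICAL = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
}


def fold_look_identical(label: str) -> str:
    """Lowercase the label, then replace each listed character with its
    Latin counterpart; characters not in the table pass through unchanged."""
    return "".join(LOOK_IDENTICAL.get(ch, ch) for ch in label.lower())


def same_label(a: str, b: str) -> bool:
    """Treat two labels as equivalent when their folded forms match."""
    return fold_look_identical(a) == fold_look_identical(b)
```

With this table, same_label("\u0440\u0430\u0441\u0435", "pace") holds, because
all four Cyrillic letters fold to their Latin twins. Characters that are merely
similar are left untouched, which is what keeps the table small.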
As for "look-similar" characters, we can recommend new disambiguating font sets for
IDN representations. For LDH domains, we already have some font sets that
render '0' as a slashed zero so that it is easily distinguished from the letter 'o'.
Soobok Lee
>
> d/
>
>
> ----------
> Dave Crocker <mailto:dcrocker@brandenburg.com>
> Brandenburg InternetWorking <http://www.brandenburg.com>
> tel +1.408.246.8253; fax +1.408.273.6464
>