
Re: [idn] Re: Chinese Domain Name Consortium (CDNC) Declaration




----- Original Message -----
From: "Dave Crocker" <dhc@dcrocker.net>
> Elisabeth,
>
> This entire topic, and all its proposals, have very much been taken into
> account by the IDN working group.  They have been taken into account at the
> cost of many months of delay, although this topic is actually outside the
> scope of the working group.
>
> The topic calls for an algorithm that equates portions of different
> scripts.  This goes beyond the model of equating upper/lower case WITHIN a
> script.

No. You may be pointing to my half-baked draft, "look-alike normalization +
multicase ACE equivalence encoding ACROSS cyrillic/greek/latin scripts", not to
CDNC's TSCONV-02 draft, which attempts to add TC/SC 1:1 equivalence WITHIN the
unified Han script block by borrowing the framework outlined in my pre-draft.
That may have led you to mix up the two. TSCONV-02 has been succeeded by
TSCONV-03, which takes a brand-new validation-based TC/SC filtering approach.


>
> In fact, this topic is an open research question with no generally accepted
> practise.  So even if the topic were within scope the solution would, at
> best, be very, very risky.
>
> The risk is exacerbated by the fact that this technical approach does not
> scale well.  As soon as an approach like the TC/SC proposal is added, then
> we must find mappings for many, many other multi-script equivalences.  That
> effort will probably take years.

True. There is a huge set of "look-similar" equivalences in Unicode!  But, fortunately,
we have a much smaller set of "look-identical" equivalences. For example, each set
of equivalent Cyrillic/Greek/Cherokee/Latin characters is relatively small,
and the equivalent pairs are more easily found than "look-similar" ones.
If we restrict the problem space to "look-identical" equivalence,
we will reach the goal faster, and we can avoid the scalability problem
in the proposed multicase encoding.
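
To make the restricted "look-identical" idea concrete, here is a rough sketch in
Python. It is only an illustration: the table is a tiny hand-picked sample, and
the names LOOK_IDENTICAL, fold_look_identical and same_label are hypothetical,
not taken from any draft. It folds each character to one representative before
two labels are compared.

```python
# Hypothetical, deliberately tiny sample of "look-identical" characters.
# Each entry maps a Cyrillic or Greek letter to the Latin letter it is
# visually identical to in most fonts. A real table would be larger and
# would need review; this is an assumption for illustration only.
LOOK_IDENTICAL = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
}


def fold_look_identical(label: str) -> str:
    """Lowercase the label, then replace each listed character
    with its Latin representative; other characters pass through."""
    return "".join(LOOK_IDENTICAL.get(ch, ch) for ch in label.lower())


def same_label(a: str, b: str) -> bool:
    """Treat two labels as equivalent if they fold to the same string."""
    return fold_look_identical(a) == fold_look_identical(b)
```

With this sample table, same_label("cop", "\u0441\u043e\u0440") is true, because
the all-Cyrillic label folds to the Latin "cop". Because the table holds only
exact look-alikes, it stays small, which is the point of restricting the
problem space.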

As for "look-similar" characters, we can recommend new disambiguating font sets for
IDN representations. For LDH domains, we already have some font sets that
render '0' as a slashed zero so that it is easily distinguished from the letter 'o'.

Soobok Lee

>
> d/
>
>
> ----------
> Dave Crocker  <mailto:dcrocker@brandenburg.com>
> Brandenburg InternetWorking  <http://www.brandenburg.com>
> tel +1.408.246.8253;  fax +1.408.273.6464
>