Re: [idn] call for comments for REORDERING



In various messages, lsb@postel.co.kr writes:

> The sizes of Latin alphabets or variants do not exceed 30.

Russian Cyrillic has 33.  Many Central European implementations of the Latin 
script, with carons and acute accents and such, use quite a few more than 30.  
But this is nit-picking, and Soobok's point about the difference between 
Latin and Hangul is well taken.
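
For anyone who wants to check the counts, here is a minimal Python sketch. 
The Czech letter list is my own illustrative choice of a Central European 
alphabet (it is not from Soobok's message), and it counts single letters 
only, leaving out the digraph "ch".

```python
import string

# Russian alphabet: the 32 letters U+0430..U+044F plus U+0451 (io).
russian = [chr(cp) for cp in range(0x0430, 0x0450)] + ["\u0451"]

# Czech, chosen here only as one example of a Central European Latin
# alphabet: the 26 basic letters plus 15 letters with diacritics.
czech_diacritics = "áčďéěíňóřšťúůýž"
czech = list(string.ascii_lowercase) + list(czech_diacritics)

russian_count = len(set(russian))
czech_count = len(set(czech))
print(russian_count, czech_count)
```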

>  If we should change the frequency table, new versioning
>  prefix should be introduced to avoid conflicts in the same
>  time when there is need to make major modifications to 
>  NFC/NFKC due to errors in them. 
>  If version 1 has "dq--" prefix, version 2 should have
>  other ones like "xq--". Reordering tables could be
>  improved at that time if needed.

I think Martin has already adequately explained the problematic nature of 
creating different versions of the whole IDN scheme.  Users, of course, will 
not generally understand that there is a nameprep part and a reordering* part 
and an ACE part and that only the reordering part has changed; they will 
simply see that IDN "works" or "doesn't work," and of course they will be 
right.  (*Note: the ordinary English word "reordering" is spelled here in 
normal lowercase letters.)

Requiring everyone to update browsers and server-side software to accommodate 
IDN will be an understandable cost for many.  Creating a new and 
incompatible version of IDN to add reordering for Tagalog -- a really minor 
optimization for a small alphabet, by Soobok's own admission -- will not.

>  > The idea behind this is that if e.g. Tagalog gets added to
>  > Unicode, and the IETF decides to add it to the allowed set
>  > of characters for domain names, then the registries that
>  > want to accept Tagalog have to update their software
>  > immediately (no big deal for them), but deployed software
>  > can use Tagalog without having to change nameprep/ACE
>  > (unless they use characters which have to be normalized,
>  > which may happen but will be rare). So existing clients
>  > will already ACE the Tagalog codepoints without reordering,
>
>  maybe problematic and unsafe. 
>  What if future NFC/NFKC maps them into other code points ?
>  There will be a mess, too.

You don't have to worry about that because the ISO 10646 and Unicode 
committees are firmly committed not to add any new compatibility characters 
that would break normalization in this way.  They can (and will) add Tagalog 
and other scripts, as well as adding new characters for existing scripts, but 
you will never see a precomposed Tagalog ligature that would cause NFC or 
NFKC to have to be updated.  To do so would break existing implementations 
and cause the same kind of havoc that "Reordering v2.0" would cause.
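
As a rough illustration of what "not breaking normalization" means in 
practice, the sketch below checks whether NFC or NFKC alters any single code 
point in a range.  It assumes the Tagalog block is U+1700..U+171F, as it 
appears in the published Unicode charts, and it depends on the Unicode data 
bundled with your Python.  The helper name changed_by_normalization is 
hypothetical, and U+FB01 (the Latin "fi" ligature) is included only as a 
contrasting example of an existing compatibility character.

```python
import unicodedata

# Assumption: the Tagalog block occupies U+1700..U+171F.
TAGALOG_BLOCK = range(0x1700, 0x1720)

def changed_by_normalization(code_points, forms=("NFC", "NFKC")):
    """Return (code point, form) pairs where normalizing the lone
    character yields something other than the character itself."""
    changed = []
    for cp in code_points:
        ch = chr(cp)
        for form in forms:
            if unicodedata.normalize(form, ch) != ch:
                changed.append((cp, form))
    return changed

# No Tagalog code point has a decomposition, so nothing is reported.
tagalog_changes = changed_by_normalization(TAGALOG_BLOCK)

# U+FB01 LATIN SMALL LIGATURE FI has a compatibility decomposition,
# so NFKC rewrites it (NFC leaves it alone).
ligature_changes = changed_by_normalization([0xFB01])

print(tagalog_changes, ligature_changes)
```

A client that ACE-encodes these code points today would therefore produce the 
same result after normalization as one written after the script is formally 
allowed, which is the property Martin's scenario relies on.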

-Doug Ewell
 Fullerton, California