Re: [idn] call for comments for REORDERING
In various messages, lsb@postel.co.kr writes:
> The sizes of Latin alphabets or variants do not exceed 30.
Russian Cyrillic has 33. Many Central European implementations of the Latin
script, with carons and acute accents and such, use quite a few more than 30.
But this is nit-picking, and Soobok's point about the difference between
Latin and Hangul is well taken.
> If we should change the frequency table, new versioning
> prefix should be introduced to avoid conflicts in the same
> time when there is need to make major modifications to
> NFC/NFKC due to errors in them.
> If version 1 has "dq--" prefix, version 2 should have
> other ones like "xq--". Reordering tables could be
> improved at that time if needed.
I think Martin has already adequately explained the problematic nature of
creating different versions of the whole IDN scheme. Users, of course, will
not generally understand that there is a nameprep part and a reordering* part
and an ACE part and that only the reordering part has changed; they will
simply see that IDN "works" or "doesn't work," and of course they will be
right. (*Note: the ordinary English word "reordering" is spelled here in
normal lowercase letters.)
Requiring everyone to update browsers and server-side software to accommodate
IDN will be an understandable cost for many. Creating a new and
incompatible version of IDN to add reordering for Tagalog -- a really minor
optimization for a small alphabet, by Soobok's own admission -- will not.
> > The idea behind this is that if e.g. Tagalog gets added to
> > Unicode, and the IETF decides to add it to the allowed set
> > of characters for domain names, then the registries that
> > want to accept Tagalog have to update their software
> > immediately (no big deal for them), but deployed software
> > can use Tagalog without having to change nameprep/ACE
> > (unless they use characters which have to be normalized,
> > which may happen but will be rare). So existing clients
> > will already ACE the Tagalog codepoints without reordering,
>
> maybe problematic and unsafe.
> What if future NFC/NFKC maps them into other code points ?
> There will be a mess, too.
You don't have to worry about that because the ISO 10646 and Unicode
committees are firmly committed not to add any new compatibility characters
that would break normalization in this way. They can (and will) add Tagalog
and other scripts, as well as adding new characters for existing scripts, but
you will never see a precomposed Tagalog ligature that would cause NFC or
NFKC to have to be updated. To do so would break existing implementations
and cause the same kind of havoc that "Reordering v2.0" would cause.
-Doug Ewell
Fullerton, California