Re: [idn] UNIX moving to UTF-8
Keith Moore said:
> > People are clearly moving to Unicode. Exactly which UTF they choose (8, 16,
> > 32) is not as important, since they all can be converted to each other very
> > efficiently and without loss.
>
> And since an ACE is just another encoding of Unicode, you can add ACE to
> that set.
This was malarkey the first time it was stated, and it is still malarkey.
UTF-8, UTF-16, and UTF-32 are official, standard encoding forms of the
Unicode character encoding.
ACE is a transformation encoding syntax (TES) -- *not* an encoding form.
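A minimal Python sketch of the "encoding form" point, for illustration only. The sample string and the helper name `round_trips` are made up for the example. It takes text through UTF-8, UTF-16, and UTF-32 and back, and shows that the code point sequence survives intact. That includes a supplementary-plane character and a NUL.

```python
# Illustrative sketch: UTF-8, UTF-16 and UTF-32 are lossless, mutually
# convertible representations of the same sequence of code points.
# SAMPLE and round_trips are hypothetical names chosen for this example.
SAMPLE = "A\u00e9\u4e2d\U0001F600\x00z"  # ASCII, Latin-1, CJK, astral, NUL


def round_trips(text: str) -> bool:
    """Convert text UTF-8 -> UTF-16 -> UTF-32 -> str and compare."""
    utf8 = text.encode("utf-8")
    utf16 = utf8.decode("utf-8").encode("utf-16-le")
    utf32 = utf16.decode("utf-16-le").encode("utf-32-le")
    return utf32.decode("utf-32-le") == text


if __name__ == "__main__":
    print(round_trips(SAMPLE))  # True
    for name in ("utf-8", "utf-16-le", "utf-32-le"):
        # Same 6 code points, different code unit sizes: 12, 14, 24 bytes
        print(name, len(SAMPLE.encode(name)))
```

Only the size of the code units differs between the three forms. Nothing about the text itself is constrained or altered.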
> The fact that ACE doesn't happen to use the 0x80 bit doesn't strike me
> as a particularly good reason to rule it out - especially when (for a
> carefully-designed ACE) the encoding will often be more efficient than
> either UTF-16 or UTF-8.
>
> If UTF-8 is optimized for ASCII-compatibility, the C language (so
> that for instance NUL-terminated strings still work), and to
> minimize the amount of state that must be maintained while decoding;
> ACE can be optimized for space-efficiency, compatibility with protocols
> that were designed for ASCII-only DNS names, and (with nameprep)
> ease of comparison. It's not as if one is right and the other wrong.
> They are different encodings to accommodate different sets of
> transition issues.
I'm not opposed to the ACE proposals for solving the short-term IDN
issue, but...
UTF-8 is a general-purpose encoding form for open interchange of
Unicode data in all kinds of contexts. It is also used as a general-purpose
processing form in many systems.
ACE is *not* general-purpose. It is a custom-tailored TES designed to
meet the constraints of a particular protocol, and its use is very
likely to be limited to that particular context.
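For a rough sense of how tailored an ACE is, the sketch below borrows Python's built-in `idna` codec as a stand-in. That codec implements IDNA 2003 (nameprep plus Punycode, RFC 3490/3492), an ACE standardized after this exchange, so it is used here for illustration only. The helper names `ace_label` and `ace_round_trips` are hypothetical. The codec emits ASCII only, works label by label, and case-folds non-ASCII labels through nameprep. It also rejects labels longer than 63 octets. As a result it does not round-trip arbitrary Unicode text the way UTF-8 does.

```python
# Illustrative sketch, assuming Python's "idna" codec (IDNA 2003:
# nameprep + Punycode) as a representative ACE. Helper names are
# hypothetical and exist only for this example.


def ace_label(label: str) -> bytes:
    """ASCII-compatible encoding of a DNS label via the idna codec."""
    return label.encode("idna")


def ace_round_trips(label: str) -> bool:
    """True if encoding then decoding gives back the identical string."""
    return ace_label(label).decode("idna") == label


if __name__ == "__main__":
    print(ace_label("B\u00fccher"))        # b'xn--bcher-kva'
    print(ace_round_trips("B\u00fccher"))  # False: nameprep lowercased it
    try:
        ace_label("x" * 64)  # exceeds the 63-octet DNS label limit
    except UnicodeError as exc:
        print("rejected:", exc)
```

The constraints visible here are DNS constraints: ASCII-only output, per-label length limits, and case-insensitive matching. They are not properties of Unicode text in general.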
Arguing for an ACE-based solution for the IDN problem by rhetorically
blurring the distinction does not seem to be a useful way forward to
me.
--Ken