
Re: [idn] UNIX moving to UTF-8



Keith Moore said:

> > People are clearly moving to Unicode. Exactly which UTF they choose (8, 16,
> > 32) is not as important, since they all can be converted to each other very
> > efficiently and without loss.
> 
> And since an ACE is just another encoding of Unicode, you can add ACE to
> that set.

This was malarkey the first time it was stated, and it is still malarkey.

UTF-8, UTF-16, and UTF-32 are official, standard encoding forms of the
Unicode character encoding.

ACE is a transformation encoding syntax (TES) -- *not* an encoding form.

> The fact that ACE doesn't happen to use the 0x80 bit doesn't strike me
> as a particularly good reason to rule it out  - especially when (for a 
> carefully-designed ACE) the encoding will often be more efficient than 
> either UTF-16 or UTF-8.
> 
> If UTF-8 is optimized for ASCII-compatibility, the C language (so
> that for instance NUL-terminated strings still work), and to
> minimize the amount of state that must be maintained while decoding; 
> ACE can be optimized for space-efficiency, compatibility with protocols 
> that were designed for ASCII-only DNS names, and (with nameprep) 
> ease of comparison.  It's not as if one is right and the other wrong.
> They are different encodings to accommodate different sets of 
> transition issues.

I'm not opposed to the ACE proposals for solving the short-term IDN
issue, but...

UTF-8 is a general-purpose encoding form for open interchange of
Unicode data in all kinds of contexts. It is also used as a general-purpose
processing form in many systems.

ACE is *not* general-purpose. It is a custom-tailored TES designed to
meet the constraints of a particular protocol, and its use is very
likely to be limited to that particular context.
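
To make the distinction concrete, here is a minimal sketch in Python.
It assumes Punycode (RFC 3492, available as Python's built-in "punycode"
codec) as a stand-in for "an ACE"; the sample string and variable names
are purely illustrative, and this is not a claim about any particular
ACE proposal.

```python
# Illustrative sketch: Punycode is used here only as an example of an ACE.
text = "bücher"  # hypothetical sample label

# The Unicode encoding forms round-trip through one another without loss.
for form in ("utf-8", "utf-16-be", "utf-32-be"):
    assert text.encode(form).decode(form) == text

utf8_bytes = text.encode("utf-8")
ace_bytes = text.encode("punycode")

# The ACE is reversible too: it is a transformation of the same data.
ace_round_trips = ace_bytes.decode("punycode") == text

# But its output is tailored to one context: it never sets the 0x80 bit,
# so it fits protocols built for ASCII-only names. UTF-8 makes no such
# promise for non-ASCII text.
ace_is_ascii = all(b < 0x80 for b in ace_bytes)
utf8_is_ascii = all(b < 0x80 for b in utf8_bytes)

print(ace_bytes, ace_is_ascii, utf8_is_ascii)
```

The ACE output here is a byte string shaped by the constraints of
ASCII-only names, which is the sense in which it is a transformation
layered on top of the character data rather than a general-purpose form.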

Arguing for an ACE-based solution for the IDN problem by rhetorically
blurring the distinction does not seem to me to be a useful way
forward.

--Ken