
Re: [idn] What's wrong with skwan-utf8?



A terminological quibble here:

>   I guess I still don't get why some people are so focused on UTF-8.  
> UTF-8 is an 8-bit encoding of the UCS.  ACE (whatever flavor) is a 7-bit
> encoding of the UCS. 

UTF-8, UTF-16, and UTF-32 are encoding forms of Unicode (or the UCS,
if you prefer). These have a privileged status in the standard(s), and
are implemented as processing forms of the encoded characters, as
well as interchange forms. People treat UTF-8 streams as streams of
the *characters* themselves, not as cryptographic puzzles to be teased
apart by the appropriate API before the characters can be identified.
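To make the "processing form" point concrete, here is a small Python sketch. The label "bücher" is just an illustrative example, and `char_starts` is a hypothetical helper, not any library API.

UTF-8 continuation bytes always have the bit pattern 10xxxxxx. Because of that, code can locate character boundaries and search for ASCII substrings directly in the byte stream, without decoding it first:

```python
def char_starts(data: bytes) -> list[int]:
    """Return the byte offsets at which each character begins in UTF-8 data.

    Hypothetical helper for illustration: continuation bytes match the bit
    pattern 10xxxxxx, and every other byte starts a new character.
    """
    return [i for i, b in enumerate(data) if b & 0xC0 != 0x80]


label = "bücher"            # illustrative label, not from any real protocol
data = label.encode("utf-8")

print(data)                 # b'b\xc3\xbccher'
print(char_starts(data))    # [0, 1, 3, 4, 5, 6]

# One start offset per character, so the stream is usable as-is.
print(len(char_starts(data)) == len(label))   # True

# ASCII bytes never occur inside a multibyte sequence, so a plain byte
# search for an ASCII substring lands on a character boundary.
print(data.find(b"cher"))   # 3
```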

ACE, on the other hand, is one of a large class of things that are
referred to as transfer encoding syntaxes in the Unicode Character Model.
It is an explicit reshuffling of the bits to meet the bit-pattern
constraints of one or more protocols that can't handle the encoding
forms per se. Nobody is going to use ACE (or LACE or RACE or *ACE) as
a processing form of the encoded characters, nor will they use ACE
as a generic interchange form for the encoded characters, in any
but the protocols concerned with IDN.
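By contrast, an ACE has to be explicitly decoded before any character can be identified. The sketch below uses Python's built-in `idna` and `punycode` codecs.

Note the assumption: Punycode (RFC 3492, the ACE later adopted for IDNA) stands in here for the RACE/LACE-style proposals under discussion, whose bit layouts differ. The `LDH` pattern is an illustrative approximation of the letter-digit-hyphen host-name repertoire, not a normative rule.

```python
import re

# Illustrative approximation of the letter/digit/hyphen ("LDH") byte
# repertoire that traditional host-name rules permit.
LDH = re.compile(rb"[A-Za-z0-9-]+")

label = "bücher"                   # illustrative label

utf8 = label.encode("utf-8")       # encoding form
ace = label.encode("idna")         # ACE via Python's IDNA 2003 codec (assumption: Punycode-based)

print(utf8, bool(LDH.fullmatch(utf8)))  # b'b\xc3\xbccher' False
print(ace, bool(LDH.fullmatch(ace)))    # b'xn--bcher-kva' True

# The ACE bytes only yield characters after an explicit decoding step.
payload = ace[len(b"xn--"):]            # strip the IDNA prefix
print(payload.decode("punycode"))       # bücher
print(ace.decode("idna"))               # bücher
```

The ASCII letters of the label survive in the ACE output. The "ü", however, is folded into the trailing "kva" together with its insertion position, so only the codec can recover it.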

That said, I am not advocating one or the other particularly as
an IDN solution. (I see that the ACE advocates have strong arguments
in their favor.) But you need to understand that UTF-8 and ACE are
not just morally equivalent "encodings" to understand why UTF-8
advocates would be so focussed on it.

--Ken