
Re: Just send UTF-8 with nameprep (was: RE: [idn] Reality Check)



Recently Keith Moore wrote:

	ACE is an *encoding* just like UTF-8 is an *encoding*.

I thought about responding, but Ken wrote something last January in the
discussion of the skwan-utf8 draft that apparently needs reposting.

Eric

------- Forwarded Message

Date: Thu, 4 Jan 2001 11:14:25 -0800 (PST)
From: Kenneth Whistler <kenw@sybase.com>
Message-Id: <200101041914.LAA24879@birdie.sybase.com>
To: briansp@walid.com
Subject: Re: [idn] What's wrong with skwan-utf8?
Cc: idn@ops.ietf.org
X-Sun-Charset: US-ASCII
Sender: owner-idn@ops.ietf.org
Precedence: bulk

A terminological quibble here:

>   I guess I still don't get why some people are so focused on UTF-8.  
> UTF-8 is an 8-bit encoding of the UCS.  ACE (whatever flavor) is a 7-bit
> encoding of the UCS. 

UTF-8, UTF-16, and UTF-32 are encoding forms of Unicode (or the UCS,
if you prefer). These have a privileged status in the standard(s), and
are implemented as processing forms of the encoded characters, as
well as interchange forms. People treat UTF-8 streams as streams of
the *characters* themselves, not as cryptographic puzzles to be teased
apart by the appropriate API before the characters can be identified.

ACE, on the other hand, is one of a large class of things that are
referred to as transfer encoding syntaxes in the Unicode Character Model.
It is an explicit reshuffling of the bits to meet the bit-pattern
constraints of one or more protocols that can't handle the encoding
forms per se. Nobody is going to use ACE (or LACE or RACE or *ACE) as
a processing form of the encoded characters, nor will they use ACE
as a generic interchange form for the encoded characters, in any
but the protocols concerned with IDN.
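
The distinction can be illustrated with a minimal sketch. It assumes
Python 3 and uses the built-in "punycode" codec as a stand-in ACE.
Punycode is not one of the ACE flavors named above and is used here
only because a codec is readily available. The sample label is also
an assumption chosen for illustration.

```python
# Illustrative sketch only. Punycode stands in for "an ACE" here; it is
# an assumption for demonstration, not one of the proposals discussed above.
label = "bücher"  # hypothetical sample label

utf8_bytes = label.encode("utf-8")     # encoding form
ace_bytes = label.encode("punycode")   # transfer encoding syntax (stand-in)

# UTF-8: each character is a self-contained byte sequence, so the stream
# can be searched and sliced on character boundaries without decoding it all.
u_umlaut = "ü".encode("utf-8")
found_in_utf8 = u_umlaut in utf8_bytes
prefix = utf8_bytes[:3].decode("utf-8")   # first two characters

# ACE: the output is pure 7-bit ASCII, but the non-ASCII character no
# longer appears at any position; the whole label must be decoded first.
ace_is_7bit = all(b < 0x80 for b in ace_bytes)
found_in_ace = u_umlaut in ace_bytes
round_trip = ace_bytes.decode("punycode")

print(utf8_bytes)   # b'b\xc3\xbccher'
print(ace_bytes)    # b'bcher-kva'
```

In the UTF-8 bytes the "ü" is still locatable in place. In the ACE
bytes it has been folded into the trailing "kva", which means nothing
until the whole label is run back through the decoder.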

That said, I am not advocating one or the other particularly as
an IDN solution. (I see that the ACE advocates have strong arguments
in their favor.) But you need to understand that UTF-8 and ACE are
not just morally equivalent "encodings" to understand why UTF-8
advocates would be so focussed on it.

--Ken



------- End of Forwarded Message