[idn] 7CE
- To: idn@ops.ietf.org
- Subject: [idn] 7CE
- From: "D. J. Bernstein" <djb@cr.yp.to>
- Date: 4 Oct 2001 15:26:30 -0000
- Automatic-Legal-Notices: Copyright 2001, D. J. Bernstein. My transmission of this message to you does not constitute a copyright waiver or any other limitation of my rights, even if you have told me otherwise.
- Mail-Followup-To: idn@ops.ietf.org
ASCII is a standard _encoding_ of characters as bytes: in other words, a
function from byte strings to character strings.
ASCII specifies that, for example, the byte string 100 122 45 45 102 111
111 represents the character string dz--foo.
UTF-8 is compatible with ASCII. The UTF-8 function, restricted to ASCII
byte strings, is exactly the ASCII function. 100 122 45 45 102 111 111
means dz--foo under UTF-8, just as it does under ASCII.
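A minimal Python sketch of the claim above, using the byte values from this message. The variable names are illustrative only; the codec names "ascii" and "utf-8" are Python's standard ones.

```python
# The seven byte values quoted in the message.
data = bytes([100, 122, 45, 45, 102, 111, 111])

# Apply each encoding's bytes-to-characters function.
as_ascii = data.decode("ascii")
as_utf8 = data.decode("utf-8")

print(as_ascii)             # dz--foo
print(as_utf8 == as_ascii)  # True
```

Both decoders yield the same character string because every byte is below 128, which is where UTF-8 agrees with ASCII.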
In contrast, if an encoding maps 100 122 45 45 102 111 111 to something
other than dz--foo, that encoding is _not_ compatible with ASCII.
``ASCII'' does not mean ``7-bit.'' It is simply not correct to refer to
quoted-printable-style Unicode encodings as ``ASCII compatible.'' ASCII
is not merely a set of numbers; it assigns _characters_ to those numbers.
The correct phrase is ``7-bit compatible.'' The encoded strings are
compatible with 7-bit channels. That's the point of these encodings.
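The distinction can be sketched in Python with UTF-7, used here only as a convenient stand-in for a 7-bit Unicode encoding; it is an assumption of this example, not one of the encodings under discussion on the list. The encoded bytes all fit in 7 bits, yet the encoding maps them to a different character string than ASCII does.

```python
# Encode a non-ASCII character (U+00E9) with a 7-bit encoding.
# UTF-7 is a stand-in chosen for illustration.
data = "\u00e9".encode("utf-7")

# 7-bit compatible: every byte is below 128.
seven_bit = all(b < 128 for b in data)

# ASCII compatible would require both functions to agree on these bytes.
ascii_text = data.decode("ascii")
utf7_text = data.decode("utf-7")
ascii_compatible = (ascii_text == utf7_text)

print(seven_bit)         # True
print(ascii_compatible)  # False
```

The bytes pass through a 7-bit channel unharmed, but ASCII and the 7-bit encoding disagree about which characters they represent.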
---Dan