Re: [idn] Some comments
- To: idn@ops.ietf.org
- Subject: Re: [idn] Some comments
- From: "D. J. Bernstein" <djb@cr.yp.to>
- Date: 13 Jan 2001 23:57:01 -0000
- Delivery-date: Sat, 13 Jan 2001 15:58:21 -0800
- Envelope-to: idn-data@psg.com
- Mail-Followup-To: idn@ops.ietf.org
- User-Agent: Mutt/1.2.5i
Patrik writes:
> I don't want 8-bit-clean protocols and UTF-8. I want protocols which
> can handle UCS-2, UCS-4 or UTF-16.
UTF-8 is compatible with ASCII. UCS-4 is not.
Switching a protocol to UTF-8 preserves compatibility; ASCII data is
unaffected. Switching a protocol to UCS-4 destroys compatibility; ASCII
bytes turn into 4-byte sequences.
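A minimal Python sketch of the compatibility point, assuming Python's "utf-32-be" codec as a stand-in for UCS-4 (for any code point up to U+10FFFF it produces the same four-byte big-endian units, with no byte-order mark). The sample string is arbitrary.

```python
text = "Subject: hello"                    # pure-ASCII sample data

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")
ucs4_bytes = text.encode("utf-32-be")      # stand-in for UCS-4, no byte-order mark

print(utf8_bytes == ascii_bytes)           # True: ASCII bytes are unchanged
print(len(ascii_bytes), len(ucs4_bytes))   # 14 56: every byte becomes four
print(ucs4_bytes[:8])                      # b'\x00\x00\x00S\x00\x00\x00u'

# Non-ASCII characters in UTF-8 use only bytes 0x80-0xFF, so they can never be
# mistaken for ASCII characters such as ':' or CR/LF.
print("é".encode("utf-8"))                 # b'\xc3\xa9'
```

An old parser that looks for ASCII delimiters keeps working on the UTF-8 bytes; on the UCS-4 bytes it sees NUL bytes between every character.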
Do you seriously believe that the Internet is going to move to UCS-4
rather than UTF-8? Exactly what benefits is anyone supposed to see in
this? Occasionally I hear people claiming that a string of Unicode
numbers is convenient because the displayed width of the string is
proportional to the number of bytes; but they're wrong, thanks to
combining characters and double-width characters.
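A rough illustration of why a count of fixed-width code units does not track display width. The display_columns helper is a hypothetical simplification, not a complete wcwidth implementation: it counts combining marks as zero columns and East Asian Wide/Fullwidth characters as two, using only the standard unicodedata module.

```python
import unicodedata

def display_columns(s: str) -> int:
    """Hypothetical, simplified terminal-width estimate."""
    cols = 0
    for ch in s:
        if unicodedata.combining(ch):
            continue                      # combining marks overlay the previous character
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            cols += 2                     # wide/fullwidth characters take two columns
        else:
            cols += 1
    return cols

samples = {
    "precomposed": "caf\u00e9",           # 'café' using U+00E9
    "combining": "cafe\u0301",            # 'e' followed by U+0301 COMBINING ACUTE ACCENT
    "wide": "\u6f22\u5b57",               # two CJK ideographs
}
for name, s in samples.items():
    # name, code points, UCS-4 bytes, estimated columns
    print(name, len(s), len(s.encode("utf-32-be")), display_columns(s))
# precomposed 4 16 4
# combining 5 20 4
# wide 2 8 4
```

Under this simplified model all three strings occupy four columns, yet their UCS-4 encodings are 16, 20 and 8 bytes long.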
> What I don't like in your arguments is the focus on "applications"
> instead of looking at what is specified in the protocols.
What I don't like in your arguments is the focus on protocol specs as
religious objects, rather than as tools to help implementors provide
working software to system administrators and users.
> You talk about quoted-printable encoding and the charset parameter as if
> they were something that destroys the content the way shredding
> luggage would.
``The luggage isn't destroyed, sir. It's just shredded. As I said, you
can take some time and sew it back together. You haven't lost anything;
it's all here! By the way, would you like to buy a sewing machine?''
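A small sketch, using Python's standard quopri module, of what quoted-printable does to UTF-8 text. Nothing is lost, since decoding restores the original bytes, but the encoded form is unreadable until the recipient's software decodes it and then applies the right charset. The sample text is arbitrary.

```python
import quopri

original = "Grüße aus Köln".encode("utf-8")
encoded = quopri.encodestring(original)
print(encoded)                    # b'Gr=C3=BC=C3=9Fe aus K=C3=B6ln'

decoded = quopri.decodestring(encoded)
print(decoded == original)        # True: every byte is recoverable
print(decoded.decode("utf-8"))    # readable again, but only if the charset is known
```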
> But software doesn't handle 8-bit data correctly today, even though
> we in the IETF have been talking about it for a very long time.
The IETF should have required 8-bit-clean mail software in 1982. Allman
would have fixed his software (sendmail) eventually, certainly before version 6.57
in 1993. UTF-8 header fields should have been allowed in 1996. Everyone
would have been happily using UTF-8 mail in 2001.
Instead, the IETF mail standards _still_ allow MTAs to drop 8-bit bytes,
even in message bodies. See http://cr.yp.to/docs/8bit/06.txt.
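A hypothetical sketch of two ways software that is not 8-bit clean can damage UTF-8 data: clearing the top bit of each byte, or discarding high bytes outright. Both helper names are invented for illustration, and which behavior a particular MTA exhibits is an assumption, not something stated above. ASCII passes through either one unchanged.

```python
def strip_high_bit(data: bytes) -> bytes:
    """Hypothetical 7-bit transport: clear bit 7 of every byte."""
    return bytes(b & 0x7F for b in data)

def drop_high_bytes(data: bytes) -> bytes:
    """Hypothetical 7-bit transport: discard every byte >= 0x80."""
    return bytes(b for b in data if b < 0x80)

original = "café".encode("utf-8")          # b'caf\xc3\xa9'
print(strip_high_bit(original))            # b'cafC)'  (0xC3 -> 'C', 0xA9 -> ')')
print(drop_high_bytes(original))           # b'caf'
print(strip_high_bit(b"HELO") == b"HELO")  # True: ASCII survives
```

In both cases the damaged bytes are still valid ASCII, so nothing downstream can even detect that data was lost.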
---Dan