Re: [idn] What's wrong with skwan-utf8?
- To: idn@ops.ietf.org
- Subject: Re: [idn] What's wrong with skwan-utf8?
- From: "D. J. Bernstein" <djb@cr.yp.to>
- Date: Mon, 25 Dec 2000 16:19:53 -0800
- Cc: djb@cr.yp.to
- Delivery-date: Mon, 25 Dec 2000 16:22:19 -0800
- Envelope-to: idn-data@psg.com
[JS: If you wish to carry on a long discussion on this, I would appreciate it if
you could join the mailing list. This is the last time I will bounce your mail
back. Thanks]
Rick H Wesson writes:
> there is a lot of embedded systems out there that would crash-and-burn if
> they received a reply in utf8.
Can you please identify the systems, explain how they use domain names,
and say what exactly you mean by ``crash-and-burn''? We need this
information if we're going to accurately assess the cost of upgrading
the world to support IDNs.
Patrik Fältström writes:
> Many implementations of the above protocols happen to be able
> to handle UTF-8, while others can not.
Same question here.
I realize that sendmail removes bytes 128-159 on input. Fixing this
is a trivial matter of encoding 128->255 160, 129->255 161, ...,
159->255 191, and 255->255 255 in the collect() routine, then reversing the
encoding on output.
I also see a statement in the mailing-list archives that an obsolete
version of the Netscape mailer segfaults under Solaris when it reads
UTF-8 messages. Presumably this bug doesn't exist in Netscape 6.
> Also, there is a question whether UTF-8 is really what we should use.
Many systems use UTF-8 internally. It takes less work for them to read
and write UTF-8 than for them to handle text in other character sets.
Quite a few programs will Just Work(tm) if IDNs are defined as UTF-8,
while they'll have to be upgraded if IDNs are defined any other way.
It's easy to imagine a world where 8859-1, JIS, KOI-8, and so on have
all disappeared in favor of UTF-8. People are doing the UI work needed
to get there; see, e.g., http://www.cl.cam.ac.uk/~mgk25/unicode.html.
Don't you think it will be a bit embarrassing to look around in a UTF-8
world and see that the Internet is using UTF-7?
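For illustration, here is a small comparison of one label's wire form in UTF-8 and in UTF-7, using Python's built-in "utf-8" and "utf-7" codecs. The helper wire_forms and the sample labels are hypothetical. UTF-7 keeps every byte below 128 by wrapping non-ASCII runs in base64 between "+" and "-". UTF-8 keeps the text in the form that UTF-8-based software already reads and writes.

```python
def wire_forms(label: str) -> dict:
    """Hypothetical helper: encode one label both ways."""
    return {
        "utf-8": label.encode("utf-8"),
        "utf-7": label.encode("utf-7"),
    }


for label in ("bücher", "日本"):
    forms = wire_forms(label)
    # UTF-7 output is pure 7-bit ASCII; UTF-8 uses bytes >= 128 for non-ASCII.
    assert all(b < 128 for b in forms["utf-7"])
    # Both encodings round-trip to the original label.
    assert forms["utf-7"].decode("utf-7") == label
    assert forms["utf-8"].decode("utf-8") == label
    print(label, forms["utf-8"], forms["utf-7"])
```

For "bücher", the UTF-8 form is b'b\xc3\xbccher', while the UTF-7 form is b'b+APw-cher'. A UTF-8-native program can use the first form as is, but it needs an extra conversion step to produce or read the second.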
---Dan