Re: [idn] time to move



On Wed, 23 May 2001, Adam M. Costello wrote:

> Finally, note that UTF-8 has issues with encoding efficiency (the
> length of the encoded string).

Perhaps, but I don't think that is an issue for this working group.

We need fixed and unified codings for characters. That is among the
most important issues here, in any IETF work and in the computer
industry. We need something which, like ASCII, is the same everywhere
and doesn't depend on what kind of computer or what protocol you're
using.

The major reason for using Unicode/ISO 10646 is to know that 01101001
is always 'i', not only for 'i' but for all other characters as well.
I don't think some of the people here realize what kind of problems
the use of a myriad of encodings results in.
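
A minimal sketch of that point in Python. The sample byte 0xE5 and the
codecs compared are chosen only for illustration; they are assumptions,
not anything this WG has discussed:

```python
# The ASCII byte 01101001 reads as 'i' under every ASCII-compatible codec.
ascii_byte = bytes([0b01101001])
i_readings = {
    codec: ascii_byte.decode(codec)
    for codec in ("ascii", "latin-1", "cp437", "utf-8")
}

# A byte outside ASCII depends on which legacy code page the reader assumes.
high_byte = bytes([0xE5])
high_readings = {
    codec: high_byte.decode(codec)
    for codec in ("latin-1", "cp437")
}

# In UTF-8 the same character has one fixed byte sequence everywhere.
utf8_bytes = "å".encode("utf-8")

print(i_readings)
print(high_readings)
print(utf8_bytes)
```

The same byte yields two different letters under the two legacy code
pages, which is the kind of ambiguity a single unified coding removes.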

But sure, in the case of protocols the problem is a little different.
We can let our computers do the conversions automatically. In theory.
Experience, however, shows that it is too complex for most software to
handle properly. The result is frequent leakage of internal formats,
as e.g. in the case of quoted-printable.
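
A small Python sketch of what such leakage can look like, using the
standard quopri module. The sample name and the assumption that the
text is carried as ISO 8859-1 are illustrative only:

```python
import quopri

name = "Malmö"

# What travels on the wire: ISO 8859-1 bytes in quoted-printable form.
wire = quopri.encodestring(name.encode("latin-1"))

# Software that undoes the transfer encoding and knows the charset
# recovers the original name.
decoded = quopri.decodestring(wire).decode("latin-1")

# Software that skips that step shows the wire format to the user.
leaked = wire.decode("ascii")

print(decoded)
print(leaked)
```

The second line printed is the internal format, which is what the user
ends up seeing whenever a program along the way omits the conversion.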

That is why the IETF should specify a single encoding to be used in
all protocols. And, fortunately, the IETF already has. BCP 18 (RFC 2277)
clearly specifies that UTF-8 is to be used as the standard character
set in all protocols. It might be argued that UTF-8 was a bad choice,
but this WG is not the place to change it. And until it is changed,
UTF-8 should be at least the long-term goal for this WG.

/Magnus


And, btw:

I think "international characters" and "internationalized host names"
are very strange terms to use. What is more international than ASCII?
Wouldn't "local" and "localized" be better?