Re: An argument against multiple character sets
- To: idn@ops.ietf.org
- Subject: Re: An argument against multiple character sets
- From: Paul Hoffman / IMC <phoffman@imc.org>
- Date: Sun, 23 Jan 2000 14:55:06 -0800
- Delivery-date: Sun, 23 Jan 2000 14:56:56 -0800
- Envelope-to: idn-data@psg.com
At 10:08 PM 1/23/00 +0100, Harald Tveit Alvestrand wrote:
>Note: This is the UTF-16 (or UCS-2) representation of Unicode.
UTF-16BE, to be exact. Kinda near and dear to my heart right now.
>Your argument indicates that adding character sets to a list after initial
>implementation is impossible.
That's one argument, yes, but not the only one.
> It doesn't mean that the initial set needs to be just one, although a
> server has to be able to compare strings between all the initial
> character sets - which is clearly a bit simpler if there is just one of them.
I don't think that is even enough. Without labelling the query from the
user to the resolver with the character set and encoding, how would the
resolver know whether a request with 0x46F9 was LATIN CAPITAL LETTER F
followed by LATIN SMALL LETTER U WITH OGONEK (8859-4) or LATIN CAPITAL
LETTER F followed by HEBREW LETTER SHIN (8859-8)?
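
To make the ambiguity concrete, here is a small sketch in Python. It assumes Python's built-in "iso8859_4" and "iso8859_8" codecs as stand-ins for the two character sets; the helper name describe is only for illustration. The same two octets decode to different characters depending on a label the resolver never received.

```python
import unicodedata

# The two octets from the example, with no charset label attached.
raw = bytes([0x46, 0xF9])

def describe(octets, charset):
    """Decode octets under the given charset and return the Unicode character names."""
    return [unicodedata.name(ch) for ch in octets.decode(charset)]

# Identical octets, two different readings.
as_8859_4 = describe(raw, "iso8859_4")
as_8859_8 = describe(raw, "iso8859_8")

print(as_8859_4)
print(as_8859_8)
```

The first character agrees in both readings because 0x46 falls in the ASCII range that both sets share; only the second octet is ambiguous.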
>However, I think the *requirement* you are trying to state is that when a
>domain name is represented as text on paper, the user who thinks he has
>access to suitable input devices for that text should be able to query on
>that string and have returned information about the domain that the text
>on paper was intended to represent.
In the absence of a single character set and encoding, yes. It also puts
much more load on the resolver, which now needs to be able to translate
from every encoding that might come from a user to every encoding that
might be used in the domain name.
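
As a rough illustration of that extra work, the sketch below assumes every query arrives with a charset label and that the resolver compares names by decoding both sides to Unicode first. The function same_name and the stored UTF-16BE form are hypothetical, not part of any proposal on this list, and the comparison ignores case folding and normalization.

```python
def same_name(query, query_charset, stored, stored_charset):
    """Compare two labelled octet strings by decoding both to Unicode text."""
    return query.decode(query_charset) == stored.decode(stored_charset)

# Hypothetical stored name: capital F followed by u with ogonek, held as UTF-16BE.
stored = "F\u0173".encode("utf-16-be")

# The same query octets, labelled two different ways by the sender.
query = bytes([0x46, 0xF9])
match_latin4 = same_name(query, "iso8859_4", stored, "utf-16-be")
match_hebrew = same_name(query, "iso8859_8", stored, "utf-16-be")

print(match_latin4, match_hebrew)
```

Even in this simplified form the resolver needs a decoder for every charset a user might send, plus one for every charset a name might be stored in.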
--Paul Hoffman, Director
--Internet Mail Consortium