Re: An argument against multiple character sets
- To: Paul Hoffman / IMC <phoffman@imc.org>, idn@ops.ietf.org
- Subject: Re: An argument against multiple character sets
- From: Harald Tveit Alvestrand <Harald@Alvestrand.no>
- Date: Sun, 23 Jan 2000 22:08:49 +0100
- Delivery-date: Sun, 23 Jan 2000 14:02:48 -0800
- Envelope-to: idn-data@psg.com
At 12:01 23.01.00 -0800, Paul Hoffman / IMC wrote:
>There has been some discussion on this list about whether or not we should
>allow domain names to be created in different character sets. I believe
>that there is a simple argument that shows that we can't.
>
>Let's say I want to register a domain name that is two letters: LATIN
>SMALL LETTER F followed by LATIN SMALL LETTER U WITH OGONEK. If I use ISO
>8859-4, that would be encoded as 0x66F9. So far so good. You see a billboard
>with my domain name on it, and you enter it into a browser. That browser
>uses a different character set, let's say Unicode. The browser sends to
>the resolver 0x00660173.
Note: This is the UTF-16 (or UCS-2) representation of Unicode.
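As a rough illustration of the mismatch, the short Python sketch below
encodes the two characters as they are named in the quoted example
(LATIN SMALL LETTER F is U+0066, LATIN SMALL LETTER U WITH OGONEK is
U+0173). The choice of Python and the extra UTF-8 line are assumptions
added for illustration, not something from the thread.

```python
# The two characters named in the quoted example.
name = "\u0066\u0173"  # LATIN SMALL LETTER F, LATIN SMALL LETTER U WITH OGONEK

latin4 = name.encode("iso8859_4")   # ISO 8859-4 (Latin-4), one octet per character
utf16 = name.encode("utf-16-be")    # UTF-16, big-endian, no byte order mark
utf8 = name.encode("utf-8")         # shown only for comparison (an addition)

print(latin4.hex(), utf16.hex(), utf8.hex())
# prints: 66f9 00660173 66c5b3

# Same text, different octets: a byte-for-byte comparison fails.
assert latin4 != utf16
```

A resolver that compares raw octets would therefore treat these as
different names unless it knows which encoding each one uses.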
.....
>In short, I don't see how a solution that allows more than one character
>set, or even more than one encoding, will work. If others have
>counter-examples, I'm open to hearing them.
Your argument indicates that adding character sets to a list after initial
implementation is impossible. It doesn't mean that the initial set needs to
be just one, although a server has to be able to compare strings between
all the initial character sets - which is clearly a bit simpler if there is
just one of them.
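One way to picture "comparing strings between all the initial character
sets" is to decode both sides to a common form before comparing. The
sketch below is only that, a sketch: it assumes every name arrives with a
charset label, and the list INITIAL_CHARSETS and the function same_name
are hypothetical names invented here, not part of any proposal.

```python
# Hypothetical fixed list of character sets agreed on at initial deployment.
INITIAL_CHARSETS = ("iso8859_4", "utf-16-be", "utf-8")


def same_name(a: bytes, a_charset: str, b: bytes, b_charset: str) -> bool:
    """Compare two labelled octet strings by decoding both to Unicode text."""
    for charset in (a_charset, b_charset):
        if charset not in INITIAL_CHARSETS:
            # A charset added later is unknown to already-deployed servers.
            raise ValueError(f"charset not in the initial set: {charset}")
    return a.decode(a_charset) == b.decode(b_charset)


# The name from the quoted example, in ISO 8859-4 and in UTF-16.
print(same_name(bytes([0x66, 0xF9]), "iso8859_4",
                bytes.fromhex("00660173"), "utf-16-be"))
# prints: True
```

With a single character set the decode step disappears, which is the
"bit simpler" case; a charset outside the fixed list is rejected because
deployed servers would have no way to decode it.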
However, I think the *requirement* you are trying to state is this: when a
domain name is represented as text on paper, a user who believes he has
suitable input devices for that text should be able to query on that
string and get back information about the domain that the text on paper
was intended to represent.
It's clear by now that we probably can't find a solution that accomplishes
this for all cases, and we probably can never solve it for the case where
the producer of the paper version intended to obfuscate (see the
argument about C-Omicron-M or C0M versus COM), but the closer we come, the
better off the users are likely to be.
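The obfuscation case can be made concrete with a small Python sketch. It
assumes the "Omicron" in question is GREEK CAPITAL LETTER OMICRON
(U+039F); the variable names are illustrative only.

```python
import unicodedata

latin = "COM"        # three Latin capital letters
mixed = "C\u039fM"   # middle character is GREEK CAPITAL LETTER OMICRON
zero = "C0M"         # middle character is DIGIT ZERO

# List the character names behind each look-alike string.
for s in (latin, mixed, zero):
    print([unicodedata.name(c) for c in s])

# The strings can look alike on paper but are distinct sequences.
assert latin != mixed and latin != zero and mixed != zero
```

No choice of a single character set removes this problem, since all
three strings are valid and different within Unicode itself.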
Harald A
--
Harald Tveit Alvestrand, EDB Maxware, Norway
Harald.Alvestrand@edb.maxware.no