Re: An argument against multiple character sets

To: Paul Hoffman / IMC <phoffman@imc.org>, idn@ops.ietf.org
Subject: Re: An argument against multiple character sets
From: Harald Tveit Alvestrand <Harald@Alvestrand.no>
Date: Sun, 23 Jan 2000 22:08:49 +0100
Delivery-date: Sun, 23 Jan 2000 14:02:48 -0800
Envelope-to: idn-data@psg.com

At 12:01 23.01.00 -0800, Paul Hoffman / IMC wrote:
>There has been some discussion on this list about whether or not we should 
>allow domain names to be created in different character sets. I believe 
>that there is a simple argument that shows that we can't.
>
>Let's say I want to register a domain name that is two letters: LATIN 
>SMALL LETTER F followed by LATIN SMALL LETTER U WITH OGONEK. If I use ISO 
>8859-4, that would be encoded as 0x66F9. So far so good. You see a billboard 
>with my domain name on it, and you enter it into a browser. That browser 
>uses a different character set, let's say Unicode. The browser sends to 
>the resolver 0x00660173.

Note: This is the UTF-16 (or UCS-2) representation of Unicode.
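
As an illustration, here is a minimal Python sketch of the octets involved in the example above, using the standard `iso8859_4` and `utf-16-be` codecs. Big-endian byte order without a byte order mark is an assumption; the example does not state one. Note that LATIN SMALL LETTER F is 0x66 in both encodings (0x46 is LATIN CAPITAL LETTER F).

```python
# The two-letter name from the example: f followed by u with ogonek
name = "f\u0173"

latin4 = name.encode("iso8859_4")   # ISO 8859-4 (Latin-4)
utf16 = name.encode("utf-16-be")    # UTF-16, big-endian, no byte order mark (assumed)

print(latin4.hex())  # 66f9
print(utf16.hex())   # 00660173

# Same characters, different octets: a byte-wise comparison fails
print(latin4 == utf16)  # False
```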

.....

>In short, I don't see how a solution that allows more than one character 
>set, or even more than one encoding, will work. If others have 
>counter-examples, I'm open to hearing them.

Your argument indicates that adding character sets to a list after initial 
implementation is impossible. It doesn't mean that the initial set needs to 
be just one, although a server has to be able to compare strings between 
all the initial character sets - which is clearly a bit simpler if there is 
just one of them.
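
A rough sketch of what "compare strings between all the initial character sets" could look like: decode each name to Unicode text and compare the characters rather than the octets. The set `INITIAL_CHARSETS` and the helpers `to_common_form` and `same_name` are hypothetical names invented for this illustration, and the sketch assumes every query carries a label saying which charset it uses. It ignores case folding and normalization.

```python
# Hypothetical fixed set of charsets agreed on at initial deployment
INITIAL_CHARSETS = {"iso8859_4", "utf-16-be", "utf-8"}

def to_common_form(raw: bytes, charset: str) -> str:
    """Decode a name to Unicode text; reject charsets outside the initial set."""
    if charset not in INITIAL_CHARSETS:
        raise ValueError(f"unsupported charset: {charset}")
    return raw.decode(charset)

def same_name(a: bytes, a_charset: str, b: bytes, b_charset: str) -> bool:
    """Compare two names by their decoded characters, not their octets."""
    return to_common_form(a, a_charset) == to_common_form(b, b_charset)

registered = bytes.fromhex("66f9")      # name as registered in ISO 8859-4
queried = bytes.fromhex("00660173")     # same name as sent by a UTF-16 client
print(same_name(registered, "iso8859_4", queried, "utf-16-be"))  # True
```

A charset added later would be rejected by servers that only know the original set, which is the part of the argument that still holds.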

However, I think the *requirement* you are trying to state is that when a 
domain name is represented as text on paper, the user who thinks he has 
access to suitable input devices for that text should be able to query on 
that string and have returned information about the domain that the text on 
paper was intended to represent.

It's clear by now that we probably can't find a solution that accomplishes 
this for all cases, and we probably can never solve it for the case where 
the producer of the paper version intends to obfuscate (see the argument 
about C-Omicron-M or C0M versus COM), but the closer we come, the better 
off the users are likely to be.
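
To make the obfuscation case concrete, a small Python sketch using the standard `unicodedata` module. The three strings look alike in many fonts but are different character sequences, so no encoding-level comparison can equate them. The variable names are illustrative only.

```python
import unicodedata

latin = "COM"
mixed = "C\u039fM"   # middle letter is GREEK CAPITAL LETTER OMICRON
digit = "C0M"        # middle character is DIGIT ZERO

# Print the character names behind each visually similar label
for label in (latin, mixed, digit):
    print([unicodedata.name(ch) for ch in label])

print(latin == mixed, latin == digit)  # False False
```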

                             Harald A

--
Harald Tveit Alvestrand, EDB Maxware, Norway
Harald.Alvestrand@edb.maxware.no
