An argument against multiple character sets
- To: idn@ops.ietf.org
- Subject: An argument against multiple character sets
- From: Paul Hoffman / IMC <phoffman@imc.org>
- Date: Sun, 23 Jan 2000 12:01:28 -0800
- Delivery-date: Sun, 23 Jan 2000 12:01:10 -0800
- Envelope-to: idn-data@psg.com
There has been some discussion on this list about whether or not we should
allow domain names to be created in different character sets. I believe
that there is a simple argument that shows that we can't.
Let's say I want to register a domain name that is two letters: LATIN SMALL
LETTER F followed by LATIN SMALL LETTER U WITH OGONEK. If I use ISO 8859-4,
that would be encoded as 0x66F9. So far so good. You see a billboard with my
domain name on it, and you enter it into a browser. That browser uses a
different character set, let's say Unicode (as 16-bit UCS-2). The browser
sends 0x00660173 to the resolver. The two byte strings don't match, so the
lookup fails even though both represent the same two characters.
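The mismatch is easy to reproduce with Python's standard codecs. This is a minimal sketch, not part of any proposal: the helper `to_hex` is hypothetical, UTF-16BE stands in for the 16-bit Unicode form, and LATIN SMALL LETTER F is 0x66 in both charsets.

```python
# The same two characters: LATIN SMALL LETTER F (U+0066) and
# LATIN SMALL LETTER U WITH OGONEK (U+0173).
NAME = "f\u0173"

def to_hex(data: bytes) -> str:
    """Render bytes as 0x-prefixed uppercase hex (hypothetical helper)."""
    return "0x" + data.hex().upper()

latin4 = NAME.encode("iso8859_4")  # the registrant's charset
utf16 = NAME.encode("utf-16-be")   # the browser's charset (no byte-order mark)

print(to_hex(latin4))   # 0x66F9
print(to_hex(utf16))    # 0x00660173
# A resolver that compares raw bytes sees two different names.
print(latin4 == utf16)  # False
```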
There are two problems here:
- The browser *can't* know every possible character set
- Even if it did, it wouldn't know which one to use
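The second problem can be illustrated by decoding the registrant's bytes under a few single-byte charsets. A sketch only; the function name `decode_all` and the choice of charsets are illustrative assumptions. Under ISO 8859-1 the byte 0xF9 is LATIN SMALL LETTER U WITH GRAVE, and under ISO 8859-4 it is U WITH OGONEK, so nothing in the bytes says which name was meant.

```python
def decode_all(raw: bytes, charsets: list[str]) -> dict[str, str]:
    """Decode the same bytes under each candidate charset (hypothetical helper)."""
    return {cs: raw.decode(cs) for cs in charsets}

RAW = bytes([0x66, 0xF9])  # the name as registered in ISO 8859-4
results = decode_all(RAW, ["iso8859_1", "iso8859_2", "iso8859_4"])
for cs, name in results.items():
    # Show each decoded character as a Unicode code point.
    print(cs, name, [f"U+{ord(c):04X}" for c in name])
```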
Adding a charset tag to the internationalized string in the domain name
doesn't help. There is no way for someone seeing a printed representation
of the internationalized string to know which character set was used; in
this case it could be 8859-4 or Unicode or possibly other character sets
that contain those characters.
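Going the other way, a sketch can list which charsets are able to encode the printed name at all; each one yields a different but equally valid byte string. The candidate list here is an arbitrary assumption, and `possible_encodings` is a hypothetical helper.

```python
CANDIDATES = ["iso8859_1", "iso8859_4", "iso8859_13", "cp1257", "utf-8", "utf-16-be"]

def possible_encodings(name: str, charsets: list[str]) -> dict[str, bytes]:
    """Return every candidate charset that can encode `name`, with its bytes."""
    found = {}
    for cs in charsets:
        try:
            found[cs] = name.encode(cs)
        except UnicodeEncodeError:
            pass  # this charset has no code for one of the characters
    return found

encodings = possible_encodings("f\u0173", CANDIDATES)
for cs, data in encodings.items():
    print(cs, data.hex().upper())
```

A reader looking at the billboard has no way to choose among the entries this prints.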
Even requiring all resolvers to do the conversion doesn't help unless we
list all the possible character sets and never change the list. This
introduces many problems:
- New character sets can't be added later without simultaneously updating
all the resolvers on the Internet to use the added character sets. Such
simultaneous updates are impossible.
- The main reason we are considering more than one character set now is
current politics and desires for favored character sets. We can safely
assume that politics and desires will continue to change and evolve.
- We are forcing resolvers to do much more processing than they do now.
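A sketch of the "resolvers do the conversion" approach shows where it breaks. Everything here is hypothetical: the `SUPPORTED` set stands in for the frozen list of character sets, `canonicalize` for the resolver's conversion step, and Unicode for the canonical form. Every lookup now pays for a decode, and a charset added after the resolver shipped cannot be resolved.

```python
# Frozen when this (hypothetical) resolver was deployed.
SUPPORTED = frozenset({"iso8859_4", "utf-16-be"})

class UnknownCharset(Exception):
    """Raised when a name is tagged with a charset the resolver doesn't know."""

def canonicalize(raw: bytes, charset_tag: str) -> str:
    """Convert a tagged name to Unicode before lookup, rejecting unlisted charsets."""
    if charset_tag not in SUPPORTED:
        raise UnknownCharset(charset_tag)
    return raw.decode(charset_tag)

# Both tagged forms of the name now compare equal...
print(canonicalize(b"\x66\xf9", "iso8859_4") == canonicalize(b"\x00\x66\x01\x73", "utf-16-be"))

# ...but a charset this resolver never heard of is rejected.
try:
    canonicalize("f\u0173".encode("cp1257"), "cp1257")
except UnknownCharset as err:
    print("cannot resolve, unsupported charset:", err)
```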
In short, I don't see how a solution that allows more than one character
set, or even more than one encoding, will work. If others have
counter-examples, I'm open to hearing them.
--Paul Hoffman, Director
--Internet Mail Consortium