
Re: An argument against multiple character sets



Hello Bill,

I just tried with Opera 3.61, but it didn't work. Did I miss some setting,
or what?

Regards,   Martin.

At 16:44 00/01/23 -0500, J. William Semich wrote:
> Hello;
> 
> I'm assuming you are using "character set" interchangeably with "encoding"
> below... 
> 
> At 12:01 PM 1/23/00 -0800, Paul Hoffman / IMC wrote:
> >There has been some discussion on this list about whether or not we should 
> >allow domain names to be created in different character sets. I believe 
> >that there is a simple argument that shows that we can't.
> >
> >Let's say I want to register a domain name that is two letters: LATIN SMALL 
> >LETTER F followed by LATIN SMALL LETTER U WITH OGONEK. If I use ISO 8859-4, 
> >that would be encoded as 0x66F9. So far so good. You see a billboard with my 
> >domain name on it, and you enter it into a browser. That browser uses a 
> >different character set, let's say Unicode. The browser sends to the 
> >resolver 0x00660173.
> >
> >There are two problems here:
> >- The browser *can't* know every possible character set
> >- Even if it did, it wouldn't know which one to use
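
A minimal sketch of the mismatch described above, assuming Python 3 and its standard codecs. The names `label`, `encodings`, and `wire_forms` are illustrative only and do not come from the thread; "Unicode" is taken here to mean 16-bit big-endian code units, with UTF-8 added for comparison.

```python
# The two-letter label from the example: U+0066 (LATIN SMALL LETTER F)
# followed by U+0173 (LATIN SMALL LETTER U WITH OGONEK).
label = "\u0066\u0173"

# Assumed set of encodings a client might pick; the real set is open-ended,
# which is the point of the argument.
encodings = {
    "ISO 8859-4": "iso8859_4",
    "UTF-16 (big-endian)": "utf_16_be",
    "UTF-8": "utf_8",
}

# Bytes each client would hand to the resolver for the same printed name.
wire_forms = {name: label.encode(codec) for name, codec in encodings.items()}

for name, raw in wire_forms.items():
    print(f"{name:20} 0x{raw.hex().upper()}")

# A resolver that compares raw bytes sees three unrelated names.
all_distinct = len(set(wire_forms.values())) == len(wire_forms)
```

The same two printed letters yield a different byte string under each encoding, and nothing in the bytes themselves says which encoding produced them.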
> 
> Exactly! <smile>
> 
> That's why Microsoft has adopted UTF-8 (UNICODE) as its "standard" default
> configuration, both in IE5 and in Windows 2000 DNS. And once Netscape
> adopts the same default "standard", both browsers will only (or, primarily)
> send UTF-8 queries to the resolver. Our customers tell us other browsers
> (such as Opera) can also resolve our UTF-8 test URLs.
> 
> How to best modify BIND in order for it to be able to deal with all this is
> probably much more important to this discussion than deciding which
> encoding should be set as the standard, IMO. UTF-8 looks pretty "standard"
> already, from the client/user point of view, at least. 
> 
> I'm not saying I think this "unofficial" working group should just bless
> UTF-8. I'm saying the more important work is in developing standards for
> upgrading BIND.
> 
> 
> -- Bill Semich
> .NU Domain
> 
> >
> >Adding a charset tag to the internationalized string in the domain name 
> >doesn't help. There is no way for someone seeing a printed representation 
> >of the internationalized string to know which character set was used; in 
> >this case it could be 8859-4 or Unicode or possibly other character sets 
> >that contain that character.
> >
> >Even requiring all resolvers to do the conversion doesn't help unless we 
> >list all the possible character sets and never change the list. This 
> >introduces many problems:
> >- New character sets can't be added later without simultaneously updating 
> >all the resolvers on the Internet to use the added character sets. Such 
> >simultaneous updates are impossible.
> >- The main reason we are considering more than one character set now is 
> >current politics and desires for favored character sets. We can safely 
> >assume that politics and desires will continue to change and evolve.
> >- We are forcing resolvers to do much more processing than they are now.
> >
> >In short, I don't see how a solution that allows more than one character 
> >set, or even more than one encoding, will work. If others have 
> >counter-examples, I'm open to hearing them.
> >
> >--Paul Hoffman, Director
> >--Internet Mail Consortium
> >
> >
> >
> Bill Semich
> President and Founder
> .NU Domain Ltd
> http://whats.nu
> bill@mail.nic.nu
> 
> 
> 


#-#-#  Martin J. Du"rst, World Wide Web Consortium
#-#-#  mailto:duerst@w3.org   http://www.w3.org