Re: [idn] An experiment with UTF-8 domain names

To: Patrik Fältström <paf@cisco.com>, idn@ops.ietf.org
Subject: Re: [idn] An experiment with UTF-8 domain names
From: "Martin J. Duerst" <duerst@w3.org>
Date: Sun, 07 Jan 2001 00:24:32 +0900
Delivery-date: Sat, 06 Jan 2001 07:58:44 -0800
Envelope-to: idn-data@psg.com

At 01/01/06 13:27 +0100, Patrik Fältström wrote:

>My point in all of this discussion is exactly that, that we _have_ to 
>change software regardless of what we decide, to get full IDN 
>functionality -- and because of this, we will live in a world where people 
>have not upgraded their software yet, so backward compatibility is really 
>important.
>
>Some examples I use involve my own name which is
>
>   Patrik H:son Fältström
>
>If you look closely, you will see that "H:son" might be problematic to 
>have as a domain part because of the colon, and the 'ä' can be written in 
>two ways, which are equal according to the normalization forms defined by 
>the Unicode Consortium. One of the ways will probably be used in Sweden 
>more (where the 'ä' is a special character) and another outside of Sweden 
>(where the 'ä' is an accented 'a').

No, the data will almost always contain the precombined version as it is
used in Sweden. This is independent of what people think about it or
how they may input it, where you can indeed see quite some difference.
So for this case, normalization is mainly needed to avoid surprises
in very rare and strange cases, not to bring two variants together.
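
The two spellings of 'ä' discussed above can be illustrated with a
minimal sketch, assuming Python's standard unicodedata module and NFC
as the normalization form (the thread does not fix a particular form;
the helper name same_after_nfc is made up for this illustration):

```python
import unicodedata

# Two ways to write the same letter:
precomposed = "\u00e4"   # U+00E4, 'ä' as a single precombined code point
decomposed = "a\u0308"   # 'a' followed by U+0308 COMBINING DIAERESIS

def same_after_nfc(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# Compared code point by code point, the two spellings differ ...
raw_equal = precomposed == decomposed

# ... but they compare equal once both are normalized.
normalized_equal = same_after_nfc(precomposed, decomposed)
```

Without normalization a lookup for one spelling would miss a name
stored in the other; with it, the rare decomposed input is folded onto
the precombined form that the data almost always contains.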

>Martin gives another good example which is the full-width and half-width 
>characters.

This is much more of a problem, because both exist side-by-side,
at least in Japan.
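
As a rough sketch of why width variants are harder: canonical
normalization (NFC) leaves them distinct, and only a compatibility
form such as NFKC folds them together. The choice of NFKC and the
helper name fold_width are assumptions for illustration, not something
this thread has settled on:

```python
import unicodedata

fullwidth_latin = "\uff21\uff22\uff23"  # full-width A, B, C
ascii_latin = "ABC"
halfwidth_kana = "\uff76"               # half-width katakana KA
fullwidth_kana = "\u30ab"               # ordinary (full-width) katakana KA

def fold_width(text: str) -> str:
    """Fold width variants using compatibility normalization (hypothetical helper)."""
    return unicodedata.normalize("NFKC", text)

# NFC does not touch width variants, so both forms stay distinct.
nfc_keeps_variants = unicodedata.normalize("NFC", fullwidth_latin) != ascii_latin

# NFKC maps both kinds of variant onto a single form.
latin_folded = fold_width(fullwidth_latin) == ascii_latin
kana_folded = fold_width(halfwidth_kana) == fullwidth_kana
```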

>This is not easy, and claiming that "this works already if we choose UTF-8 
>encoded Unicode" is too naive.
>
>What I said some mail ago was that "what the encoding of Unicode is, UTF-8 
>or ACE, is the simple part of this puzzle -- and the big difference is 
>that an ACE encoding guarantees that the encoded words work in the 
>application protocols we have today". I still claim that is the case.

Well, let's try a picture. ACE assures that all cars can pass
all tunnels by squeezing them together in a way that makes them look
like bikes. They pass, but they are not cars anymore, and you have
to be really lucky if somebody can help you to make a car again
out of a bike that came through the tunnel. And even if the tunnels
get wider, we will still squeeze all cars in the future.

Of course, that's only half of the story; it may be worth noting
that in principle, all tunnels nowadays are wide enough for cars,
indeed it's easier to build wide tunnels, but there are some
extremely old tunnels still remaining, and there are still
some engineers who, by accident or whatever, don't clean
up their tunnels.
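
To make the picture a little more concrete, here is a minimal sketch
comparing the two encodings of one label. It uses Python's built-in
"punycode" codec purely as a stand-in for an ACE; that particular
encoding is an assumption of this illustration and is not one of the
proposals discussed in this thread. The helper name is_seven_bit is
also made up:

```python
label = "f\u00e4ltstr\u00f6m"  # "fältström"

# The car: UTF-8 keeps the letters as they are, but uses bytes above 0x7F.
utf8_bytes = label.encode("utf-8")

# The bike: an ACE squeezes the same label into plain ASCII bytes.
ace_bytes = label.encode("punycode")

def is_seven_bit(data: bytes) -> bool:
    """True if every byte fits in 7 bits (hypothetical helper)."""
    return all(b < 0x80 for b in data)

utf8_is_ascii = is_seven_bit(utf8_bytes)
ace_is_ascii = is_seven_bit(ace_bytes)

# Turning the bike back into a car only works for software that knows the ACE.
restored = ace_bytes.decode("punycode")
```

Software that does not know the ACE will show the squeezed ASCII form
to the user as it is, which is the "bike" in the picture above.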

Regards,   Martin.
