
Re: [idn] An experiment with UTF-8 domain names



At 18.42 +0900 01-01-06, Martin J. Duerst wrote:
>- Expecting that every single software component that
>   will deal with internationalized domain names will
>   do name preparation on every processing step is not
>   realistic (Patrick, I'm not saying that you have such
>   expectations, I just want to make sure others don't).

I agree completely with this.

My point in this whole discussion is exactly that: we _have_ to 
change software regardless of what we decide in order to get full 
IDN functionality. Because of this, we will live in a world where 
many people have not yet upgraded their software, so backward 
compatibility is really important.

Some of the examples I use involve my own name, which is

   Patrik H:son Fältström

If you look closely, you will see that "H:son" might be problematic 
to have as part of a domain name because of the colon, and that the 
'ä' can be written in two ways, which are equivalent according to the 
normalization forms defined by the Unicode Consortium. One of the 
ways will probably be used more in Sweden (where the 'ä' is a letter 
of its own) and the other outside of Sweden (where the 'ä' is an 
accented 'a').
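
To make the two spellings of 'ä' concrete, here is a minimal sketch in 
Python using the standard unicodedata module. The variable names are 
only for illustration, and NFC is picked as one possible normalization 
form, not as the form any IDN proposal requires.

```python
import unicodedata

# The same visible letter, written in two different ways.
precomposed = "\u00e4"   # single code point: LATIN SMALL LETTER A WITH DIAERESIS
decomposed = "a\u0308"   # 'a' followed by COMBINING DIAERESIS

# Without normalization the two strings do not compare equal ...
raw_equal = precomposed == decomposed

# ... and their UTF-8 byte sequences differ as well.
utf8_precomposed = precomposed.encode("utf-8")
utf8_decomposed = decomposed.encode("utf-8")

# After normalizing both to the same form (NFC here) they are identical.
nfc_equal = (unicodedata.normalize("NFC", precomposed)
             == unicodedata.normalize("NFC", decomposed))

print(raw_equal, nfc_equal)
print(utf8_precomposed, utf8_decomposed)
```

A byte-for-byte comparison of the UTF-8 forms treats these as two 
different names unless some component normalizes them first.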

Martin gives another good example: full-width and half-width 
characters.
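
A small sketch of the width problem, again in Python with the 
standard unicodedata module. The sample strings are made up for 
illustration, and the use of the compatibility form NFKC is an 
assumption for the example, not a statement about what name 
preparation will end up specifying.

```python
import unicodedata

# Build "example" out of full-width Latin letters (U+FF41..U+FF5A),
# which sit at a fixed offset of 0xFEE0 from their ASCII counterparts.
fullwidth = "".join(chr(ord(c) + 0xFEE0) for c in "example")

# Half-width katakana KA (U+FF76) versus the ordinary katakana KA (U+30AB).
halfwidth_ka = "\uff76"
ordinary_ka = "\u30ab"

# Canonical normalization (NFC) leaves the width variants untouched ...
nfc_folds_width = unicodedata.normalize("NFC", fullwidth) == "example"

# ... while compatibility normalization (NFKC) folds them together.
nfkc_latin = unicodedata.normalize("NFKC", fullwidth)
nfkc_ka = unicodedata.normalize("NFKC", halfwidth_ka)

print(nfc_folds_width, nfkc_latin, nfkc_ka == ordinary_ka)
```

So even the choice of normalization form decides whether two names 
that look alike to a user are treated as the same name.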

This is not easy, and claiming that "this works already if we choose 
UTF-8 encoded Unicode" is too naive.

What I said a few mails ago was that "which encoding of Unicode we 
use, UTF-8 or ACE, is the simple part of this puzzle -- and the big 
difference is that an ACE encoding guarantees that the encoded words 
work in the application protocols we have today". I still claim that 
is the case.
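
As a rough illustration of that difference, the sketch below encodes 
one label both ways in Python. Note the assumptions: Python's built-in 
"idna" codec implements the later IDNA scheme (RFC 3490, with 
Punycode), which stands in here for "an ACE" and is not one of the 
ACE proposals on the table in this discussion. The LDH pattern is a 
simplified check for letters, digits and hyphen only.

```python
import re

label = "f\u00e4ltstr\u00f6m"   # "fältström"

# Raw UTF-8: contains bytes with the high bit set.
utf8_form = label.encode("utf-8")

# ASCII-compatible encoding via Python's built-in "idna" codec
# (a stand-in for whichever ACE is eventually chosen).
ace_form = label.encode("idna")

# Simplified hostname rule: only letters, digits and hyphen.
LDH = re.compile(rb"[A-Za-z0-9-]+")

utf8_is_ldh = LDH.fullmatch(utf8_form) is not None
ace_is_ldh = LDH.fullmatch(ace_form) is not None

# The ACE form decodes back to the original label.
round_trip = ace_form.decode("idna")

print(utf8_form, utf8_is_ldh)
print(ace_form, ace_is_ldh, round_trip == label)
```

The ACE form passes through software that only accepts the 
traditional hostname characters, while the UTF-8 form does not. 
Neither form removes the need for normalization before encoding.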

    paf