
Re: [idn] An experiment with UTF-8 domain names



Mark wrote:

>> So if the applications use the correct normalisation of UTF-8 today they
>> need not be changed.
>
>There are many ways to normalize data. NamePrep is designed to provide a
>single, absolutely determinate process for this IDN normalization. The
>chances that the average application would, by chance, duplicate the same
>normalization is minimal.
>

It depends on what you mean by normalisation.

When I talk about normalisation of text, I mean putting the character encoding
into a standard form without destroying any important information in the text.
Unicode Normalization Form C, and probably KC, are examples of this.
Because ISO 10646 allows several alternative ways to encode the same
character, you need to normalise text to one of them if you want to
handle it simply. This type of normalisation is needed to make
interoperability really work, because handling every possible way of
encoding a character in your code is a lot of work and makes your
code bigger.
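A minimal sketch of the first kind of normalisation, using Python's standard `unicodedata` module. The helper names `to_nfc` and `same_text` are hypothetical, chosen for illustration. The same word can reach an application as two different UTF-8 byte sequences. NFC maps both to one encoding, while letter case and every other visible distinction survive:

```python
import unicodedata

def to_nfc(text: str) -> str:
    """Compose to Unicode Normalization Form C; no visible information is dropped."""
    return unicodedata.normalize("NFC", text)

def same_text(a: str, b: str) -> bool:
    """Compare two strings after bringing both to the same encoding form."""
    return to_nfc(a) == to_nfc(b)

precomposed = "Caf\u00e9"   # 'é' as one code point, U+00E9
decomposed = "Cafe\u0301"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT

print(precomposed.encode("utf-8"))          # b'Caf\xc3\xa9'
print(decomposed.encode("utf-8"))           # b'Cafe\xcc\x81'
print(same_text(precomposed, decomposed))   # True: same text, different encodings
print(same_text(precomposed, "CAF\u00c9"))  # False: case is preserved
```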

NamePrep defines how to transform the text into a form that can be used
when comparing host names for equivalence using host name semantics.
It destroys important information in the text (letter case, for example).
This type of "normalisation" is only needed when you want to compare
two host names (and probably domain names).
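A rough sketch of the contrast, assuming CPython's `encodings.idna.nameprep` helper. That helper is an internal, undocumented implementation of the later RFC 3491 Nameprep, not necessarily the draft under discussion here. The names `text_key`, `host_key` and `same_host_label` are hypothetical. NFC keeps "Straße" and "STRASSE" distinct. The NamePrep key case-folds both and maps 'ß' to "ss", so the two become equal for host-name comparison:

```python
import unicodedata
from encodings.idna import nameprep  # CPython internal helper (RFC 3491 Nameprep)

def text_key(label: str) -> str:
    """Text normalisation: NFC keeps case and letters such as 'ß' intact."""
    return unicodedata.normalize("NFC", label)

def host_key(label: str) -> str:
    """Host-name key: case-folds, maps 'ß' to 'ss', applies NFKC (lossy)."""
    return nameprep(label)

def same_host_label(a: str, b: str) -> bool:
    """Compare two single DNS labels using host-name semantics."""
    return host_key(a) == host_key(b)

for label in ["Stra\u00dfe", "STRASSE"]:
    print(repr(text_key(label)), repr(host_key(label)))
# 'Straße' 'strasse'
# 'STRASSE' 'strasse'
```

The comparison key cannot be turned back into the original label, which is why it suits comparison but not storage or display.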

If both of the above are going to be called "normalisation", we need
some way to tell the two apart. Does anybody have a good idea for what
to call them?

   Dan