Re: [idn] nameprep



Paul Hoffman / IMC <phoffman@imc.org> wrote:

> > Nameprep currently includes the following steps:
> >
> >     KC-normalize
> >     map
> >     KC-normalize
> >     prohibit
> 
> Er, no it doesn't. It only includes map->normalize->prohibit.

Oops, I was misremembering the part about the mapping step being
designed to capture the operation fold->normalize->fold->normalize.

Anyway, if nameprep is map->normalize->prohibit, that's equivalent to:

    map
    compatible-decompose
    canonical-compose
    prohibit
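
As a rough check of that equivalence, here is a minimal Python sketch. It
relies only on the standard unicodedata module; TOY_MAP, TOY_PROHIBITED,
toy_map and prohibit are hypothetical stand-ins I made up for illustration,
not the real nameprep tables. It runs the pipeline once with a single NFKC
pass and once with NFKD followed by NFC, which should give the same result.

```python
import unicodedata

# Hypothetical stand-ins for the real nameprep tables (assumptions only).
TOY_MAP = {"\u00ad": ""}              # soft hyphen is mapped to nothing
TOY_PROHIBITED = {" ", "\u00a0"}      # space and no-break space


def toy_map(s):
    # Table lookup, with lowercasing as a crude stand-in for case folding.
    return "".join(TOY_MAP.get(ch, ch.lower()) for ch in s)


def prohibit(s):
    # Reject the string if any prohibited character survived.
    bad = [ch for ch in s if ch in TOY_PROHIBITED]
    if bad:
        raise ValueError("prohibited characters: %r" % bad)
    return s


def prep_nfkc(s):
    # map -> normalize (KC) -> prohibit
    return prohibit(unicodedata.normalize("NFKC", toy_map(s)))


def prep_two_step(s):
    # map -> compatibility-decompose -> canonical-compose -> prohibit
    decomposed = unicodedata.normalize("NFKD", toy_map(s))
    return prohibit(unicodedata.normalize("NFC", decomposed))


samples = ["\uff25xample", "\ufb01le", "Cafe\u0301", "soft\u00adhyphen"]
for sample in samples:
    assert prep_nfkc(sample) == prep_two_step(sample)
```

The samples cover a fullwidth letter, a ligature, a combining accent, and a
character that is mapped to nothing; a string containing a space raises
ValueError in both variants.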

We can still ask whether it would be convenient to swap the first two
steps.  It would make the mapping table smaller.
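
To illustrate why the table could shrink, here is a small Python sketch
using the standard unicodedata module. The two tables and the function
names are hypothetical, and the prohibit step is left out for brevity. It
shows one case only (a fullwidth capital E); it does not show that the
swap is safe in general.

```python
import unicodedata

# Hypothetical tables whose only job is to fold a capital E.
MAP_FIRST_TABLE = {"E": "e", "\uff25": "\uff45"}  # needs the fullwidth entry too
DECOMPOSE_FIRST_TABLE = {"E": "e"}                # the plain letter is enough


def apply_table(table, s):
    # Replace each character that has a table entry; keep the rest.
    return "".join(table.get(ch, ch) for ch in s)


def map_first(s, table):
    # map -> compatibility-decompose -> canonical-compose
    mapped = apply_table(table, s)
    return unicodedata.normalize("NFC", unicodedata.normalize("NFKD", mapped))


def decompose_first(s, table):
    # compatibility-decompose -> map -> canonical-compose
    decomposed = unicodedata.normalize("NFKD", s)
    return unicodedata.normalize("NFC", apply_table(table, decomposed))


fullwidth_e = "\uff25"
# Both orders fold the character, but decompose-first gets by with the
# smaller table, because the fullwidth form is gone before the lookup.
assert map_first(fullwidth_e, MAP_FIRST_TABLE) == "e"
assert decompose_first(fullwidth_e, DECOMPOSE_FIRST_TABLE) == "e"
# With the smaller table, map-first leaves the letter unfolded.
assert map_first(fullwidth_e, DECOMPOSE_FIRST_TABLE) == "E"
```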

Also, I continue to be nervous about inviting all the weird characters
into domain names right away.  Maybe applications should be liberal
in what they allow in domain names, but DNS servers should be more
conservative, at least in the beginning.  People might want to look at
ISO/IEC TR 10176:1998, or equivalently, Annex D of the C99 spec (ISO/IEC
9899:1999), which lists the set of characters allowed in C identifiers.
Another set to look at is the name characters of XML, which are defined
in terms of the Unicode character classes:

http://www.w3.org/TR/2000/REC-xml-20001006#sec-common-syn
http://www.w3.org/TR/2000/REC-xml-20001006#CharClasses
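
For a feel of what such a conservative filter might look like, here is a
rough Python sketch. It is an approximation, not the XML definition: the
XML spec lists explicit ranges, whereas this checks general categories
against whatever Unicode version Python's unicodedata module ships. The
function name, the category sets as used here, and the decision to allow
"-" inside a label are my own assumptions.

```python
import unicodedata

# Categories loosely modelled on the XML character classes (an assumption):
# letters may start a label; marks, modifiers and digits may follow.
START_CATEGORIES = {"Ll", "Lu", "Lo", "Lt", "Nl"}
OTHER_CATEGORIES = {"Mc", "Me", "Mn", "Lm", "Nd"}


def is_conservative_label(label):
    # Empty labels are never acceptable.
    if not label:
        return False
    # The first character must be a letter-like character.
    if unicodedata.category(label[0]) not in START_CATEGORIES:
        return False
    # Later characters may also be marks, modifiers, digits, or a hyphen.
    allowed = START_CATEGORIES | OTHER_CATEGORIES
    return all(
        ch == "-" or unicodedata.category(ch) in allowed
        for ch in label[1:]
    )
```

Under these assumptions "example" and "caf\u00e9" pass, while a label
containing a space or a symbol such as U+2603 is rejected. Note that a
leading digit is also rejected here, which is stricter than existing
hostname practice.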

AMC