Re: host names and nameprep (was: Re: [idn] IRIs ought to use internationalized *host* names)
Roozbeh Pournader <roozbeh@sharif.edu> wrote:
> I had a lot of debates with Persian experts here in Iran about ZWNJ in
> domain names.
Oh, I thought we were talking about zero-width space. Zero-width
non-joiner is not a space character (class Zs), but a formatting
character (class Cf). ASCII host names do not allow characters of the
broad class C, but the only such characters that exist in ASCII are
control characters (class Cc), so that rule sets no precedent for Cf.
It's therefore not obvious whether any, some, or all characters of
class Cf should be allowed in host labels.
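For anyone who wants to check the classes themselves, here is a minimal
sketch using Python's unicodedata module. It assumes the bundled Unicode
3.2.0 snapshot (unicodedata.ucd_3_2_0) is close enough to the version
under discussion; categories in other Unicode versions may differ. The
helper name general_category is only illustrative.

```python
import unicodedata

# Unicode 3.2.0 snapshot shipped with Python. This is an assumption of
# convenience: it is not the Unicode 3.1 data the nameprep draft used.
ucd32 = unicodedata.ucd_3_2_0


def general_category(cp):
    """Return the two-letter general category of code point cp."""
    return ucd32.category(chr(cp))


if __name__ == "__main__":
    # SPACE, BELL, ZWSP, ZWNJ, ZWJ, ZWNBSP
    for cp in (0x0020, 0x0007, 0x200B, 0x200C, 0x200D, 0xFEFF):
        print(f"U+{cp:04X} {general_category(cp)}")
```

The first letter of the result is the broad class (C, Z, ...), the
second the subclass (Cc, Cf, Zs, ...).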
Let's see what nameprep does with characters of class Cf...
prohibited:
070F SYRIAC ABBREVIATION MARK
180E MONGOLIAN VOWEL SEPARATOR
200E..200F [BIDI FORMATTING]
202A..202E [BIDI FORMATTING]
206A..206F [SWAPPING/SHAPING]
FFF9..FFFB [INTERLINEAR ANNOTATIONS]
1D173..1D17A [MUSICAL SYMBOLS]
E0001 LANGUAGE TAG
E0020..E007F [TAGGING CHARACTERS]
mapped out:
200C ZERO WIDTH NON-JOINER
200D ZERO WIDTH JOINER
FEFF ZERO WIDTH NO-BREAK SPACE
unassigned:
2060 WORD JOINER
2061 FUNCTION APPLICATION
2062 INVISIBLE TIMES
2063 INVISIBLE SEPARATOR
kept:
06DD ARABIC END OF AYAH
What makes 06DD special? It was class Me in Unicode 3.1, and became
class Cf in Unicode 3.2, but nameprep is based on Unicode 3.1.
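A list like the one above can be roughly reproduced with Python's
stringprep module. This is only a sketch under a loud assumption: the
module encodes the tables of the published RFC 3454 (built on Unicode
3.2), not the Unicode 3.1-based draft discussed here, so individual
code points may land in a different bucket than listed above. The
function name nameprep_fate and the bucket labels are mine, and the
choice of prohibited tables follows the published nameprep profile.

```python
import stringprep

# Prohibition tables used by the published nameprep profile (assumption:
# the draft under discussion may have used a different set).
PROHIBITED_TABLES = (
    stringprep.in_table_c12, stringprep.in_table_c22,
    stringprep.in_table_c3, stringprep.in_table_c4,
    stringprep.in_table_c5, stringprep.in_table_c6,
    stringprep.in_table_c7, stringprep.in_table_c8,
    stringprep.in_table_c9,
)


def nameprep_fate(cp):
    """Classify code point cp as unassigned, mapped out, prohibited or kept."""
    ch = chr(cp)
    if stringprep.in_table_a1(ch):      # unassigned in Unicode 3.2
        return "unassigned"
    if stringprep.in_table_b1(ch):      # mapped to nothing before checks
        return "mapped out"
    if any(check(ch) for check in PROHIBITED_TABLES):
        return "prohibited"
    return "kept"


if __name__ == "__main__":
    for cp in (0x06DD, 0x070F, 0x200C, 0x200D, 0x200E, 0x2060, 0xFEFF):
        print(f"{cp:04X} {nameprep_fate(cp)}")
```

Mapping is tested before prohibition because stringprep applies the
mapping step first, so a mapped-out character never reaches the
prohibited check.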
Anything prohibited by nameprep should also be prohibited in host
labels. The question is which, if any, of the Cf characters allowed
by nameprep (U+06DD, U+200C, U+200D, U+FEFF) should be allowed in host
labels.
AMC