Re: [idn] I-D ACTION:draft-ietf-idn-nameprep-07.txt
"Eric A. Hall" <ehall@ehsco.com> wrote:
> If you are going to be moving SOME of the prohibited characters from
> nameprep to the IDNA hostname processing, then you need to move ALL of
> them at that stage.
There are some code points that should be prohibited in all
internationalized textual domain names. The private use code points,
noncharacter code points, surrogate codes, and left-to-right mark are
some of the best examples of such code points. These prohibitions do
indeed belong in nameprep.
There are some code points that are prohibited in host names, but not in
all textual domain names. The underscore is the best example. These
prohibitions belong in ToASCII.
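To make the split concrete, here is a rough Python sketch of the two
layers. It is only an illustration under my own assumptions: the function
names are made up, the "all textual names" check covers just the examples
listed above (not the full nameprep tables), and the host-name check
covers just the underscore rather than the complete host name rule.

```python
import unicodedata

LEFT_TO_RIGHT_MARK = 0x200E


def prohibited_in_all_text_names(ch):
    """Nameprep-style check (hypothetical helper): only the examples above."""
    cp = ord(ch)
    category = unicodedata.category(ch)
    if category == "Co":  # private use code points
        return True
    if category == "Cs":  # surrogate codes
        return True
    # Noncharacters: U+FDD0..U+FDEF plus the last two code points of each plane.
    if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
        return True
    return cp == LEFT_TO_RIGHT_MARK


def prohibited_in_host_names(ch):
    """ToASCII-stage check (hypothetical helper): underscore only, as an example."""
    return ch == "_"


def check_label(label, is_host_name):
    """Return True if no character in the label is prohibited for this use."""
    for ch in label:
        if prohibited_in_all_text_names(ch):
            return False
        if is_host_name and prohibited_in_host_names(ch):
            return False
    return True
```

With this arrangement a label such as "_sip" passes as a general textual
domain label but fails as a host name label, while a label containing a
left-to-right mark fails in both cases.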
It's not always clear which side of the line particular code points fall
on. The least clear-cut are the ASCII prohibitions: 0..20 and 7F (hex),
that is, the control characters, space, and DEL. Feel free to offer some
arguments.
> At the very least, you should consolidate the prohibited characters
> into IDNA, as the prohibited characters which appear to be in nameprep
> are in fact valid for STD13 domain names.
In the broadest sense a STD13 domain name can contain arbitrary binary
data. Nameprep is not intended for domain names in this broadest sense.
It is intended for domain names composed of internationalized *text*.
It is appropriate for nameprep to prohibit things that are difficult to
interpret as text.
Therefore, the prohibition of ASCII control characters doesn't worry me
much. The prohibition of ASCII space worries me a little. Notice that
nameprep doesn't prohibit dots in domain labels, even though dots are
usually used to delimit labels, and are therefore tricky to put into
labels. In fact RFC 1035 shows how to get dots into labels for the
purpose of representing email addresses that contain dots in the local
part. If nameprep doesn't prohibit dots, why should it prohibit spaces,
which are also allowed in email address local parts?
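For reference, the RFC 1035 master file syntax uses a backslash to quote
a character ("\." puts a dot inside a label) and "\DDD" to give an octet
as three decimal digits. Here is a minimal Python sketch of splitting
such a name into labels. The function name split_labels is my own, and
the sketch ignores length limits and other master file details.

```python
DIGITS = "0123456789"


def split_labels(name):
    """Split a master-file style name into labels, honoring backslash escapes."""
    labels = []
    current = []
    i = 0
    while i < len(name):
        ch = name[i]
        if ch == "\\":
            if i + 1 >= len(name):
                raise ValueError("dangling backslash at end of name")
            nxt = name[i + 1]
            if nxt in DIGITS:
                # \DDD: exactly three decimal digits naming one octet.
                digits = name[i + 1:i + 4]
                if len(digits) != 3 or any(d not in DIGITS for d in digits):
                    raise ValueError("\\DDD needs three decimal digits")
                value = int(digits)
                if value > 255:
                    raise ValueError("\\DDD value out of octet range")
                current.append(chr(value))
                i += 4
            else:
                # \X: the next character loses any special meaning.
                current.append(nxt)
                i += 2
        elif ch == ".":
            # An unescaped dot ends the current label.
            labels.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    labels.append("".join(current))
    return labels
```

Given the RFC 1035 style mailbox name "Action\.domains.ISI.EDU", this
yields the labels "Action.domains", "ISI", and "EDU". The same escape
mechanism can carry a space: "a\032b.example" yields "a b" and "example".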
I support the prohibition of all other whitespace characters, because
it would be nasty to distinguish between different kinds of whitespace,
but I'm not so confident about the prohibition of ASCII space. That one
could stand some more scrutiny.
AMC