host names and nameprep (was: Re: [idn] IRIs ought to use internationalized *host* names)
Hello Adam,
Sorry for the delay.
I'm splitting my answer into two. This one is on the
host name vs. domain name question.
At 03:30 02/03/27 +0000, Adam M. Costello wrote:
>James Seng/Personal <jseng@pobox.org.sg> wrote:
>
> > The discussion of the how URL is to be encoded and how Host: field are
> > to be handled is probably more relevant so lets get back to that.
Just to make sure that I have this right:
- Domain names are whatever can be used on the lookup side
of a DNS query. This includes all kinds of current and potential
uses besides the core use that people usually equate with
the DNS.
- Host names are the names of machines. They are a subset of
domain names, used in certain queries/records (e.g. A records).
>Okay. Eventually this message will arrive at the following proposal:
>
> Proposed repertoire for internationalized *host* labels: All
> characters in classes L (letter), M (mark), and N (number) are
> allowed, and U+002D (hyphen-minus) is also allowed. Everything else
> is forbidden.
This is a very good first shot. Some things have to be checked
carefully, e.g. whether some M (marks) have to be excluded, or
whether some signs corresponding to the hyphen-minus should be allowed.
Two examples I know of are the zero-width space, which could be
desirable for Farsi, and the (ideographic) middle dot; several
people in Japan have complained that it's not available
in XML names.
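To make the proposed repertoire concrete, here is a rough sketch of
the check in Python. It is only an illustration, not part of any
draft: the function names are made up, and the result depends on the
Unicode version of the `unicodedata` module in the Python being used.
I am also assuming that the middle dot in question is U+30FB.

```python
import unicodedata

HYPHEN_MINUS = "\u002d"

def is_host_code_point(ch: str) -> bool:
    """True if ch is U+002D or its general category is L*, M*, or N*."""
    return ch == HYPHEN_MINUS or unicodedata.category(ch)[0] in "LMN"

def disallowed_code_points(label: str) -> list[str]:
    """List the code points in label that fall outside the proposed repertoire."""
    return [f"U+{ord(ch):04X}" for ch in label if not is_host_code_point(ch)]

# Hypothetical sample labels: plain LDH, underscore, U+30FB, U+200B.
for label in ["example-1", "a_b", "\u30c6\u30fb\u30b9", "a\u200bb"]:
    print(repr(label), disallowed_code_points(label))
```

With this rule as written, both the zero-width space (U+200B) and
U+30FB are rejected, since neither is in class L, M, or N. Allowing
them would need an explicit exception list next to the hyphen-minus.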
>Which characters should be allowed in internationalized host labels?
>This is an interesting question in its own right, and it's possible that
>the IESG will demand an answer.
>Notice that there is no conflict with Nameprep, because Nameprep does
>not prohibit any characters in classes L, M, or N.
I guess that if there were a conflict, the host names would
just have to satisfy conditions on both sides.
>If we were to adopt this definition of internationalized host name, it
>would best be understood as an amendment of ToASCII step 3 (which checks
>host name restrictions if applicable), tightening substep 3a from:
>
> (a) Verify the absence of non-LDH ASCII code points; that is,
> the absence of 0..2C, 2E..2F, 3A..40, 5B..60, and 7B..7F.
>
>to:
>
> (a) Verify that the sequence contains only host code points;
> that is, U+002D (hyphen-minus) and code points classified
> as L (letter), M (mark), or N (number). See appendix ? for
> an enumeration of host code points.
>
>Or maybe the enumeration would go in Nameprep, or in a separate document
>that defines internationalized host names.
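For comparison, the current substep (a) quoted above only looks at
ASCII code points. A rough Python sketch of it, using the ranges
exactly as quoted (the function name is made up for illustration):

```python
# Inclusive non-LDH ASCII ranges as quoted from ToASCII substep 3(a).
NON_LDH_ASCII_RANGES = [
    (0x00, 0x2C),
    (0x2E, 0x2F),
    (0x3A, 0x40),
    (0x5B, 0x60),
    (0x7B, 0x7F),
]

def has_non_ldh_ascii(label: str) -> bool:
    """True if label contains an ASCII code point that is not a letter, digit, or hyphen."""
    return any(
        lo <= ord(ch) <= hi
        for ch in label
        for lo, hi in NON_LDH_ASCII_RANGES
    )

# Hypothetical sample labels; the last one contains a non-ASCII letter.
for label in ["example-1", "a_b", "a.b", "b\u00fccher"]:
    print(repr(label), has_non_ldh_ascii(label))
```

Note that this check says nothing about code points above 7F, so
non-ASCII symbols pass it untouched. That is the gap the tightened
substep would close.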
Looking back on my work on nameprep as a member of the design
team, I think the distinction between host names and domain names
wasn't clear, at least to me, and probably to several other
participants. At some point, I started to worry that allowing all
the symbols might not have been the best choice. Of course,
if it's for domain names, then that's a bit different.
Regards, Martin.