Re: host names and nameprep (was: Re: [idn] IRIs ought to use internationalized *host* names)
Martin Duerst <duerst@w3.org> wrote:
> - Domain names are whatever can be used on the lookup side of a DNS
> query. This includes all kinds of current and potential uses
> besides the core use that people are usually equating with the DNS.
Right.
> - Host names are the names of machines. They are a subset of domain
> names, used in certain queries/records (e.g. A record).
Basically, yes, although there is at least one instance where host names
are not names of machines: RFC 821 (SMTP) uses host names (not general
domain names) in mail addresses, even though the host names might refer
only to MX records, not A records.
So I would say a "host name" is any name conforming to a particular
syntax whose primary purpose is to name machines on the internet, but
which can also be used for naming other things.
> > Proposed repertoire for internationalized *host* labels: All
> > characters in classes L (letter), M (mark), and N (number) are
> > allowed, and U+002D (hyphen-minus) is also allowed. Everything
> > else is forbidden.
>
> This is a very good first shot. There are some things that have to
> be carefully checked, e.g. do some M (marks) have to be excluded, or
> should some signs corresponding to the hyphen-minus be allowed. Two
> examples I know would be the zero-width space which could be desirable
> for Farsi,
But that would mean allowing a white-space character in host names.
There are *lots* of contexts where white-space is a delimiter. Is there
any non-space character that could serve this purpose for Farsi?
> and the (ideographic) middle dot, for which several people in Japan
> have complained that it's not available in XML names.
While I have no doubt that the katakana middle dot would get used in
programming language identifiers in much the same way as underscore,
I suspect that in domain names it would be even less popular than
hyphen-minus. In Latin domain names, hyphen-minus is rarely used, and
Japanese readers are much more accustomed to the lack of word boundaries
than readers of Latin-based languages.
One of the nice things about host names is that they're often easy to
remember and guess, because they're so constrained. Adding more choices
for punctuation characters would erode that advantage.
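
For concreteness, the proposed repertoire quoted above (general categories L, M, and N, plus U+002D) could be checked roughly as follows. This is only a sketch in Python, not part of any draft: the function names are hypothetical, the result depends on the Unicode version shipped with the interpreter, and it deliberately ignores nameprep mapping, length limits, and any rule about leading or trailing hyphens.

```python
import unicodedata

# U+002D HYPHEN-MINUS is the only non-letter, non-mark, non-number
# character the proposed repertoire allows.
HYPHEN_MINUS = "\u002d"


def is_allowed_host_char(ch: str) -> bool:
    """Return True if ch is U+002D or its general category starts with L, M, or N."""
    # unicodedata.category() returns a two-letter code such as "Lu", "Mn", "Nd";
    # the first letter is the major class.
    return ch == HYPHEN_MINUS or unicodedata.category(ch)[0] in "LMN"


def is_allowed_host_label(label: str) -> bool:
    """Return True if label is non-empty and every character is in the repertoire."""
    return bool(label) and all(is_allowed_host_char(ch) for ch in label)
```

Under this check, labels made of letters, combining marks, digits, and hyphen-minus pass, while a label containing U+200B (zero width space), U+30FB (katakana middle dot), an ordinary space, or an underscore is rejected, since none of those characters falls in class L, M, or N.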
AMC