Re: [idn] proposed i18n naming rules
The fundamental conflict here is that domain names are different from host
names, and different rules apply to each.
STD13 host names are case-neutral US-ASCII alpha, numbers and hyphen,
where hyphen is prohibited from appearing on either end of a label. STD13
domain names MAY contain case-neutral US-ASCII alpha, MAY contain
eight-bit values with no character assignments (and thus no case), and MAY
contain any seven-bit US-ASCII character codes outside of LDH in any
position in the label.
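
The difference between the two rule sets can be sketched in a few lines of
Python. This is only an illustration: the function names are made up for this
example, and the 63-octet label limit comes from RFC 1035 rather than from the
text above.

```python
import string

# Letters, digits, hyphen: the "LDH" repertoire for host name labels.
LDH = set(string.ascii_letters + string.digits + "-")


def is_hostname_label(label: bytes) -> bool:
    """True if the label follows the host name rule: LDH only, no hyphen at either end."""
    if not 1 <= len(label) <= 63:
        return False
    try:
        text = label.decode("ascii")
    except UnicodeDecodeError:
        return False  # eight-bit values are not allowed in host names
    if text.startswith("-") or text.endswith("-"):
        return False
    return all(ch in LDH for ch in text)


def is_domain_label(label: bytes) -> bool:
    """True if the label is a legal domain name label: any octets, 1 to 63 long."""
    return 1 <= len(label) <= 63


if __name__ == "__main__":
    for sample in (b"ehsco", b"_sip", b"-Eric A. Hall-", b"\xc9cole"):
        print(sample, is_hostname_label(sample), is_domain_label(sample))
```

Every label that passes the host name check also passes the domain name
check, but not the other way around, which is the source of the conflict.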
There are very, very few *formal* uses for STD13 domain names, but where
they are used they are very important. One example is SRV entries which
use "_" in the label. Another example is email addresses, where the
left-hand side ("local-part") of the address is stored in a label which is
encapsulated as RR data in SOA and RP RRs.
The LHS of an email address is a great example of the domain name rules. Under
RFC [2]822, the local-part of an email address can contain:
| atext         = ALPHA / DIGIT / ; Any character except controls,
|                 "!" / "#" /     ;  SP, and specials.
|                 "$" / "%" /     ;  Used for atoms
|                 "&" / "'" /
|                 "*" / "+" /
|                 "-" / "/" /
|                 "=" / "?" /
|                 "^" / "_" /
|                 "`" / "{" /
|                 "|" / "}" /
|                 "~"
|
| atom          = [CFWS] 1*atext [CFWS]
|
| dot-atom      = [CFWS] dot-atom-text [CFWS]
|
| dot-atom-text = 1*atext *("." 1*atext)
The local-part of the email address is also case-sensitive.
Cumulatively, the local-part of an email address can be something as
bizarre as "-Eric A. Hall-", and this would be stored in an STD13 domain
name as "-Eric A. Hall-.ehsco.com.", which would be deconstructed by
compliant systems as "-Eric A. Hall-@ehsco.com" (with the case preserved).
Obviously, such usage conflicts with the STD13 hostname rules, and would
also conflict with an IHN ruleset that remained compatible with the
STD13 hostname rules.
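
The mapping between a mailbox and the domain name stored in SOA or RP data
can be sketched as below. The helper names are hypothetical, and the sketch
works on a list of labels rather than on master-file text, so that the "."
inside the local-part needs no escaping.

```python
from typing import List


def mailbox_to_labels(address: str) -> List[bytes]:
    """Split local-part@domain into DNS labels; the local-part becomes the first label."""
    local, _, domain = address.rpartition("@")
    if not local or not domain:
        raise ValueError("expected local-part@domain")
    host_labels = [part.encode("ascii") for part in domain.rstrip(".").split(".")]
    # The local-part is kept as one label, byte for byte, with its case preserved.
    return [local.encode("ascii")] + host_labels


def labels_to_mailbox(labels: List[bytes]) -> str:
    """Rebuild the address: the first label is the local-part, the rest is the domain."""
    local, *host_labels = labels
    return local.decode("ascii") + "@" + b".".join(host_labels).decode("ascii")


if __name__ == "__main__":
    labels = mailbox_to_labels("-Eric A. Hall-@ehsco.com")
    print(labels)
    print(labels_to_mailbox(labels))
```

The first label here begins and ends with a hyphen and contains spaces and a
dot, none of which would survive a host name check.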
Another condition which has to be watched for is that STD13 allows
eight-bit values to be specified as exact octet sequences. This isn't
specifically a problem with the examples given to date, and there are no
protocol mechanisms which use the eight-bit values (to my knowledge), but
it is something to keep in mind.
Nameprep provides several functions which conflict with the usage of STD13
domain names (and thus also conflict with IDNs). The two of these which do
the most harm are US-ASCII character prohibitions (only allowing LDH), and
mandatory case-conversion.
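
A rough way to see the harm is with a toy stand-in for those two functions.
This is not the real nameprep profile; `toy_prep` is a hypothetical helper that
only lowercases the label and rejects ASCII characters outside LDH.

```python
import string

# After lowercasing, the only ASCII characters allowed are a-z, 0-9 and hyphen.
LDH_LOWER = set(string.ascii_lowercase + string.digits + "-")


def toy_prep(label: str) -> str:
    """Hypothetical stand-in: lowercase the label, then reject non-LDH ASCII."""
    folded = label.lower()
    for ch in folded:
        if ord(ch) < 0x80 and ch not in LDH_LOWER:
            raise ValueError(f"prohibited ASCII character {ch!r} in {label!r}")
    return folded


if __name__ == "__main__":
    # Case conversion silently changes a case-sensitive local-part.
    print(toy_prep("Eric"))
    # The LDH restriction rejects the "_" that SRV labels rely on.
    try:
        toy_prep("_sip")
    except ValueError as err:
        print("rejected:", err)
```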
There are several options here.
1) Expand the IHN rules to accommodate a larger ("reasonable") set of
characters, such as "_" for SRV and some common mail characters.
This breaks backwards compatibility, and is not really much of an
option. For example, ACE would not encode the label if it only
contained characters from the ASCII range 0x00-0x7F. A label such
as e_x_a_m_p_l_e.com. would be legal under the liberal IHI rules,
would be passed through unencoded to STD13 systems, and then it
would break at those systems.
Furthermore, mandatory case-conversion in the IHI nameprep would
still break email addresses.
2) Define a stringprep profile for IDNs explicitly.
One of the hard parts of this approach is deciding on case-
conversion. Life is easier if everything is lowercased but, as
shown, this breaks local-parts of email addresses.
Also, there is a potential future problem with normalization of
characters, in that some future version of RFC 2822 may allow for
non-normalized characters in local-parts. Requiring mandatory
normalization would prohibit this usage, or at the least, would
make such usage more complex.
Finally, there is a problem with the eight-bit codes which are
currently unspecified but legal in DNS. These codes map to the
Latin-1 subset, so if they are always mandatorily interpreted and
managed as such and are always normalized and case converted,
then the wrong data will be returned.
3) Redefine the allowable data in DNS.
If we globally say "no more domain names that are not hostnames"
we break SRV and all other entries which can use STD13 domain
names legitimately.
This approach also introduces a requirement for authoritative
servers, including replication masters and DDNS listeners. It
also affects legacy systems that follow the old rules in good
faith (such as DDNS clients), and where the sudden failure
cannot be adequately represented by an RCODE. In short, such a
mandate will require at least some updates to legacy systems.
4) Clarify the protocol usage rules. E.g., clarify SRV to be
[service].[transport].<IHI>. This is similar to #3 above, but
rather than defining global rules, clarify the handling rules
for each RR type independently.
This is feasible for the situations where this is a known problem
with protocol operations. The problem is that there are a bunch
of them. SOA is used umpteen different ways so it has to be
clarified umpteen different times (or at least considered umpteen
different ways before a single clarification is useful).
This is the most work but it generates the smallest wake,
assuming that the handling rules can be clarified so that they do
not break legacy systems. E.g., we specifically clarify that the
email addresses in SOA/RP are <IDN>.<IHI>, which doesn't break
anything, but it is a considerable effort to go through each and
every usage scenario.
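
To make option #4 a little more concrete, here is a minimal sketch of
per-usage checks. All function names are hypothetical, and the plain LDH
host name rule stands in for whatever the eventual <IHI> rule turns out to be.

```python
import string
from typing import List

LDH = set(string.ascii_letters + string.digits + "-")


def is_host_label(label: str) -> bool:
    """Stand-in for the host identifier rule: LDH, no hyphen at either end."""
    return (0 < len(label) <= 63
            and not label.startswith("-")
            and not label.endswith("-")
            and all(ch in LDH for ch in label))


def check_srv_owner(labels: List[str]) -> bool:
    """[service].[transport].<host>: two underscore labels, then host labels."""
    if len(labels) < 3:
        return False
    service, transport, *host = labels
    return (len(service) > 1 and service.startswith("_")
            and len(transport) > 1 and transport.startswith("_")
            and all(is_host_label(label) for label in host))


def check_mailbox(labels: List[str]) -> bool:
    """<local-part>.<host>: the first label is unrestricted, the rest are host labels."""
    if len(labels) < 2:
        return False
    local, *host = labels
    return 0 < len(local) <= 63 and all(is_host_label(label) for label in host)


if __name__ == "__main__":
    print(check_srv_owner(["_ldap", "_tcp", "ehsco", "com"]))
    print(check_mailbox(["-Eric A. Hall-", "ehsco", "com"]))
```

Each usage keeps its own liberal leading labels while the host portion is
held to a single rule, so nothing is imposed on domain names globally.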
I think #4 is the option we should pursue, or that DNSEXT should pursue,
as part of an effort to internationalize the DNS in general. DNS is the
same as every other service in that it has to be updated to fully utilize
internationalized domain names and host names, so we should not be
surprised that some work will be required for it to function properly.
--
Eric A. Hall http://www.ehsco.com/
Internet Core Protocols http://www.oreilly.com/catalog/coreprot/