Re: [idn] proposed i18n naming rules



Regarding forbidden symbols in DNS, my opinion is that we
should look at the other side of the coin: which set of
symbols is best understood by the majority of DNS users,
and then ask whether that set is sufficient for IDN too.
Once we have settled on the rule, we can get the IDN game
going. This should not prevent others from using i18n
naming rules and creating other cultures.

Liana

On Thu, 22 Nov 2001 14:34:17 -0600 "Eric A. Hall" <ehall@ehsco.com>
writes:
> 
> The fundamental conflict here is that domain names are different from
> host names, with different rules having been applied.
> 
> STD13 host names are case-neutral US-ASCII alpha, numbers and hyphen,
> where hyphen is prohibited from appearing on either end of a label.
> STD13 domain names MAY contain case-neutral US-ASCII alpha, MAY contain
> eight-bit values with no character assignments (and thus no case), and
> MAY contain any seven-bit US-ASCII character codes outside of LDH in any
> position in the label.
> 
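To make the two rule sets above concrete, here is a minimal Python
sketch. The function names are hypothetical (they do not come from any
DNS library), and the 1 to 63 octet length limit is an assumption taken
from the RFC 1035 label encoding rather than from the text above.

```python
import string

# Letters, digits and hyphen: the "LDH" repertoire for host name labels.
LDH = set(string.ascii_letters + string.digits + "-")


def is_hostname_label(label: bytes) -> bool:
    """Host name rule: LDH only, and no hyphen at either end of the label."""
    if not 1 <= len(label) <= 63:
        return False
    if label.startswith(b"-") or label.endswith(b"-"):
        return False
    return all(chr(octet) in LDH for octet in label)


def is_domain_label(label: bytes) -> bool:
    """Domain name rule: any octet values are allowed; only length is checked."""
    return 1 <= len(label) <= 63


for sample in (b"example", b"_sip", b"-Eric A. Hall-", b"\xe9"):
    print(sample, is_hostname_label(sample), is_domain_label(sample))
```

Every sample passes the domain name check, but only the plain LDH label
passes the host name check, which is the gap the rest of the message is
about.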
> There are very, very few *formal* uses for STD13 domain names, but
> where they are used they are very important. One example is SRV
> entries, which use "_" in the label. Another example is email
> addresses, where the left-hand side ("local-part") of the address is
> stored in a label which is encapsulated as RR data in SOA and RP RRs.
> 
> The LHS of email addresses is a great example of the domain name
> rules. Under RFC [2]822, the local-part of an email address can
> contain:
> 
> | atext           =       ALPHA / DIGIT / ; Any character except controls,
> |                         "!" / "#" /     ;  SP, and specials.
> |                         "$" / "%" /     ;  Used for atoms
> |                         "&" / "'" /
> |                         "*" / "+" /
> |                         "-" / "/" /
> |                         "=" / "?" /
> |                         "^" / "_" /
> |                         "`" / "{" /
> |                         "|" / "}" /
> |                         "~"
> | 
> | atom            =       [CFWS] 1*atext [CFWS]
> | 
> | dot-atom        =       [CFWS] dot-atom-text [CFWS]
> | 
> | dot-atom-text   =       1*atext *("." 1*atext)
> 
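As a rough illustration of the grammar quoted above, dot-atom-text can
be approximated with a regular expression. This is only a sketch: it
ignores CFWS, quoted-strings and the obsolete forms, and the names
ATEXT and is_dot_atom_text are my own.

```python
import re

# One atext character: ALPHA / DIGIT plus the punctuation listed in the grammar.
ATEXT = r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~]"

# dot-atom-text = 1*atext *("." 1*atext)
DOT_ATOM_TEXT = re.compile(rf"{ATEXT}+(?:\.{ATEXT}+)*")


def is_dot_atom_text(candidate: str) -> bool:
    """Return True if the whole string matches dot-atom-text."""
    return DOT_ATOM_TEXT.fullmatch(candidate) is not None


for sample in ("eric.hall", "e_x+a=m", "Eric A. Hall", ".eric", "a..b"):
    print(repr(sample), is_dot_atom_text(sample))
```

Note that a string containing spaces does not match this rule on its
own; under RFC 2822 it would have to be written as a quoted-string.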
> The local-part of the email address is also case-sensitive.
> 
> Cumulatively, the local-part of an email address can be something as
> bizarre as "-Eric A. Hall-", and this would be stored in an STD13
> domain name as "-Eric A. Hall-.ehsco.com.", which would be
> deconstructed by compliant systems as "-Eric A. Hall-@ehsco.com" (with
> the case preserved).
> 
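The mapping described above can be sketched as follows. This assumes
the simple case only: the mailbox is split at the last "@", the whole
local-part becomes the first label, and no escaping or length checks
are applied. Both function names are hypothetical.

```python
from typing import List


def mailbox_to_labels(address: str) -> List[str]:
    """Split a mailbox into DNS labels: local-part first, then the domain labels."""
    local_part, _, domain = address.rpartition("@")
    return [local_part] + domain.rstrip(".").split(".")


def labels_to_mailbox(labels: List[str]) -> str:
    """Rebuild the mailbox: the first label is the local-part, the rest the domain."""
    return labels[0] + "@" + ".".join(labels[1:])


labels = mailbox_to_labels("-Eric A. Hall-@ehsco.com")
print(labels)
print(labels_to_mailbox(labels))

# Forcing the labels to lowercase changes the local-part, so the
# round trip no longer returns the original mailbox.
print(labels_to_mailbox([label.lower() for label in labels]))
```

The round trip only works if the first label is left untouched, which
is why mandatory case-conversion is a problem for this usage.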
> Obviously, such usage conflicts with the STD13 hostname rules, and
> would also conflict with an IHN ruleset that maintained compatibility
> with the STD13 hostname rules.
> 
> Another condition which has to be watched for is that STD13 allows
> eight-bit values to be specified as exact octet sequences. This isn't
> specifically a problem with the examples given to date, and there are
> no protocol mechanisms which use the eight-bit values (to my
> knowledge), but it is something to keep in mind.
> 
> Nameprep provides several functions which conflict with the usage of
> STD13 domain names (and thus also conflict with IDNs). The two of these
> which do the most harm are US-ASCII character prohibitions (only
> allowing LDH), and mandatory case-conversion.
> 
> There are several options here.
> 
> 1) Expand the IHN rules to accommodate a larger ("reasonable") set of
>    characters, such as "_" for SRV and some common mail characters.
> 
>    This breaks backwards compatibility, and is not really much of an
>    option. For example, ACE would not encode the label if it only
>    contained characters from the ASCII range 0x00-7F. A label such
>    as e_x_a_m_p_l_e.com. would be legal under the liberal IHI rules,
>    would be passed through unencoded to STD13 systems, and then it
>    would break at those systems.
> 
>    Furthermore, mandatory case-conversion in the IHI nameprep would
>    still break email addresses.
> 
> 2) Define a stringprep profile for IDNs explicitly.
> 
>    One of the hard parts of this approach is deciding on
>    case-conversion. Life is easier if everything is lowercased, but
>    as shown, this breaks local-parts of email addresses.
>    
>    Also, there is a potential future problem with normalization of
>    characters, in that some future version of RFC 2822 may allow for
>    non-normalized characters in local-parts. Requiring mandatory
>    normalization would prohibit this usage, or at the least, would
>    make such usage more complex.
> 
>    Finally, there is a problem with the eight-bit codes which are
>    currently unspecified but legal in DNS. These codes map to the
>    Latin-1 subset, so if they are always mandatorily interpreted and
>    managed as such and are always normalized and case converted,
>    then the wrong data will be returned.
> 
> 3) Redefine the allowable data in DNS.
> 
>    If we globally say "no more domain names that are not hostnames"
>    we break SRV and all other entries which can use STD13 domain
>    names legitimately.
> 
>    This approach also introduces a requirement for authoritative
>    servers, including replication masters and DDNS listeners. It
>    also affects legacy systems that follow the old rules in good
>    faith (such as DDNS clients), and where the sudden failure
>    cannot be adequately represented by an RCODE. In short, such a
>    mandate will require at least some updates to legacy systems.
> 
> 4) Clarify the protocol usage rules. EG, clarify SRV to be
>    [service].[transport].<IHI>. This is similar to #3 above but
>    rather than defining global rules, clarify the handling rules
>    for each RR type independently.
> 
>    This is feasible for the situations where this is a known problem
>    with protocol operations. The problem is that there are a bunch
>    of them. SOA is used umpteen different ways so it has to be
>    clarified umpteen different times (or at least considered umpteen
>    different ways before a single clarification is useful).
> 
>    This is the most work but it generates the smallest wake,
>    assuming that the handling rules can be clarified so that they do
>    not break legacy systems. EG, we specifically clarify that the
>    email addresses in SOA/RP are <IDN>.<IHI>, which doesn't break
>    anything but it is considerable effort to go through each and
>    every usage scenario.
> 
> I think #4 is the option we should pursue, or that DNSEXT should
> pursue, as part of an effort to internationalize the DNS in general.
> DNS is the same as every other service in that it has to be updated to
> fully utilize internationalized domain names and host names, so we
> should not be surprised that some work will be required for it to
> function properly.
> 
> -- 
> Eric A. Hall                                        http://www.ehsco.com/
> Internet Core Protocols          http://www.oreilly.com/catalog/coreprot/
>