[idn] conflicts with ACE and STD13
- To: IDN <idn@ops.ietf.org>
- Subject: [idn] conflicts with ACE and STD13
- From: "Eric A. Hall" <ehall@ehsco.com>
- Date: Fri, 09 Nov 2001 06:30:07 -0600
- Organization: EHS Company
Three potential conflicts with ACE and STD13 labels:
1) Easy one first. There is a potential security problem with ACE
encodings of legacy LDH domains: a user could manually encode an LDH label
and provide false glue, for example by supplying "bq--ehsco.com.", which
gets decoded as "ehsco.com.", particularly if the delegating entity doesn't
prevent it. idna-02 says this is illegal for zones in particular, but the
same rule needs to apply anywhere that ACE is processed as rich data rather
than as LDH. We should just declare any ACE-encoded LDH label illegal, to
be rejected with extreme prejudice by any entity which encodes OR decodes
ACE. I'm putting this in the next UDNS spec, btw.
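
A minimal sketch of that rejection rule on the decode side, in Python. The
"bq--" prefix is taken from the example above; the function names and the
pluggable `decode` argument are assumptions made for illustration, and
Python's built-in punycode codec is only a stand-in for whichever ACE
algorithm is actually in use:

```python
import re

# Letters, digits and hyphen: the legacy "LDH" repertoire.
LDH = re.compile(r"[A-Za-z0-9-]+")
ACE_PREFIX = "bq--"  # prefix used in the example above


def punycode_stand_in(body: str) -> str:
    # Stand-in decoder only; not the ACE algorithm discussed in this thread.
    return body.encode("ascii").decode("punycode")


def decode_ace_label(label: str, decode=punycode_stand_in) -> str:
    """Decode one label, refusing ACE forms that decode to plain LDH."""
    if not label.lower().startswith(ACE_PREFIX):
        return label  # not ACE, pass through untouched
    decoded = decode(label[len(ACE_PREFIX):])
    if LDH.fullmatch(decoded):
        raise ValueError(f"ACE-encoded LDH label rejected: {label!r}")
    return decoded
```

An encoder would apply the mirror-image test: if the input label is
already pure LDH, it is emitted unchanged and never given an ACE form.
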
2) ACE precludes certain characters from being stored, and delegates some
of this process to idna's incoming filters. However, idna is only
concerned with host names, and some of the excluded values can be provided
as binary domain names (hyphen at the beginning and end of a binary domain
name is legal, for example). Will such strings blow up ACE? If so, these
labels need to be excluded from ACE conversion the same way that LDH
labels are. Can we get an enumeration of these?
3) Similar problem exists with domain names that contain eight-bit
characters outside LDH. This is a complex problem, so read all the way
through before you start composing a response. :)
Eight-bit character codes like 0xC3 are valid for use in internationalized
domain names (stored as the canonical UCS character representation) AND in
binary domain names (not for host identifiers, but for other uses such as
identifying a TXT RR).
Let us also assume that we have an IDNA compliant DNS server which is
aware of the various STD13 rules, and that it also generates ACE encoded
data based on the canonical UCS character values. As was mentioned in
problem (1) above, if a provided domain name consists of only alphabetic
characters from US-ASCII, then the server knows that an ACE encoded
equivalent of that name would be illegal, so it only generates the STD13
octet encoded output. In this case, the code point values of the input
alone are sufficient to determine the appropriate output format. Likewise,
any UCS characters entered with a value greater than U+00FF can only be
represented as ACE, so that's the only output format it has to worry about.
However, whenever a UCS character in the range of U+0000 through U+00FF is
provided, the software has to generate two output formats: one for the
STD13 octet encoding required for binary domain names, and another to
provide the ACE encoded representation of the canonical UCS character.
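
The three cases above can be sketched as a small classifier. This is one
reading of the rules, not a specification: the function name and the
"std13"/"ace" tags are hypothetical, and the first case is taken to mean
"LDH only", in line with problem (1):

```python
LDH_CHARS = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789-"
)


def output_formats(label: str) -> set:
    """Return the set of encodings a server would generate for one label."""
    if all(ch in LDH_CHARS for ch in label):
        return {"std13"}          # plain LDH: an ACE form would be illegal
    if any(ord(ch) > 0xFF for ch in label):
        return {"ace"}            # does not fit in single octets
    return {"std13", "ace"}       # non-LDH within U+0000..U+00FF: both forms
```

For example, output_formats("m\u00c3") yields both formats, which is the
case the rest of this message is concerned with.
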
Now we have two encodings of the same UCS character: one is the STD13
octet encoding, and the other is the ACE encoding, and both forms need to
be maintained. This is actually not much of a problem by itself, because
the server software can choose which format to return based on the
encoding that was used by any matching query. E.g., if the query was for
m[c3].domain.dom. and was provided with an ACE prefix, then the requester
wants the ACE form. Meanwhile, queries for m[c3].domain.dom. which do not
use an ACE prefix obviously want the binary octet form. When the query
explicitly identifies the ENCODING it wants, the server has enough data to
make an intelligent decision.
However, there IS a conflict if the server doesn't have the second piece
of information. For example, let's create another domain name, which we'll
call alias.domain.dom. which is a CNAME for the UCS canonical
representation of "m[c3].domain.dom.", and which is stored in the
UCS-enabled server's UCS-enabled zone. But which output encoding does it
map to? When a query comes in for alias.domain.dom., does the user want
the m[c3] octet encoded TXT RR or do they want the international m[c3] RR
specified by the UCS character? Because the CNAME entry doesn't provide
any hints, the server can't tell which sequence to return.
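
A toy model of the ambiguity, in Python. The zone contents, the `resolve`
function and its `hint` parameter are all hypothetical; `hint` stands for
whatever the query itself reveals about the wanted encoding (an ACE prefix
or the lack of one), which a query for the alias does not carry:

```python
# One UCS owner name stored with two output encodings (U+00C3 is "[c3]").
ZONE = {
    "m\u00c3.domain.dom.": {
        "std13": "octet-encoded TXT RR",
        "ace": "internationalized RR",
    },
}
CNAMES = {"alias.domain.dom.": "m\u00c3.domain.dom."}


def resolve(name, hint=None):
    """Return (encoding, record), or raise if the encoding cannot be chosen."""
    target = CNAMES.get(name, name)   # follow a CNAME if one exists
    forms = ZONE[target]
    if len(forms) == 1:
        return next(iter(forms.items()))
    if hint is None:
        raise LookupError(f"{name!r}: forms {sorted(forms)} match, no hint")
    return hint, forms[hint]
```

A direct query can pass hint="ace" or hint="std13" and gets a definite
answer; resolve("alias.domain.dom.") has nothing to go on and fails.
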
This problem MUST be resolved, as it represents a fundamental conflict
between ACE and the binary label syntax whenever queries are processed
using their canonical UCS character code, and where multiple output
encodings are possible or required. At the least we need to say that if
there is a conflict, rename one of the labels. Even then, automatic
generation in only one format will result in wrong data for one of them.
--
Eric A. Hall http://www.ehsco.com/
Internet Core Protocols http://www.oreilly.com/catalog/coreprot/