Re: [idn] proposals and deadlines
"Adam M. Costello" wrote:
>
> "Eric A. Hall" <ehall@ehsco.com> wrote:
>
> > Even though we could do this [put ACE conversions in the resolver]
> > for the DNS parts, the applications will still have to perform
> > conversions.
>
> Are you resigned to have applications contain ACE conversion code
> forever?
No, only until ACE can be deprecated. That will happen on a per-application
basis, or with entirely new applications that avoid ACE altogether. But when
it does happen, the app should end up in a native UTF8 environment rather
than being stuck with ACE.
> On the other hand, if you'd like to see applications evolve away
> from ACE (by upgrading old protocol formats, or replacing old
> protocols by new protocols), then you need to give them a way to avoid
> ACE in DNS lookups too, and that means giving them a UTF-8 resolver
> interface that makes a best effort, not one that gives up too easily
> and expects the application to do the fallback to ACE.
The application *MUST* do the conversion, since protocol operations may
require it.
The example I gave with UTF8 envelopes shows this. If the envelope allows
UTF8 IDNs, then it also makes sense that the left-hand side of the address
(the user@ part) would be encoded in UTF8 rather than with MIME charset
tags. If the option negotiation succeeds, the mailer sends UTF8@UTF8, but
if not then it has to send MIME@ACE. Simply put, some extensions will
affect multiple elements and not just DNS, and the application is the only
thing that will know what to do.
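To make that concrete, here is a rough Python sketch of that decision;
to_ace, to_mime_word, and the utf8_envelope_ok flag are hypothetical
stand-ins for the application's own converters and for the result of the
option negotiation, not anything from a spec:

    def envelope_address(local_utf8, domain_utf8, utf8_envelope_ok,
                         to_ace, to_mime_word):
        """Pick the address form for the message envelope."""
        if utf8_envelope_ok:
            # Peer negotiated UTF8 envelopes: send UTF8@UTF8 as-is.
            return "%s@%s" % (local_utf8, domain_utf8)
        # No negotiation: fall back to MIME@ACE.  The application, not
        # the resolver, has to convert *both* halves of the address.
        return "%s@%s" % (to_mime_word(local_utf8), to_ace(domain_utf8))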
This is also the case with DNS lookups. If the POP3/SMTP mailer is
configured with an IDN for the mail server -- and if that lookup fails (the
resolver API pukes, the server returns FORMERR, or whatever) -- then the
client knows that it does not have access to a UTF8 environment, and it
can use this information to make subsequent decisions for the remainder of
that session. It will know that it should not attempt to negotiate the
UTF8-envelope option, for example.
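A sketch of that session-level bookkeeping (Python; lookup_utf8_idn and
lookup_legacy are placeholders for whatever new and old resolver calls the
client actually has, not proposed APIs):

    class MailSession:
        def __init__(self):
            # Assume a UTF8 environment until a lookup proves otherwise.
            self.utf8_ok = True

        def resolve_server(self, idn_utf8, ace_name,
                           lookup_utf8_idn, lookup_legacy):
            try:
                return lookup_utf8_idn(idn_utf8)   # new-style query
            except Exception:                      # API rejects it, FORMERR, etc.
                self.utf8_ok = False               # remember for the whole session
                return lookup_legacy(ace_name)     # one-time fallback to ACE

        def should_offer_utf8_envelope(self):
            # Don't bother negotiating UTF8 envelopes without a UTF8 path.
            return self.utf8_ok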
Another example would be X.509 certificates; the resolver could break
things if it returned a UTF8 IDN that didn't match the domain name
provided with the certificate. This would also be important for HTTP,
where negotiation would affect URL inputs and redirect outputs; the server
must not send UTF8 IDNs to a client that cannot deal with them because it
is stuck with an old resolver.
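The comparison the application has to make looks roughly like this (Python
sketch; the built-in 'idna' codec is used only as a stand-in for whatever
ACE conversion is finally adopted):

    def cert_name_matches(resolved_name_utf8, cert_dns_name):
        # The certificate's dNSName is normally the ASCII/ACE form, so
        # the application must convert the resolved name before comparing.
        try:
            resolved_ace = resolved_name_utf8.encode("idna").decode("ascii")
        except UnicodeError:
            # Names the codec rejects are compared as-is.
            resolved_ace = resolved_name_utf8
        return resolved_ace.lower() == cert_dns_name.lower()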
In any event, the application *MUST* perform the conversion, since it is
the only thing that knows all of the ramifications of failure.
You are also working on the assumption that fallback will happen
frequently. I posit that this is not true, for two basic reasons:
1) If an initial IDN lookup (or test binding) fails, the app
finds out immediately, and can use ACE for the duration
of the session. There would be no need for the app to
negotiate over extended features, no need for the app to
try issuing any additional UTF8 IDN queries, etc.
2) Server failure is not to be expected. Nobody should be
providing IDNs that are not accessible via the IDN
protocol. What would be the point? Maybe some domain name
speculators who have not upgraded their servers will
cause some failures, but legitimate orgs who are trying
to embrace IDNs will upgrade their infrastructure
accordingly. Cumulatively, there should be very few
server-side failures. [Further, where this does happen,
the client will find out about it and can stop issuing
UTF8 IDN queries for that session.]
Taken together, very few DNS-level fallback queries should occur. There is
no great penalty. The only measurable penalty in this model is that zone
and cache sizes increase, and this probably isn't much of one (my research
for another effort indicates that memory consumption is not an issue for
the vast majority of servers and/or caches).
> > There is also a technical consideration here, in that this approach
> > does not provide a way to distinguish between spurious 8bit data and
> > IDNs. An older application that encounters an 8bit domain name may
> > pass it into the stream when it shouldn't (there is at least one
> > application out there now that passes UTF8 into the DNS). With the
> > dual-message model, at least we can tell that this is an old client
> > and the malformed data can be compared to LDH and rejected, while
> > EDNS messages should only be coming from modernized applications that
> > (hopefully) follow the normalization rules.
>
> I don't see how this conflicts with anything I've proposed. I have not
> proposed putting 8-bit data into old-style DNS messages.
DNS allows any 8-bit data. There are already implementations that pass
8-bit data in domain names, and the only reason they don't do it
frequently is that IDNs aren't commonly found in the wild yet. Once
IDNs start showing up, these legacy apps will pass 8-bit names more
frequently. It can be expected that at least a few legacy apps will pass
IDNs as rendered (ISO-8859-nn, Windows-nnnn, MacRoman, whatever) instead
of converting them to UTF8 first. At the very least, they will *not* run
them through nameprep first.
The use of an alternative API provides a way to segregate the queries at
the source (the calling application). If 8-bit data comes into the system
using the legacy APIs, it will also use the legacy message, and can be
compared against the zone authoritatively. Conversely, if it comes into the
system using the new APIs and EDNS message, then we are safe in assuming
that the data is UTF8, and that it has been generated according to
application-specific rules (rather than coming from spurious inputs), and
that it has been run through nameprep. In order for this to work, the
messages must somehow be flagged at the server.
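A server-side sketch of that segregation (Python; is_new_message stands in
for whatever flag the new message format ends up carrying):

    import string

    LDH = set(string.ascii_letters + string.digits + "-.")

    def classify_query(qname_bytes, is_new_message):
        if is_new_message:
            # New API + new message: assume well-formed, nameprepped UTF8.
            # (A real server would still reject invalid UTF8 here.)
            return ("utf8", qname_bytes.decode("utf-8"))
        # Legacy message: anything outside letter-digit-hyphen is spurious
        # 8-bit data from an old application and can be rejected.
        name = qname_bytes.decode("ascii", errors="replace")
        if not set(name) <= LDH:
            raise ValueError("non-LDH name in a legacy query")
        return ("legacy", name)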
Furthermore, this flag must also exist in order for the server to
intelligently handle IDN data attached to ASCII names. E.g., an
organization that has obtained an IDN may want to set up a CNAME for the
old ASCII name that points to the IDN, and the server needs a way to know
that it is safe to return the IDN in the CNAME response. A flag of some
kind is required for this. Reconverting ACE on the client side will not
work for all RRs.
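For example, a server might select which form of the CNAME target to
return per-query along these lines (Python sketch; the zone contents and
the xn-- label are purely illustrative):

    # Hypothetical zone data: each CNAME record carries both forms of
    # its target so the server can choose based on the query's flag.
    ZONE = {
        "example.com.": {
            "cname_utf8": "b\u00fccher.example.",      # "buecher" with u-umlaut
            "cname_ace":  "xn--bcher-kva.example.",
        },
    }

    def answer_cname(owner, query_is_idn_aware):
        rec = ZONE[owner]
        # Only a flagged (IDN-aware) query may receive the UTF8 target;
        # an unflagged query gets the ACE form it can safely handle.
        return rec["cname_utf8"] if query_is_idn_aware else rec["cname_ace"]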
The use of a new API allows IDNs to be flagged, but the use of a new and
different message is what's really important. Your proposal does not
provide this differentiator, and it allows corrupt data to enter the query
chain on the same footing as legitimate data.
Furthermore, as pointed out earlier, there is no great "I'm always
failing!" penalty, since applications discover failure quickly, and use
this information to make subsequent decisions. If they know they don't
have access to UTF8 IDNs, they will not negotiate the use of UTF8
protocol-specific extensions, and will not generate UTF8 queries.
--
Eric A. Hall http://www.ehsco.com/
Internet Core Protocols http://www.oreilly.com/catalog/coreprot/