
Re: [idn] I don't want 8-bit failures in 2011



> > > It is clearly within the scope of IDN to consider the
> > > interoperability issues here---as illustrated by the Sendmail
> > > entry in my list of upgrades necessary for UTF-8.
> > 
> > of course.  after all, it's this very set of interoperability issues
> > that makes ACE solutions so compelling.
> 
> I'm not taking sides in the debate but I would like to point out that by
> not making forward compatibility an explicit extension, the use of ACE and
> a legacy-friendly message practically guarantees that problems with mixed
> encodings will go up. Legacy apps and resolvers that encounter UTF8 names
> in documents and links will use the UTF8 domain names even though they
> should not do so. As IDNs are promulgated, this problem will get worse,
> not better.

I am not sure I agree with the last sentence, but it's clear to me that
even if we adopt ACE

(a) leakage of non-ACEd UTF-8 IDNs will occur
    (as well as leakage of IDNs in other charsets), and
(b) it's desirable for applications protocols to define what should 
    happen when they do encounter such names.

And while I could argue that applications that detect such names should 
signal these as error conditions (on the grounds that the earlier 
in the signal path such errors are detected, the more likely they are
to be corrected), realistically I suspect that there will be tremendous
pressure on most applications to "fix things up later".  

At least UTF-8 strings of more than a few characters can generally be 
detected even without a charset label, so a UTF-8 IDN is less likely 
to be misinterpreted than an IDN in some other charset without a label.
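As a rough illustration of why unlabeled UTF-8 is usually recognizable: bytes from other 8-bit charsets rarely happen to form valid UTF-8 multibyte sequences. The following is a minimal sketch of such a heuristic, not a complete charset detector; the function name `looks_like_utf8` is hypothetical, and it says nothing about *which* other charset a non-UTF-8 name might be in.

```python
def looks_like_utf8(raw: bytes) -> bool:
    """Heuristic: True if raw has non-ASCII bytes and decodes as strict UTF-8.

    Pure ASCII returns False, since it carries no evidence either way.
    """
    if raw.isascii():
        return False
    try:
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# The same name in two charsets, with no charset label attached.
print(looks_like_utf8("café.example".encode("utf-8")))    # True
print(looks_like_utf8("café.example".encode("latin-1")))  # False
print(looks_like_utf8(b"example.com"))                    # False
```

Very short strings can pass by accident: the two Latin-1 characters "Ã©" are byte-for-byte the UTF-8 encoding of "é", which is why this kind of detection is only dependable for names of more than a few characters.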

So for example in email, even though ACE might be required in SMTP
and in message headers, we'll see UTF-8 in both places, and we'll
be better off if MTAs and UAs that do try to handle these things 
do so in a uniform way.  

> Just as a statement of fact, it really would be *easier* (not better, but
> easier) to embrace UTF8 directly since there is nothing that prevents UTF8
> from being used in queries today except the hostname limitations. 

this would be true only if 

a) existing DNS servers could reliably compare UTF8 IDNs without nameprep
b) DNS queries were the major area of concern
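To make point (a) concrete: DNS servers compare labels case-insensitively only for ASCII letters, so two UTF-8 spellings that a user would consider the same name can fail to match. The sketch below contrasts that octet comparison with a nameprep-like one. `nameprep_like` is a hypothetical stand-in that approximates nameprep (later published as RFC 3491) with Python's `str.casefold()` plus NFKC normalization; it omits nameprep's prohibited-character and bidirectional checks.

```python
import unicodedata


def dns_style_equal(a: bytes, b: bytes) -> bool:
    # bytes.lower() folds only ASCII A-Z, roughly how an unmodified
    # DNS server treats the octets of a label.
    return a.lower() == b.lower()


def nameprep_like(name: str) -> str:
    # Approximation only: case-fold, then NFKC-normalize.
    return unicodedata.normalize("NFKC", name.casefold())


upper = "MÜNCHEN"
lower = "münchen"
decomposed = "mu\u0308nchen"  # 'u' followed by COMBINING DIAERESIS

print(dns_style_equal(upper.encode("utf-8"), lower.encode("utf-8")))       # False
print(dns_style_equal(decomposed.encode("utf-8"), lower.encode("utf-8")))  # False
print(nameprep_like(upper) == nameprep_like(lower))                        # True
print(nameprep_like(decomposed) == nameprep_like(lower))                   # True
```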

> Just as
> a simple measurement of where implementations are today, UTF8 mostly works
> while ACE does not work at all. 

for DNS servers, perhaps.  for applications that transmit DNS names as
protocol elements, the opposite is closer to the truth.  

> In short, ACE actually makes it easier for bad things to happen, since it
> does not mandate new behavior. 

on the contrary, ACE does mandate new behavior in any application that
needs to do more with an IDN than query its DNS records, pass it to 
another application, or compare one IDN with another.  any application
that inputs or displays IDNs will have to behave differently in order
to support ACE (but any application that inputs or compares IDNs would 
have to behave differently to support UTF-8 IDNs).
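As an illustration of where the new behavior lands under ACE: the name on the wire stays plain ASCII, but an application that accepts or displays IDNs has to convert at its edges. This sketch uses Python's built-in `idna` codec, which implements IDNA 2003 with Punycode as its ACE; that encoding was standardized after this discussion, so treat it only as one example of an ACE, not as the scheme being debated here. The helper names `to_wire` and `to_display` are hypothetical.

```python
def to_wire(user_input: str) -> bytes:
    # Input side: convert what the user typed into an ASCII-only name
    # before it is used as a protocol element or a DNS query.
    return user_input.encode("idna")


def to_display(wire_name: bytes) -> str:
    # Display side: decode ACE labels back to Unicode for the user.
    return wire_name.decode("idna")


wire = to_wire("münchen.example")
print(wire)              # ASCII-only; the non-ASCII label gets an "xn--" prefix
print(wire.isascii())    # True
print(to_display(wire))  # münchen.example
```

A program that only passes `wire` along, or looks it up in the DNS, needs none of this; only the input and display edges change.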

> In the end, going with ACE translates
> directly into more apps trying to use UTF8, not less. This is not a
> compelling (in a good way) feature.

any IDN scheme (whether ACE or not) will cause more leakage of 
non-ASCII names into applications that are expecting ASCII names.
the very fact that we are creating a standard mechanism to do
this, combined with the growing number of non-English speakers on
the net, ensures this.

Keith