
Re: [idn] I-D ACTION:draft-ietf-idn-uri-00.txt



> The general question here is ``What will happen if we start using IDNs
> before we do the software upgrades necessary to make IDNs work?''
> 
> UTF-8 IDNs will cause trouble in this case. The point I'm making is that
> ACE IDNs will cause trouble too.

I doubt that anyone disagrees with you on that, but the presumption is
that ACE IDNs will cause less trouble than UTF-8 IDNs.  

Every UTF-8 IDN (except for IDNs that consist of only ASCII characters)
will cause trouble for a significant number of UAs and MTAs.  

But (assuming a careful choice of ACE) few ACE IDNs are likely to be 
significantly longer than their UTF-8 equivalents, and the number of MTAs
and UAs that have trouble with long addresses is significantly smaller
than the number that have trouble with 8-bit characters.
Indeed, one frequent complaint about UTF-8 is that it is a highly
inefficient encoding for some languages; hence, some of the ACE schemes
are designed to be more compact than UTF-8.
 
> Maynard Kang writes:
> > If you use UTF-8 (or any 8-bit CES) in e-mail addresses, SMTP servers which
> > adhere strictly to RFC 821 will blow up due to the US-ASCII restriction. If
> > you use ACE, perhaps only your mailing list software will blow up.
> 
> My software doesn't have any trouble. 

You posted an example where you claimed that your software would have 
trouble, because that software generates unusually long local-parts,
leaving very little room for domain names.

You claimed that ACE IDNs would cause those addresses to become even
longer, thus preventing your software from functioning.  But you could
just as easily find examples where UTF-8 IDNs would cause those 
addresses to become longer.   And depending on your choice of ACE
scheme and your choice of IDN, you can find examples to support an
argument that either representation is longer than the other.
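
A rough way to see that the comparison can go either way is to measure
both forms of the same label. The sketch below is only an illustration:
it uses Punycode (the ACE later specified in RFC 3492, with its "xn--"
prefix) as a stand-in for whichever ACE is eventually chosen, and the
helper names utf8_len and ace_len are made up for this example.

```python
def utf8_len(label: str) -> int:
    """Number of octets the label occupies when sent as raw UTF-8."""
    return len(label.encode("utf-8"))


def ace_len(label: str) -> int:
    """Number of octets in an ASCII-compatible form of the label.

    Assumption: Punycode with the "xn--" prefix stands in for the ACE;
    other ACE schemes would give different lengths.
    """
    return len(b"xn--" + label.encode("punycode"))


if __name__ == "__main__":
    # One mostly-ASCII label and one label written entirely in Japanese.
    for label in ("bücher", "なぜみんな日本語を話してくれないのか"):
        print(label, "UTF-8:", utf8_len(label), "ACE:", ace_len(label))
```

With these two labels the ACE form is longer for "bücher" and shorter
for the Japanese label, so neither representation wins in every case.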

It's almost inevitable that representation of strings with a character 
repertoire greater than 2**16 in size will take up more space than a 
representation of strings with a character repertoire less than 2**8 
in size.  Thus, longer names are part of the price of having IDNs,
no matter what encoding scheme is used.  Of course, we do want to find
a compact encoding for IDNs, but the right answer seems to be a tradeoff
between maximizing space efficiency, minimizing implementation complexity, 
and minimizing impact on existing software that expects ASCII.
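
The size argument can be made concrete with a back-of-the-envelope
calculation. The sketch below assumes a uniform, fixed-width coding and
the 37-symbol letter/digit/hyphen alphabet of case-insensitive host
names; both are simplifying assumptions for illustration, and real
schemes can do better on typical names by exploiting the fact that
characters in one name tend to come from the same script.

```python
import math

# Assumption: an ACE may use only a-z, 0-9 and "-" (case-insensitive).
LDH_SYMBOLS = 26 + 10 + 1


def min_symbols_per_char(repertoire_size: int, alphabet_size: int) -> float:
    """Lower bound on output symbols per input character when every
    character of the repertoire is equally likely (fixed-width coding)."""
    return math.log2(repertoire_size) / math.log2(alphabet_size)


if __name__ == "__main__":
    # A repertoire of 2**16 characters written with LDH symbols only.
    print(min_symbols_per_char(2**16, LDH_SYMBOLS))
    # The same repertoire written with unrestricted 8-bit octets.
    print(min_symbols_per_char(2**16, 2**8))
```

Under these assumptions a character drawn from a 2**16 repertoire costs
at least two octets even with all 8 bits available, and slightly more
than three when only LDH symbols may be used, versus one octet per
character for today's ASCII names.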

Keith