
Re: [idn] What important would UTF-8 break?



Hi Dan,

> What applications will break?
> The most important I can think of is e-mail.
> But it need not break. Many wanting to use non-ASCII will
> probably have an ASCII name for use in e-mail for many years
> more. And the enhanced e-mail applications that those using
> non-ASCII in names will use, will handle the conversion
> from UTF-8 to ASCII.

Wouldn't the conversion from UTF-8 to ASCII result in a transfer encoding
syntax that is analogous to ACE in the first place? That is in essence
no different from the existing proposals (i.e. IDNA) we have seen so far,
where ACE is used for transport and representation and anything else can be
used for the presentation layer (hell, even UTF-8 as a "pseudo-encoding"
for the presentation layer suffices).
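
To make the "ACE for transport, anything else for presentation" split concrete, here is a rough sketch. It assumes Python's built-in "idna" codec, which produces the `xn--` (Punycode) form; that is only a stand-in for whichever ACE ends up being chosen, and the helper names `to_ace` and `from_ace` are made up for illustration.

```python
def to_ace(domain: str) -> str:
    """Presentation form -> ASCII-compatible form, for use on the wire."""
    # Assumption: the stdlib "idna" codec stands in for the ACE.
    return domain.encode("idna").decode("ascii")


def from_ace(domain: str) -> str:
    """ASCII-compatible form -> presentation form, for display to the user."""
    return domain.encode("ascii").decode("idna")


if __name__ == "__main__":
    wire = to_ace("bücher.example")
    print(wire)            # pure ASCII, safe for existing software
    print(from_ace(wire))  # what the user sees
```

The point is that everything between the two helpers only ever sees ASCII, so it does not matter what the presentation layer uses.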

> Have anybody done any real tests to see what will work and
> what will not? Remember that it is those wanting to use
> non-ASCII in names who also quickly will fix their software
> to make it work. You can start to introduce UTF-8 names
> where it will not break older applications and use ASCII names
> where they will.

Main problem: a user types a non-ASCII e-mail address into a
"non-enhanced" (to use your terminology) e-mail application. The
application might:

1) be nice enough to disallow that e-mail address with an error message, or
2) replace the highest-order bit of each byte of the e-mail address with a 0, or
3) attempt to RFC-2047-ize the non-ASCII e-mail address into a transfer
   encoding (QP/B64), or
4) leave it as it is and send the message to the SMTP server

and where (4) occurs, the MTA might do one of the following:

5) replace the highest-order bit of each byte of the e-mail address with a 0
6) convert the e-mail address to a series of question marks (i.e. ???@???.??)
7) bounce the mail

In any case, you get a mangled e-mail address that will never be able to be
replied to. That is certainly not an acceptable e-mail QoS standard.
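
For illustration, cases (2)/(5), (3) and (6) can be roughly reproduced as follows. This is a sketch, not what any of the actual mailers run: the function names and the sample address are made up, the input is assumed to be UTF-8, and case (6) is assumed to replace one question mark per non-ASCII character.

```python
from email.header import Header


def strip_high_bit(address: str) -> str:
    """Cases (2) and (5): clear the top bit of every UTF-8 byte."""
    raw = address.encode("utf-8")
    return bytes(b & 0x7F for b in raw).decode("ascii")


def rfc2047_encode(address: str) -> str:
    """Case (3): wrap the address in an RFC 2047 encoded-word."""
    return Header(address, charset="utf-8").encode()


def to_question_marks(address: str) -> str:
    """Case (6): replace each non-ASCII character with '?' (assumed)."""
    return address.encode("ascii", errors="replace").decode("ascii")


if __name__ == "__main__":
    addr = "jörg@example.com"  # hypothetical address
    print(strip_high_bit(addr))     # the two bytes of "ö" become "C6"
    print(rfc2047_encode(addr))     # an =?utf-8?...?= encoded-word
    print(to_question_marks(addr))  # "ö" becomes "?"
```

None of the three outputs is an address the recipient's system would recognise, which is the whole problem.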

By the way, this is gathered from some interoperability testing I have done
in the past. For the above-mentioned cases, (1) is Outlook Express, (2) is
most Unix MUAs, (3) is Netscape Mail, (4) is Eudora, (5) is Sendmail, and
(6) is the Microsoft Exchange SMTP server. (7) is what you get when the
e-mail address no longer has a valid route after being mangled.

Conclusion: UTF-8 would most certainly cause e-mail to break, short of
having a flag day upgrade for all e-mail related systems on the Internet
simultaneously.

regards,
maynard