
[idn] Re: 7 bits forever!

To: "D. J. Bernstein" <djb@cr.yp.to>
Subject: [idn] Re: 7 bits forever!
From: Valdis.Kletnieks@vt.edu
Date: Fri, 22 Mar 2002 09:44:37 -0500
Cc: idn@ops.ietf.org, ietf@ietf.org, iesg@ietf.org, iab@isi.edu
In-reply-to: Your message of "Fri, 22 Mar 2002 00:30:37 GMT." <20020322003037.31068.qmail@cr.yp.to>
References: <020901c1d083$74b55aa0$ec333a41@EDMON15> <200203212232.g2LMW6t17451@astro.cs.utk.edu> <20020322003037.31068.qmail@cr.yp.to>

On Fri, 22 Mar 2002 00:30:37 GMT, "D. J. Bernstein" said:

> Boy, I'm glad that, when you faced the much smaller problem of non-ASCII
> subject lines back in 1991, you and your buddies decided to ``maximize
> the rate'' of deployment by inventing your own encoding mechanisms,
> rather than giving in to the demands of 8-bit transparency. Oh, sure, we

The MIME and ESMTP working groups had *quite* a bit of infighting on
that topic.  There was a large contingent that wanted to declare "just
send 8 bit" the standard.  Re-read RFC1428 and you will realize that, at
the time, there were a *LARGE* number of systems that broke badly if they
were handed 8-bit data.

You can't blame Keith for that one - there were a LARGE number of people
involved.  And there was a basic non-negotiable design goal for MIME that
had to be followed:

A MESSAGE HAD TO GET THERE INTACT EVEN IF IT PASSED THROUGH A NON-MIME-AWARE
SYSTEM.

This is in all-caps because it's important.  For instance, RFC2047-style
header encoding happened because you could *NOT* trust that all the systems
between here and there were 8-bit-clean (in fact, an 8-bit-clean system
was a rarity) - and if the third or fourth system that the mail passed
through happened to be non-MIME, it might do something nasty (like fold
characters in 128-255 down into the lower half, or bounce the message as
invalid RFC822, because RFC822 *clearly* says ASCII and non-ASCII is
therefore illegal).

Let's take as an example the "native language" encoding of my name:

From: Valdis Kl=?iso8859-4?Q?=BA?=tnieks <Valdis.Kletnieks@vt.edu>

(That's a "small e with macron", Unicode U+0113.)
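
For anyone who wants to reproduce that encoded-word, here is a minimal
sketch using Python's standard email.header module.  The helper names
encode_word and decode_word are made up for illustration, and the charset
is written in its hyphenated form "iso-8859-4"; treat it as a sketch under
those assumptions, not as what any particular mailer does.

```python
from email.header import Header, decode_header


def encode_word(text, charset="iso-8859-4"):
    """Return an ASCII-only RFC 2047 encoding of `text`."""
    return Header(text, charset).encode()


def decode_word(value):
    """Decode a header value that may contain RFC 2047 encoded-words."""
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            # Encoded-words come back as raw bytes plus their charset.
            parts.append(chunk.decode(charset or "ascii"))
        else:
            # Plain ASCII input comes back as str.
            parts.append(chunk)
    return "".join(parts)


if __name__ == "__main__":
    word = encode_word("\u0113")          # small e with macron
    print(word)                           # pure ASCII on the wire
    print(decode_word("=?iso-8859-4?Q?=BA?=") == "\u0113")
```

The point of the round trip is that everything between the two ends is
plain ASCII, so a relay never has to know what the bytes mean.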

If you have a *better* suggestion than 2047-encoding of how to pass that
character in an e-mail header *that will survive passing through an
intermediary system that enforces strict RFC822*, please clue us in....
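
To make "survive" concrete, here is a small sketch of the failure mode
described earlier: a relay that folds 128-255 down into the lower half.
The function strip_high_bit is a hypothetical stand-in for such a relay,
and the encoded-word below encodes the whole word rather than a single
character; both are assumptions for illustration only.

```python
def strip_high_bit(data: bytes) -> bytes:
    """Simulate a relay that clears the top bit of every octet."""
    return bytes(b & 0x7F for b in data)


def is_seven_bit(data: bytes) -> bool:
    """True if every octet is in the ASCII range 0-127."""
    return all(b < 0x80 for b in data)


# The name as raw 8-bit ISO-8859-4: the macron e is the single octet 0xBA.
RAW_8BIT = "Kl\u0113tnieks".encode("iso-8859-4")

# The same word as an RFC 2047 encoded-word: nothing above 0x7F.
ENCODED = b"=?iso-8859-4?Q?Kl=BAtnieks?="

if __name__ == "__main__":
    print(strip_high_bit(RAW_8BIT))            # 0xBA & 0x7F is 0x3A, a colon
    print(strip_high_bit(ENCODED) == ENCODED)  # the encoded form is untouched
```

The raw form silently turns into a different string, while the encoded
form passes through byte for byte.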

Why is this important?  Well.. think about this message.  I may have
an 8-bit-clean mailer.  You may have one.  But neither of us has the
authority to make sure that the software at ietf.org is able to deal.

Those who think the DNS is being overly strict in enforcing ASCII in
domain names are invited to consider the following:

1) RFC1035 says this in section 3.1:

  Although labels can contain any 8 bit values in octets that make up a
  label, it is strongly recommended that labels follow the preferred
  syntax described elsewhere in this memo, which is compatible with
  existing host naming conventions.  Name servers and resolvers must
  compare labels in a case-insensitive manner (i.e., A=a), assuming ASCII
  with zero parity.  Non-alphabetic codes must match exactly.

RFC2181, section 11, re-iterates that the DNS is codeset-agnostic.
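
As a rough illustration of what that text implies, the sketch below
builds the length-prefixed wire form of a name and compares labels the
way the quoted paragraph describes: ASCII letters fold, everything else
must match exactly.  The function names are invented for this example,
and it ignores compression and the overall name-length limit.

```python
def to_wire(labels):
    """Encode a list of byte-string labels as length-prefixed octets."""
    out = bytearray()
    for label in labels:
        if not 1 <= len(label) <= 63:
            raise ValueError("label must be 1 to 63 octets long")
        out.append(len(label))   # one length octet ...
        out += label             # ... followed by arbitrary octets
    out.append(0)                # zero-length root label ends the name
    return bytes(out)


def ascii_lower(label: bytes) -> bytes:
    """Fold only A-Z to a-z; leave every other octet alone."""
    return bytes(b + 32 if 0x41 <= b <= 0x5A else b for b in label)


def labels_equal(a: bytes, b: bytes) -> bool:
    """Case-insensitive for ASCII letters, exact match otherwise."""
    return ascii_lower(a) == ascii_lower(b)


if __name__ == "__main__":
    print(to_wire([b"vt", b"edu"]))
    print(labels_equal(b"VT", b"vt"))       # ASCII letters fold
    print(labels_equal(b"\xc9", b"\xe9"))   # non-ASCII octets do not
```

Nothing in the framing cares what the octets are; the only place ASCII
enters is the case-folding rule.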

So what do you know... the DNS spec is already using an N-bit-clean
length-data encoding.  So obviously there's some Very Evil Restriction
that makes it not allow non-ASCII.  And the favorite whipping boy here
is BIND, which by default restricts it (you can put 'check-names off;'
for a zone in the named.conf file, and you can put 'options no-check-names'
in your resolv.conf).

2) Why does it get restricted?  Consider the parsing issues involved if you
have a domain name that uses raw Unicode and embeds the character
known as "Malayalam Letter UU".  A hint as to why this is Very Bad is
available at http://www.unicode.org/charts/PDF/U0D00.pdf
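
One reading of the hint: that character's code point is U+0D0A.  If "raw
Unicode" is taken to mean 16-bit big-endian code units (an assumption;
the encoding is not spelled out here), the character goes onto the wire
as the octets 0x0D 0x0A, which is CR LF to any line-oriented parser.  The
helper split_like_a_line_parser is hypothetical and only there to show
the effect.

```python
import unicodedata

UU = "\u0d0a"   # code point from the Malayalam chart


def split_like_a_line_parser(name: str):
    """Encode as 16-bit big-endian units, then split on line endings."""
    return name.encode("utf-16-be").splitlines()


if __name__ == "__main__":
    print(unicodedata.name(UU))
    print(UU.encode("utf-16-be"))    # the two octets CR and LF
    print(UU.encode("utf-8"))        # three octets, none of them CR or LF
    print(split_like_a_line_parser("a" + UU + "b"))
```

Under that assumption a single name is seen as two lines by anything
that reads a protocol or a zone file line by line.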

And of course, the true bugbear here is that you can create a domain name
that contains Malayalam Letter UU, and it will work Just Fine with all
the Neat New Software you've installed.

However, you have to deal with the reality that 90% of the users are
going to try to contact your site using either a web browser or an email
client shipped by one vendor - and that vendor hasn't shipped a version
compatible with Your New Neat Stuff.  So if you use some New Method that
breaks that software, you're not going to be talking to many people....

Hmm... maybe "backwards compatibility" and "don't break existing users"
*are* important after all.....

-- 
				Valdis Kletnieks
				Computer Systems Senior Engineer
				Virginia Tech


Follow-Ups:
- Re: [idn] Re: 7 bits forever!
  - From: David Hopwood <david.hopwood@zetnet.co.uk>
- RE: [idn] Re: 7 bits forever!
  - From: "Kent Karlsson" <kentk@md.chalmers.se>
- Re: [idn] Re: 7 bits forever!
  - From: "D. J. Bernstein" <djb@cr.yp.to>
- [idn] Re: 7 bits forever!
  - From: "Eric A. Hall" <ehall@ehsco.com>

References:
- [idn] Moving Towards UTF8 vs ASCII(ACE) Forever
  - From: "Edmon Chung" <edmon@neteka.com>
- [idn] Re: Moving Towards UTF8 vs ASCII(ACE) Forever
  - From: Keith Moore <moore@cs.utk.edu>
- [idn] 7 bits forever!
  - From: "D. J. Bernstein" <djb@cr.yp.to>
