On Fri, 22 Mar 2002 00:30:37 GMT, "D. J. Bernstein" said:
> Boy, I'm glad that, when you faced the much smaller problem of non-ASCII
> subject lines back in 1991, you and your buddies decided to ``maximize
> the rate'' of deployment by inventing your own encoding mechanisms,
> rather than giving in to the demands of 8-bit transparency. Oh, sure, we

The MIME and ESMTP working groups had *quite* a bit of infighting on that topic. There was a large contingent that wanted to declare "just send 8 bit" the standard. Re-read RFC1428 and you'll realize that at the time, there were a *LARGE* number of systems that broke badly if they were handed 8-bit data. You can't blame Keith for that one; there were a LARGE number of people involved.

And there was a basic, non-negotiable design goal for MIME that had to be followed:

    A MESSAGE HAD TO GET THERE INTACT EVEN IF IT PASSED THROUGH A NON-MIME-AWARE SYSTEM.

This is in all-caps because it's important. For instance, RFC2047-style header encoding happened because you could *NOT* trust that all the systems between here and there were 8-bit-clean (in fact, an 8-bit-clean system was a rarity). If the third or fourth system that the mail passed through happened to be non-MIME, it might do something nasty, like fold characters in 128-255 down into the lower half, or bounce the message as invalid RFC822, because RFC822 *clearly* says ASCII, and non-ASCII is therefore illegal.

Let's take as an example the "native language" encoding of my name:

    From: =?iso-8859-4?Q?Valdis_Kl=BAtnieks?= <Valdis.Kletnieks@vt.edu>

(That's a "small e with macron", Unicode U+0113.) If you have a *better* suggestion than RFC2047 encoding for passing that character in an e-mail header *that will survive passing through an intermediary system that enforces strict RFC822*, please clue us in....

Why is this important? Well... think about this message. I may have an 8-bit-clean mailer. You may have one.
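A minimal Python sketch of the two fates a non-ASCII display name can meet. `strip_high_bit` is a hypothetical stand-in for a non-8-bit-clean relay that clears bit 7 of every octet; the RFC 2047 side uses the standard library's `email.header` module. RFC 2047 only recognizes an encoded-word that stands as its own whitespace-delimited word, so the library encodes the whole display name rather than the single letter.

```python
from email.header import Header, decode_header, make_header

# U+0113 LATIN SMALL LETTER E WITH MACRON is octet 0xBA in ISO-8859-4.
NAME = "Valdis Kl\u0113tnieks"


def strip_high_bit(data: bytes) -> bytes:
    """Hypothetical non-8-bit-clean relay: clear bit 7 of every octet,
    folding 128-255 down into 0-127."""
    return bytes(b & 0x7F for b in data)


def encode_display_name(name: str, charset: str = "iso-8859-4") -> str:
    """RFC 2047 encoding of the whole display name, via the stdlib."""
    return Header(name, charset).encode()


def decode_display_name(encoded: str) -> str:
    """Inverse of encode_display_name."""
    return str(make_header(decode_header(encoded)))


if __name__ == "__main__":
    raw = NAME.encode("iso-8859-4")
    print(raw)                   # b'Valdis Kl\xbatnieks'
    print(strip_high_bit(raw))   # b'Valdis Kl:tnieks'
    encoded = encode_display_name(NAME)
    print(encoded)               # an all-ASCII =?iso-8859-4?...?= encoded-word
    print(decode_display_name(encoded) == NAME)  # True
    # The same relay leaves the encoded form untouched:
    ascii_form = encoded.encode("ascii")
    print(strip_high_bit(ascii_form) == ascii_form)  # True
```

With bit 7 cleared, 0xBA becomes 0x3A, a colon, which is itself an RFC822 special character. The encoded-word, by contrast, is plain ASCII and passes through such a relay unchanged.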
But neither of us has the authority to make sure that the software at ietf.org can deal with it.

Those who think the DNS is being overly strict in enforcing ASCII in domain names are invited to consider the following:

1) RFC1035 says this in section 3.1:

    Although labels can contain any 8 bit values in octets that make up
    a label, it is strongly recommended that labels follow the preferred
    syntax described elsewhere in this memo, which is compatible with
    existing host naming conventions. Name servers and resolvers must
    compare labels in a case-insensitive manner (i.e., A=a), assuming
    ASCII with zero parity. Non-alphabetic codes must match exactly.

RFC2181, section 11, reiterates that the DNS is codeset-agnostic. So what do you know... the DNS spec already uses an N-bit-clean length-data encoding. So obviously there must be some Very Evil Restriction that keeps it from allowing non-ASCII. And the favorite whipping boy here is BIND, which restricts it by default (you can put 'check-names ignore;' in a zone statement in named.conf, and 'options no-check-names' in your resolv.conf).

2) Why does it get restricted? Consider the parsing issues involved if you have a domain name that uses raw Unicode and embeds the character known as "Malayalam Letter UU". A hint as to why this is Very Bad is available at http://www.unicode.org/charts/PDF/U0D00.pdf

And of course, the true bugbear here is that you can create a domain name that contains Malayalam Letter UU, and it will work Just Fine with all the Neat New Software you've installed. However, you have to deal with the reality that 90% of the users are going to try to contact your site using either a web browser or an email client shipped by one vendor, and that vendor hasn't shipped a version compatible with Your Neat New Stuff. So if you use some New Method that breaks that software, you're not going to be talking to many people.... Hmm...
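A minimal Python sketch of the RFC1035 section 3.1 points quoted above: each label travels as a length octet followed by that many uninterpreted octets, and comparisons fold case only for ASCII letters. The last lines show one plausible reading of the Malayalam hint: U+0D0A, written as raw 16-bit Unicode (UTF-16 big-endian here), is the octet pair 0x0D 0x0A, i.e. CR LF, so the label is harmless on the wire but splits in two as soon as it lands in line-oriented text. The function names `encode_name` and `labels_equal` are hypothetical.

```python
from typing import Iterable

MAX_LABEL = 63   # RFC 1035 section 2.3.4: labels are 63 octets or less
MAX_NAME = 255   # ... and names are 255 octets or less on the wire


def encode_name(labels: Iterable[bytes]) -> bytes:
    """Hypothetical helper: RFC 1035 wire format for a domain name.

    Each label is a length octet followed by that many octets; the octets
    are not interpreted, so any 8-bit value is carried as-is. A zero-length
    label (the root) terminates the name.
    """
    out = bytearray()
    for label in labels:
        if not 1 <= len(label) <= MAX_LABEL:
            raise ValueError(f"label length {len(label)} not in 1..{MAX_LABEL}")
        out.append(len(label))
        out += label
    out.append(0)
    if len(out) > MAX_NAME:
        raise ValueError("encoded name exceeds 255 octets")
    return bytes(out)


def labels_equal(a: bytes, b: bytes) -> bool:
    """A=a for ASCII letters only; every other octet must match exactly.
    bytes.lower() changes only the ASCII range A-Z."""
    return a.lower() == b.lower()


if __name__ == "__main__":
    # "a", U+0D0A MALAYALAM LETTER UU, "b" as raw 16-bit Unicode (big-endian).
    label = "a\u0D0Ab".encode("utf-16-be")
    print(label)                              # b'\x00a\r\n\x00b'
    print(encode_name([label, b"example"]))   # b'\x06\x00a\r\n\x00b\x07example\x00'
    # Fine on the wire, but a line-oriented parser sees two fragments:
    print(label.split(b"\r\n"))               # [b'\x00a', b'\x00b']
    print(labels_equal(b"VT", b"vt"))         # True
```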
maybe "backwards compatibility" and "don't break existing users" *are* important after all.....

-- 
Valdis Kletnieks
Computer Systems Senior Engineer
Virginia Tech