
Re: [idn] Where will we see bad domain names?



> This is different from the chance that an average user typing
> in an average internationalized domain name will get different
> character codes without nameprep than with nameprep from his
> keyboard software are in most cases extremely minimal, with a
> few very notable exceptions.

I think you grossly underestimate the likelihood of this happening.

Here's one example. Suppose I'm sitting in front of a mail client that uses
ISO-8859-whatever and is attached to both an Internet and an X.400 mail
system. (Although I wish it were otherwise, such setups are quite common.) I
send a message whose body contains a domain name with accented characters,
and for whatever reason the message takes two paths, one through X.400 and
one not.

Now, while X.400 has support for ISO-8859-whatever as part of generaltext,
support for generaltext is so spotty in practice that it is best avoided. So
many (even most) X.400 systems are configured to fall back to the T.61
character set. Unlike ISO-8859-whatever, T.61 represents accented characters as
composed sequences. (The composition order is the opposite of Unicode, but no
matter.)

So we have one path that converts ISO-8859-whatever to Unicode directly and
another that converts ISO-8859-whatever to T.61 and then to Unicode. But given
that all ISO-8859-1 characters are directly representable in Unicode, the
obvious thing to do is use the composed forms. And given that T.61 combining
accents are all in Unicode as well, the obvious thing to do is use decomposed
forms. The result is that the Unicode sequence depends on the path the message
takes.
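
To make the divergence concrete, here is a minimal sketch in Python. It is
only an illustration under stated assumptions: Python has no T.61 codec, so
the X.400 hop is simulated with NFD decomposition, and "café.example" is a
made-up name. NFC stands in for whatever normalization step nameprep ends up
specifying.

```python
import unicodedata

# Path 1: ISO-8859-1 bytes converted straight to Unicode.
# 0xE9 maps to the single precomposed code point U+00E9.
direct = b"caf\xe9.example".decode("iso-8859-1")

# Path 2: simulated detour through T.61 (assumption: no real T.61 codec
# is used here). NFD mimics a gateway that emits a base letter followed
# by a combining accent, i.e. "e" + U+0301.
via_t61 = unicodedata.normalize("NFD", direct)

# The two strings render identically but differ code point by code point.
same_raw = direct == via_t61

# After normalizing both to one form, the comparison succeeds.
same_normalized = (
    unicodedata.normalize("NFC", direct)
    == unicodedata.normalize("NFC", via_t61)
)

print([hex(ord(c)) for c in direct[:4]])
print([hex(ord(c)) for c in via_t61[:5]])
print(same_raw, same_normalized)
```

Without a normalization step the two copies of the name compare unequal even
though the user typed it once.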

Is this a contrived example? Hardly -- the reason it occurred to me is that I
saw it happen. Such cases may be hard to think up in the abstract, but in
practice software tends to do things that even its designers and implementors
could not have imagined.

This is why interoperability is a mandatory thing for us, not an optional
one. When we find a standard where conforming behavior could result in
an interoperability problem, we call that a bug and we fix the standard, regardless
of whether or not we think the problem is likely to occur.

One final note. The recent flap about UTF-8 security issues, and the resulting
prohibition on recognizing non-shortest-length forms, illustrates that name
comparison can have security implications. Given that applications frequently
compare domain names as part of security checks, this is a potential
security problem as well as an interoperability problem.
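
A rough sketch of both points, again in Python and again hypothetical: the
deny-list, the function names, and "café.example" are invented for
illustration, and NFC is an assumed stand-in for the normalization nameprep
would apply.

```python
import unicodedata

# Hypothetical deny-list, stored with the precomposed U+00E9.
BLOCKED = {"caf\u00e9.example"}

def naive_is_blocked(name: str) -> bool:
    # Compares raw code points, so an equivalent spelling slips through.
    return name in BLOCKED

def normalized_is_blocked(name: str) -> bool:
    # Normalizes first, so composed and decomposed spellings match.
    return unicodedata.normalize("NFC", name) in BLOCKED

def decodes_as_strict_utf8(data: bytes) -> bool:
    # Python's UTF-8 decoder rejects non-shortest (overlong) forms.
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# Same name as the deny-list entry, spelled as "e" + combining acute accent.
decomposed = "cafe\u0301.example"

print(naive_is_blocked(decomposed))
print(normalized_is_blocked(decomposed))

# 0xC0 0xAF is a non-shortest two-byte encoding of "/" (U+002F).
print(decodes_as_strict_utf8(b"/"))
print(decodes_as_strict_utf8(b"\xc0\xaf"))
```

The naive check lets the decomposed spelling past the deny-list, which is the
same class of failure as a decoder that accepts an overlong "/".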

				Ned