Re: [idn] Where will we see bad domain names?
- To: "D. J. Bernstein" <djb@cr.yp.to>, idn@ops.ietf.org
- Subject: Re: [idn] Where will we see bad domain names?
- From: "Martin J. Duerst" <duerst@w3.org>
- Date: Tue, 09 Jan 2001 21:08:51 +0900
- Delivery-date: Tue, 09 Jan 2001 08:52:42 -0800
- Envelope-to: idn-data@psg.com
At 01/01/07 22:29 +0000, D. J. Bernstein c/o James Seng wrote:
>Why do we need name preparation in applications?
>
>I'm not saying the nameprep work is useless. We should provide a program
>to detect bad names: names with confusing characters, or with uppercase
>characters, or that aren't KC-normalized. Registries will then prohibit
>bad names.
>
>But what will go wrong if these rules are hidden from applications? Why
>should a typical application worry about bad names? Good names will be
>accurately copied by cut-and-paste. When will bad names show up?
>
>Mark Davis writes:
> > The chances that the average application would, by chance, duplicate
> > the same normalization is minimal.
I think what Mark meant is that an average implementation trying to
do something in this area has virtually zero chance of getting, by
chance, everything the same as another implementation or as a
well-defined standard. I agree with this.
This is different from the chance that an average user typing
in an average internationalized domain name will get different
character codes from his keyboard software without nameprep than
with nameprep. That chance is in most cases extremely small, with
a few very notable exceptions.
>You're saying that a user will see a good name on paper, type it into
>his computer in his favorite way, and end up with a bad name that looks
>the same.
>
>Can you please give some real examples? What were the good names? What
>keyboard-interface software was involved? What did the user type? Why
>didn't the user end up with good names? Was it a bug in the software?
The classical case is half-width and full-width alphabetics and
Katakana in Japan. It's not a bug in the software, but a configuration
option. Many people are not aware that full-width Latin letters
have different character codes from half-width (ASCII) letters,
and their keyboard may come set to full-width, or they may set it
that way themselves.
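
To make the full-width case concrete, here is a minimal sketch in Python.
It assumes that the KC normalization mentioned above (NFKC) is the step
that folds width variants; the function name fold_width is hypothetical,
and this is an illustration, not the nameprep specification.

```python
import unicodedata

def fold_width(label: str) -> str:
    """Map compatibility variants (such as full-width Latin letters and
    half-width Katakana) to their ordinary forms using NFKC.
    Hypothetical helper for illustration only."""
    return unicodedata.normalize("NFKC", label)

# "example" as typed with the keyboard set to full-width Latin
# (U+FF45 U+FF58 U+FF41 U+FF4D U+FF50 U+FF4C U+FF45).
fullwidth = "\uff45\uff58\uff41\uff4d\uff50\uff4c\uff45"

# Half-width Katakana KA NA (U+FF76 U+FF85).
halfwidth_kana = "\uff76\uff85"

# The full-width string looks similar on screen but is a different
# sequence of character codes from the ASCII one.
print(fullwidth == "example")              # False
print(fold_width(fullwidth) == "example")  # True

# Half-width Katakana folds to the ordinary (full-width) Katakana
# KA NA (U+30AB U+30CA).
print(fold_width(halfwidth_kana) == "\u30ab\u30ca")  # True
```

Without such a step, the name typed in full-width mode would not match
the registered ASCII name, even though the user typed what he saw on paper.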
>Perhaps there's a need for a tool that fixes bad Chinese domain names.
There are quite a few things that may go wrong with Chinese domain
names, but there are so many characters, with so many subtly or totally
different relations, that it is impossible to use a uniform (e.g.
table-based) approach. The solution here is to register more than
one form where needed; the number of forms necessary is usually very
small (unlike with case folding).
>---Dan
>
>P.S. RFC 1034 says ``When you receive a domain name or label, you should
>preserve its case.'' Are we going to scrap this part of RFC 1034, and
>encourage clients to fold case for ASCII characters? Or are we going to
>say that the case of ASCII characters should be preserved?
>
>Of course, for interoperability, DNS software and mail software and so
>on will continue _comparing_ ASCII characters without regard to case.
The idea is to:
- Have different case versions behave as equivalents.
- Move case-folding away from the servers because of performance
and updating issues (and potentially because of cultural differences
that cannot be handled on the server side).
Whether ASCII case folding will be treated as it has been up to now
or will be integrated with the rest isn't clear to me yet.
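
The idea of folding case on the client side can be sketched as follows.
This is only an illustration under stated assumptions: the names prepare
and same_name are hypothetical, Python's str.casefold stands in for
whatever case-folding table is eventually chosen, and NFKC stands in for
the normalization step.

```python
import unicodedata

def prepare(name: str) -> str:
    """Hypothetical client-side preparation: fold case, then apply NFKC,
    so that the server only ever sees one form of the name."""
    return unicodedata.normalize("NFKC", name.casefold())

def same_name(a: str, b: str) -> bool:
    """Two names are treated as equivalent if they prepare identically."""
    return prepare(a) == prepare(b)

# Different case versions behave as equivalents, for ASCII and
# non-ASCII letters alike.
print(same_name("Example.COM", "example.com"))        # True
print(same_name("B\u00dcCHER.example", "b\u00fccher.example"))  # True

# Names that differ in more than case stay distinct.
print(same_name("example.com", "examp1e.com"))        # False
```

With preparation done in the client, the server can compare the prepared
strings exactly and does not need to carry or update folding tables.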
Regards, Martin.