Re: [idn] Thoughts on nameprep
- To: idn@ops.ietf.org
- Subject: Re: [idn] Thoughts on nameprep
- From: "Adam M. Costello" <amc@cs.berkeley.edu>
- Date: Sat, 10 Mar 2001 09:09:53 +0000
- Delivery-date: Sat, 10 Mar 2001 01:10:54 -0800
- Envelope-to: idn-data@psg.com
- User-Agent: Mutt/1.3.15i
Bruce Thomson <bthomson@fm-net.ne.jp> wrote:
> I understand that is the motivation for nameprep to repair "bad" names
> into "good" ones.
Actually, I think the primary motivation for nameprep is to prevent
names that look the same but are actually different (to prevent domain
spoofing).
> The important thing that needs to be in the idn spec is just the legal
> character set.
That's not enough; normalization matters too. Consider, for
example, e with acute accent. There are two ways to represent it: as a
single precomposed character (U+00E9), or as an e (U+0065) followed by a
combining acute accent (U+0301). If domain names are allowed to use
either form, then someone could register a domain with one form and I
could register it with the other. When you follow a link to one or the
other, your browser will display a Unicode domain name, and you won't be
able to tell by looking at it whether you're at my site or the other
site.
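A quick way to see the problem, sketched in Python with the standard
unicodedata module. NFC is used here only for illustration; the nameprep
profile eventually published as RFC 3491 specifies NFKC, which also folds
compatibility variants.

```python
import unicodedata

# Two spellings of "cafe" with an accented final e:
# precomposed U+00E9, versus "e" (U+0065) + COMBINING ACUTE ACCENT (U+0301).
precomposed = "caf\u00e9"
decomposed = "cafe\u0301"

# They usually render identically, but as code point sequences they differ.
print(precomposed == decomposed)          # False
print(len(precomposed), len(decomposed))  # 4 5

# After normalization both spellings collapse to a single form,
# so two registrations can no longer differ only in this way.
nfc_a = unicodedata.normalize("NFC", precomposed)
nfc_b = unicodedata.normalize("NFC", decomposed)
print(nfc_a == nfc_b)                     # True
```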
We can't forbid the combining acute accent, because some languages need
it, and we don't want to forbid the precomposed characters, because
we want to use efficient representations. The remaining option is to
require normalization.
We also need case-folding, because domain names are supposed to be
case-insensitive.
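Case-insensitive comparison can be sketched with Python's built-in full
case folding. This is an approximation: nameprep uses its own mapping
table, so str.casefold() may not match it for every character. The
function name same_label is hypothetical.

```python
def same_label(a: str, b: str) -> bool:
    """Hypothetical helper: compare two labels case-insensitively
    using Python's full Unicode case folding (str.casefold)."""
    return a.casefold() == b.casefold()

print(same_label("Example", "EXAMPLE"))        # True
print(same_label("STRASSE", "stra\u00dfe"))    # True: U+00DF folds to "ss"
print(same_label("Caf\u00e9", "CAF\u00c9"))    # True: U+00C9 folds to U+00E9
```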
We also need to forbid some characters (I think you already saw that
need when you spoke of a "legal character set").
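Forbidding characters amounts to a table lookup. The sketch below uses a
small, hypothetical prohibited set plus two Unicode general categories
(control characters and unassigned code points); nameprep's real
prohibition tables are far larger and are not reproduced here.

```python
import unicodedata

# HYPOTHETICAL prohibited set, for illustration only.
PROHIBITED = {" ", "/", "@"}

def prohibited_chars(label: str) -> list[str]:
    """Return the characters in label that the toy rules forbid."""
    bad = []
    for ch in label:
        # Forbid the explicit set, control characters (Cc),
        # and unassigned code points (Cn).
        if ch in PROHIBITED or unicodedata.category(ch) in ("Cc", "Cn"):
            bad.append(ch)
    return bad

print(prohibited_chars("exa mple"))    # [' ']
print(prohibited_chars("ok\x07"))      # ['\x07']
print(prohibited_chars("caf\u00e9"))   # []
```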
And that's all there is to nameprep; everything it does can be
attributed to case-folding, normalization, or forbidding characters.
It's actually pretty simple; it just uses large tables.
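Putting the three pieces together gives a toy version of the whole
process. The order (map/case-fold, then normalize, then prohibit) follows
the stringprep framework that nameprep was later built on (RFC 3454); the
function name toy_nameprep and the tiny prohibition check are
hypothetical stand-ins for the real tables.

```python
import unicodedata

def toy_nameprep(label: str) -> str:
    """Hypothetical sketch of nameprep's three steps, not the real tables."""
    mapped = label.casefold()                            # 1. case-folding
    normalized = unicodedata.normalize("NFKC", mapped)   # 2. normalization
    for ch in normalized:                                # 3. prohibition
        if ch == " " or unicodedata.category(ch) in ("Cc", "Cn"):
            raise ValueError(f"prohibited character U+{ord(ch):04X}")
    return normalized

# Different case and different accent encoding end up identical.
print(toy_nameprep("CAFE\u0301") == toy_nameprep("caf\u00e9"))  # True
```

A label containing a forbidden character, such as toy_nameprep("a b"),
raises ValueError instead of producing a name.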
AMC