Re: [idn] Thoughts on nameprep
Hi Adam,
Your argument that accents make nameprep necessary is
sort of convincing, but I have some more questions. Please
bear with me. :)
> We can't forbid the combining acute accent, because some languages need
> it, and we don't want to forbid the precomposed characters, because
> we want to use efficient representations. The remaining option is to
> require normalization.
>
Isn't this sort of like the question of ligatures in English? I used to type
a lot on keyboards with ff, ffi, and other fun keys, and I miss them. :P
Isn't e-acute-accent just a ligature? If we were to just make ligatures
illegal in domain names, would the screams be all that loud?
Not being from France or another country where these are used, I
wonder whether these characters aren't causing a variety of
other problems already, because you can't look at them and tell
how they were typed. If we are going to allow Unicode in text
files in general, variables in C programs with accents would be
confusing, because you could have two seemingly identical variable
names that are in fact different. The same goes for user names,
passwords, etc. Why do we have to solve this problem for IDNs alone?
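To make the "seemingly identical but different" point concrete, here is a
minimal sketch using Python's standard unicodedata module. The sample strings
and the helper name same_after_nfc are illustrative only, and the choice of
NFC and NFKC here is an assumption for the example, not a statement of which
normalization form nameprep specifies. It also shows that the ff ligature is
handled by compatibility normalization (NFKC).

```python
import unicodedata

precomposed = "caf\u00e9"   # e-acute as a single code point, U+00E9
decomposed = "cafe\u0301"   # plain "e" followed by combining acute, U+0301


def same_after_nfc(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# The two strings render the same but are different code point sequences.
print(precomposed == decomposed)                # False
print(same_after_nfc(precomposed, decomposed))  # True

# The "ff" ligature (U+FB00) is a compatibility character:
# NFKC replaces it with two ordinary letters.
ligature_folded = unicodedata.normalize("NFKC", "\ufb00")
print(ligature_folded == "ff")                  # True
```

The same comparison would apply to identifiers, user names, or passwords:
without a normalization step, equality is decided by code points, not by
appearance.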
> We also need case-folding, because domain names are supposed to be
> case-insensitive.
We could also just forbid upper case, except for ASCII, which would
get through on a grandfather clause. No mistypings are likely here.
People expect case to matter. Forbidding is simpler than mapping.
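The two alternatives can be compared side by side. This is only a sketch:
forbid_uppercase and fold_case are hypothetical names, and Python's built-in
str.isupper and str.casefold stand in for whatever tables a real spec would
define.

```python
def forbid_uppercase(label: str) -> str:
    """Reject non-ASCII uppercase characters; ASCII passes through unchanged."""
    for ch in label:
        if ord(ch) > 0x7F and ch.isupper():
            raise ValueError(f"uppercase character not allowed: U+{ord(ch):04X}")
    return label


def fold_case(label: str) -> str:
    """Map the label to a caseless form instead of rejecting it."""
    return label.casefold()


print(forbid_uppercase("Example"))   # ASCII upper case is grandfathered in
print(fold_case("\u00c9COLE"))       # upper-case E-acute is mapped, not rejected
```

With the forbidding approach, a label such as "\u00c9cole" raises an error and
the user has to retype it; with the mapping approach it is silently folded to
lower case.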
> It's actually pretty simple; it just uses large tables.
Well, this at least is reassuring. As an app writer, if I can be
sure that I just map using one table and forbid with a second one,
I wouldn't worry too much. I would just download your tables. :)
Are you pretty confident that the tables are reasonably usable now?
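From an application writer's side, the "map with one table, forbid with a
second" flow might look roughly like the sketch below. MAP_TABLE and FORBIDDEN
are tiny hypothetical stand-ins, not the real nameprep tables, and the order of
steps (map, then normalize, then check) and the use of NFKC are assumptions
made for the example.

```python
import unicodedata

# Hypothetical stand-ins for the downloaded tables.
MAP_TABLE = {
    "\u00c9": "\u00e9",  # upper-case E-acute -> lower-case e-acute
    "\u00c0": "\u00e0",  # upper-case A-grave -> lower-case a-grave
}
FORBIDDEN = {" ", "/", "@"}


def prepare(label: str) -> str:
    """Map, normalize, then reject forbidden characters."""
    # Step 1: table-driven mapping (characters not in the table pass through).
    mapped = "".join(MAP_TABLE.get(ch, ch) for ch in label)
    # Step 2: normalization, so equivalent sequences become identical.
    normalized = unicodedata.normalize("NFKC", mapped)
    # Step 3: table-driven prohibition.
    for ch in normalized:
        if ch in FORBIDDEN:
            raise ValueError(f"forbidden character: U+{ord(ch):04X}")
    return normalized


print(prepare("\u00c9cole") == "\u00e9cole")    # mapped via MAP_TABLE
print(prepare("cafe\u0301") == "caf\u00e9")     # composed by normalization
```

The application code stays small; nearly all of the complexity lives in the
contents of the two tables.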
Also, it seems like there are a lot of table entries to solve what you
said was the main problem: a couple of accent ligatures that have
to be mapped into their decomposed forms (or is it the other way?).
It just seems a bit over-engineered...
Bruce
----- Original Message -----
From: "Adam M. Costello" <amc@cs.berkeley.edu>
To: <idn@ops.ietf.org>
Sent: Saturday, March 10, 2001 6:09 PM
Subject: Re: [idn] Thoughts on nameprep
> Bruce Thomson <bthomson@fm-net.ne.jp> wrote:
>
> > I understand that is the motivation for nameprep to repair "bad" names
> > into "good" ones.
>
> Actually, I think the primary motivation for nameprep is to prevent
> names that look the same but are actually different (to prevent domain
> spoofing).
>
> > The important thing that needs to be in the idn spec is just the legal
> > character set.
>
> That's not enough, the normalization is important. Consider, for
> example, e with acute accent. There are two ways to represent it: as a
> single character, or as an e followed by a combining acute accent. If
> domain names are allowed to use either form, then someone could register
> a domain with one form, and I could register it with the other form, and
> when you follow a link to one or the other, your browser will display
> a Unicode domain name, and you won't be able to tell by looking at it
> whether you're at my site or the other site.
>
> We can't forbid the combining acute accent, because some languages need
> it, and we don't want to forbid the precomposed characters, because
> we want to use efficient representations. The remaining option is to
> require normalization.
>
> We also need case-folding, because domain names are supposed to be
> case-insensitive.
>
> We also need to forbid some characters (I think you already saw that
> need when you spoke of a "legal character set").
>
> And that's all there is to nameprep; everything it does can be
> attributed to case-folding, normalization, or forbidding characters.
> It's actually pretty simple; it just uses large tables.
>
> AMC