
Re: [idn] Back to work (Nameprep)



This message contains responses to Martin Duerst, Mark Davis, and Keith
Moore.

Martin Duerst <duerst@w3.org> wrote:

> But if [nameprep] is supposed to be strictly applied on every
> occasion, in particular every time a name is resolved, as e.g. Patrick
> is describing it, these foldings may lead to people believing that
> these characters are acceptable in a domain name the same way an
> upper-case character is.

What's wrong with actually considering them to be acceptable in the same
way as uppercase characters?

> 1) Characters that are very clearly visually distinct from the ones
>    they are mapped to. (for obvious reasons)

The reason is not obvious to me.  B and b are very clearly visually
distinct, yet we live happily in a world where both berkeley.edu and
Berkeley.EDU are commonly used, and are equivalent.

> 2) Characters that completely map to ASCII-only characters (to make
>    sure that current applications and applications doing nameprep
>    behave the same way for ASCII-only).

I don't see the problem, and you are apparently proposing to eliminate
the mapping from full-width Latin to half-width Latin, which I think
would be unfortunate.

> An example of a character for which both of the above apply is
> U+2460 CIRCLED DIGIT ONE. This is a simple digit 'one' in a circle.
> Obviously, everybody can immediately see that it's different from
> just a '1'. Also, if it's mapped to '1' by nameprep applications,
> these applications will behave differently from applications not using
> nameprep (i.e. everything out there now).

Well of course, if you try to compare names without doing nameprep, then
you will fail to see matches where you should.  And feeding a non-ASCII
hostname to an IDN-unaware application is asking for trouble.

If you decide to write the domain name foo1.com using a circled 1, then
obviously you are using IDN functionality and you assume the same risk
as anyone using any IDN: the risk that someone will try to paste it into
an IDN-unaware application.
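
To make the comparison point concrete, here is a minimal Python sketch.
It assumes that Unicode NFKC normalization followed by case folding is a
reasonable stand-in for the nameprep mapping; the actual nameprep tables
may differ, and the helper name `fold` is purely illustrative.

```python
import unicodedata


def fold(name: str) -> str:
    # Assumption: NFKC + case folding approximates the nameprep mapping.
    # This is an illustration, not the nameprep specification itself.
    return unicodedata.normalize("NFKC", name).casefold()


circled = "foo\u2460.com"   # "foo" + U+2460 CIRCLED DIGIT ONE + ".com"
plain = "foo1.com"

# An application that compares raw strings sees no match ...
raw_match = circled == plain

# ... while one that folds both sides first does.
folded_match = fold(circled) == fold(plain)

# Ordinary ASCII case differences are handled the same way.
case_match = fold("Berkeley.EDU") == fold("berkeley.edu")

print(raw_match, folded_match, case_match)
```

An application that skips the folding step is in the same position as
any IDN-unaware application handed a non-ASCII name.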

Mark Davis <mark@macchiato.com> wrote:

> I think the whole notion of trying to prevent cross-script confusions
> in domain names is a morass.
>
> Better would be to have useful GUIs that detect and signal possibly
> confusing names.

I agree.  There's really no way to guarantee that the distinction
between non-equivalent names is apparent to users.  Even with a font
that distinguishes 1 from l and 0 from O, people still overlook
misspellings, even blatant ones like "whitehouse.com" instead of
"whitehouse.gov".

What nameprep can do, however, is help make sure that users are able
to type the names they have in mind.  So a user who wants to type
<omicron><omicron> can do so, and a user who wants to type oo (Latin)
can do so (regardless of whether their keyboard interface defaults to
full-width or half-width).  A user who configures their browser to
accept cookies from oo.com (Latin) will not accidentally get cookies
from the Greek look-alike.
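
A small Python sketch of that distinction, again assuming NFKC
normalization plus case folding as an approximation of nameprep (the
helper `fold` and the `accepted` cookie set are hypothetical names used
only for illustration):

```python
import unicodedata


def fold(name: str) -> str:
    # Assumption: NFKC + case folding approximates the nameprep mapping.
    return unicodedata.normalize("NFKC", name).casefold()


latin = "oo.com"                  # two U+006F LATIN SMALL LETTER O
fullwidth = "\uff4f\uff4f.com"    # two U+FF4F FULLWIDTH LATIN SMALL LETTER O
greek = "\u03bf\u03bf.com"        # two U+03BF GREEK SMALL LETTER OMICRON

# Hypothetical cookie whitelist keyed on the folded name.
accepted = {fold(latin)}

# Full-width input folds to the same name as half-width input.
fullwidth_ok = fold(fullwidth) in accepted

# The Greek look-alike stays a different name after folding.
greek_ok = fold(greek) in accepted

print(fullwidth_ok, greek_ok)
```

The full-width and half-width spellings collapse to one name, while the
Greek name remains distinct, so it is never matched by the Latin entry.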

> If it is impossible to register names *except those that have been
> nameprepped*, then whether or not UTF-8 names go over the wire
> unprepped or not should not cause much of a security problem.

Here's another way to think of it:  Whenever I use any domain name
for any purpose, I am trusting the name servers of that zone (and
all its ancestor zones) to give out true information when that name
is looked up.  With IDNs, I must now also trust those servers not to
give out bogus information in response to non-nameprep'd equivalent
representations of that name, but my exposure has not increased--I was
completely at the mercy of those servers before, and I still am.  So
as you say, allowing non-nameprep'd UTF-8 on the wire does not cause a
security problem.

Keith Moore <moore@cs.utk.edu> wrote:

> fuzzy matching on the server side.

Fuzzy matching at lookup time (as opposed to registration time) would
not help with the problem of domain name spoofing (like tricking someone
into following a link to yah<omicron><omicron>.com).  It would help with
the problem of someone seeing oo.com in print and not knowing whether
it's Latin or Greek, but that problem is much rarer, usually resolved
by context, and people can choose to register names that are not so
confusing.  I think fuzzy matching in servers would be more trouble than
it's worth.

AMC