Re: [idn] Re: Unicode and Security
This discussion is being sent to two mailing lists, the IDN list and the
Unicode list. Some Unicode people don't know about IDNs, so I'll give a
brief introduction. At the end I'll address Klensin's comments.
IDNs are ``internationalized domain names.'' Imagine, for example, a
Greek pi in http://pi.cr.yp.to.
IETF created the IDN working group and the IDN mailing list---allegedly
open to the public, but the group chair sometimes censors objections---
to try to make IDNs work. The group chair has been pushing a specific
proposal, called IDNA; he has recently issued a Last Call for IDNA
despite objections to IDNA from many group participants.
There are three big issues surrounding IDNs:
(1) How should characters be encoded?
The obvious choice is UTF-8. Many existing programs already work
with UTF-8 domain names. See http://pi.cr.yp.to for a discussion
of what works and what has to be fixed.
The IDNA proposal uses a new special-purpose 7-bit character
encoding. The proponents claim that adding this new encoding to a
huge number of programs will be less expensive than fixing the
programs that have trouble with UTF-8.
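To make the two encoding choices concrete, here is a minimal Python sketch.
It is an illustration, not part of either proposal. The UTF-8 half uses only
the example name from above. The 7-bit half uses Python's built-in
"punycode" codec purely as a stand-in for "a special-purpose 7-bit
encoding"; that this matches the encoding in the IDNA drafts is an
assumption, not something stated here.

```python
# The example name from the text, with a Greek small letter pi (U+03C0).
name = "\u03c0.cr.yp.to"

# UTF-8: ASCII characters keep their usual one-byte values,
# and pi becomes the two bytes 0xCF 0x80.
utf8_bytes = name.encode("utf-8")
print(utf8_bytes)  # b'\xcf\x80.cr.yp.to'

# Assumption: Punycode as a stand-in for a 7-bit special-purpose encoding.
# Only the first label contains a non-ASCII character, so encode that label.
label = name.split(".")[0]
ascii_label = label.encode("punycode")
print(ascii_label)

# The 7-bit form is pure ASCII and decodes back to the original label,
# but every program that displays names must know to perform that decoding.
print(all(b < 128 for b in ascii_label))
print(ascii_label.decode("punycode") == label)
```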
(2) Should two strings be treated separately as domain names if they
are visually indistinguishable? For example, should someone be
allowed to register aol.com, with o replaced by a Greek omicron?
An answer of ``yes'' would introduce many new errors into common
uses of domain names. Domain names are not hidden inside the
computer; they are displayed for users, so that users can
recognize known names. Visually indistinguishable domain names
would make this function inherently unreliable.
The IDNA proposal ignores this problem. The proponents observe
that we're already faced with digit 0 and capital O, and digit 1
and lowercase l; they then leap to the incredible conclusion that
there's no harm in adding a bunch of new characters whose glyphs
are similar or even identical.
(On the other hand, IDNA prohibits all characters that look like
hyphens, periods, etc.)
The conservative approach, familiar in security contexts, is to
have all new characters prohibited by default. The domain-name
registries will then allow selected characters that have been
carefully reviewed for problems.
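The aol.com example and the default-deny approach can be sketched in a few
lines of Python. The allowed set and the function name `registrable` are
hypothetical, chosen only for illustration; a real registry would maintain
its own reviewed list.

```python
import unicodedata

latin = "aol.com"
spoof = "a\u03bfl.com"  # Greek small letter omicron in place of Latin o

# The two strings typically render identically but are different names.
print(latin == spoof)  # False
for ch in spoof[:3]:
    print(hex(ord(ch)), unicodedata.name(ch))

# Hypothetical default-deny policy: every character is prohibited unless
# it appears in a reviewed allow list. Here the list is just the
# traditional letters, digits, and hyphen.
ALLOWED = set("abcdefghijklmnopqrstuvwxyz0123456789-")

def registrable(label: str) -> bool:
    """Return True only if every character of the label is allowed."""
    return bool(label) and all(ch in ALLOWED for ch in label)

print(registrable("aol"))        # True
print(registrable("a\u03bfl"))   # False: omicron has not been reviewed
```

New characters would be added to the allow list one at a time, after
review, rather than being accepted by default.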
(3) Should two strings be treated identically as domain names if
they are visually different but semantically similar? For
example, should uppercase Pi.cr.yp.to, with a Greek Pi, be
treated the same way as lowercase pi.cr.yp.to?
The only existing semantic-similarity rule is that uppercase
ASCII is treated the same way as lowercase ASCII. We're forced to
continue treating uppercase ASCII the same way as lowercase ASCII
for interoperability. But should there be more rules?
There are several reasons to say no. First, new rules would have
to be added to a huge number of programs that handle domain
names. Second, new rules would make issue #2 substantially more
difficult to solve: for example, if Greek lowercase alpha is
treated the same way as uppercase Alpha, then alpha ol.com will
conflict with aol.com, because Alpha OL.COM is visually identical
to AOL.COM. Third, semantic similarity depends on the reader: for
example, lowercase delta and uppercase Delta are semantically
similar in Greece, but not in the United States. Fourth, semantic
similarity is not transitive: for example, I see signs in France
using capital E as a capitalization of e-accent-aigu, and signs
using capital E as a capitalization of e without an accent, but
omitting the accent from e-accent-aigu would be a misspelling.
The IDNA proposal imposes a global set of uppercase-lowercase
conversions. The proponents claim that, if we don't allow
uppercase, we'll be flooded with complaints from users who tried
typing domain names in uppercase. They don't respond when it is
pointed out that users already handle case-sensitive lowercase
URLs without trouble: blah.html works and BLAH.HTML doesn't.
(It's funny how the IDNA proponents express such concern about
the accuracy of users _typing_ domain names, but completely
ignore the accuracy of users _reading_ domain names. See issue
#2. Has it occurred to them that the domain names typed are, in
almost all cases, copies of domain names previously read?)
Meanwhile, the IDNA proposal ignores other questions of semantic
similarity. There have already been a huge number of complaints
about this from Chinese users. The IDNA proponents say that the
complaints ``don't count.''
The conservative---and cost-effective---approach is to start
without any new rules, and have the registries prohibit
characters that may be affected by new rules. Then new rules can
be safely added later _if_ that turns out to be a good idea. For
example, uppercase non-ASCII letters won't be treated the same
way as lowercase, but they also won't be allowed in
registrations.
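The alpha example under issue #3 can be checked directly. This sketch
assumes Python's `str.upper()`, which applies the Unicode case mappings, as
a stand-in for a global uppercase-lowercase rule; it is not the conversion
table from any specific proposal.

```python
greek = "\u03b1ol.com"       # Greek small letter alpha, then "ol.com"
upper_greek = greek.upper()  # Unicode case mapping: alpha -> capital Alpha

# The result starts with U+0391 (Greek capital Alpha), which most fonts
# draw exactly like the Latin capital A.
print(upper_greek)
print(hex(ord(upper_greek[0])))  # 0x391

# Visually the same as AOL.COM, yet a different string.
print(upper_greek == "AOL.COM")  # False

# Under a case-insensitive rule, the lowercase name and this uppercase
# name are the same domain, so the lookalike comes along for free.
print(upper_greek.lower() == greek)  # True
```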
Now, back to the current thread. The ``Unicode and Security'' message
explained one way that visually indistinguishable characters can be
exploited by attackers. This isn't a flaw in Unicode; it is a flaw in
careless protocol designs such as IDNA.
John C Klensin writes:
> This is _really_ old news, old enough that some companies have
> mail gateways set up to trap and reject outgoing mail that uses
> spoofed variations on the company's name.
Here Klensin is admitting that IDNA will let attackers breach security.
These gateways are---unless modified in the last few months by an IDNA
proponent to use visual-similarity tables that the IDNA proponents say
don't exist---unaware of IDNA, and therefore unable to stop the attack
described in the ``Unicode and Security'' message.
> The solution to them, along with the rest of the large catalog of ways
> to spoof email, is signed and encrypted mail.
Speaking as a cryptographer: I find this ``cryptography solves all
security problems'' attitude to be astonishingly naive.
The problem here is not message forgery. The computer has accurately
identified the name of the sender. A cryptographic guarantee of
authenticity would do nothing to stop the attack.
The problem is in the design of the name system itself: how names are
assigned and used. The attacker was, as a matter of policy, allowed to
use a name visually indistinguishable from the target name. The name was
displayed on a computer screen. The display was read by the victim, and
used to authorize access.
One long-term solution is to drop all reliance on global name displays
in favor of local name displays (address books) defined entirely by the
user. In the short term, however, recipients will continue to use global
name displays to recognize known senders.
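A minimal sketch of the local-name idea, in Python. The address book
contents, the addresses, and the function `display_name` are all
hypothetical; the point is only that the label shown to the user comes from
the user's own table, matched on exact code points, never from the global
name as rendered.

```python
# Hypothetical local address book: exact address -> label chosen by the user.
address_book = {
    "friend@aol.com": "Alice (friend)",
}

def display_name(sender: str) -> str:
    """Show the user's own label, or flag the sender as unknown.

    Unknown senders are shown with non-ASCII characters escaped, so a
    lookalike cannot pass for a known name on screen.
    """
    return address_book.get(sender, "UNKNOWN SENDER: " + ascii(sender))

print(display_name("friend@aol.com"))        # Alice (friend)
print(display_name("friend@a\u03bfl.com"))   # flagged, with \u03bf visible
```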
---D. J. Bernstein, Associate Professor, Department of Mathematics,
Statistics, and Computer Science, University of Illinois at Chicago