
[idn] Dots, and a path to working IDNs



James Seng/Personal writes:
> Incidentally, for those who care, on a Chinese/Japanese IME, a dot can
> either be U+3002 or U+002E depending on whether it is full/half width,
> and in Korean IME, a dot can either be U+FF9E or U+002E.

This is part of what I was referring to. Why aren't Keith and Patrik
screaming that we need dotprep in thousands of programs? Don't they care
about millions of confused users who have typed the wrong dot?

Reports from Japan suggest that, in fact, keyboard interfaces already
have adequate support for typing domain names. Users type the good dot,
the canonical dot, the ASCII dot. A domain name with a bad dot simply
doesn't work; the user fixes it.

Maybe users will keep trying to put bad dots into domain names for some
reason. Maybe we _do_ want to put dotprep into thousands of programs.
Okay; we'll do that if and when the need for it is clear. However, in
the meantime, we can still use dots!
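
For concreteness, here is a rough sketch of what a dotprep step might
look like if it were ever needed. The function name and the table of
bad dots are assumptions for illustration: U+3002 comes from the
message quoted above, while U+FF0E and U+FF61 are other dot-like
characters added as a guess at what an IME might produce.

```python
# Hypothetical "dotprep" sketch. The name and the dot table are
# illustrative assumptions, not taken from any specification.
ASCII_DOT = "\u002e"

# U+3002 is the ideographic full stop mentioned in the quoted message.
# U+FF0E (fullwidth full stop) and U+FF61 (halfwidth ideographic full
# stop) are assumed additions.
BAD_DOTS = ("\u3002", "\uff0e", "\uff61")


def dotprep(name: str) -> str:
    """Return name with every listed bad dot replaced by the ASCII dot."""
    for bad in BAD_DOTS:
        name = name.replace(bad, ASCII_DOT)
    return name
```

A name that already uses ASCII dots passes through unchanged, which is
the case the keyboard interface is expected to produce anyway.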

We can handle UTF-8 IDNs the same way:

   * People will register good UTF-8 IDNs. Lowercase, ASCII dots, no
     confusing characters. We can distribute name-checking software so
     that DNS administrators can make sure their IDNs are good.

   * It's the responsibility of the keyboard interface to help users
     type good IDNs. Bad IDNs simply won't work.

   * Maybe users will keep putting bad characters into domain names for
     some reason. Maybe we really _do_ want slow nameprep, with
     thousands of programs converting bad IDNs to good IDNs. Okay; we'll
     do that if and when the need for it is clear. However, in the
     meantime, we can still use IDNs!
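
The name-checking software in the first point could be quite small. The
following is only a sketch under stated assumptions: the function name
is made up, the list of non-ASCII dots is a guess, and "confusing
characters" is approximated by asking whether NFKC normalization would
change the name, which is just one possible definition.

```python
import unicodedata

# Assumed list of dot-like characters that are not the ASCII dot.
NON_ASCII_DOTS = ("\u3002", "\uff0e", "\uff61")


def idn_problems(name: str) -> list:
    """Return a list of reasons why name is not a 'good' IDN.

    An empty list means the name passed every check in this sketch.
    """
    problems = []
    if name != name.lower():
        problems.append("not lowercase")
    if any(dot in name for dot in NON_ASCII_DOTS):
        problems.append("non-ASCII dot")
    # Assumption: treat compatibility forms (for example fullwidth
    # letters) as "confusing characters".
    if name != unicodedata.normalize("NFKC", name):
        problems.append("compatibility character")
    return problems
```

A DNS administrator would run this on a name before registering it and
refuse the name if the list is not empty.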

This way we get useful IDNs as soon as possible. We take advantage of
all the existing UTF-8 support. If it turns out that the user interface
isn't good enough, that we need to suffer the massive costs of upgrading
and redeploying thousands of programs, then we'll do that. The existing
IDNs will continue to work.

There are two obstacles to having these IDNs work immediately. One
is that many versions of gethostbyname() reject 8-bit characters. The
other is that sendmail destroys bytes 128-159 (which don't show up in
lowercase European characters in UTF-8, but which do show up in other
characters). Both of these problems should be fixed as soon as possible.
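
To see which names are affected by the second problem, one can list the
bytes in the 128-159 range that appear in a name's UTF-8 encoding. This
is a small illustrative helper with a made-up function name, not a
description of what sendmail does internally.

```python
def bytes_128_159(name: str) -> list:
    """Return the UTF-8 bytes of name that fall in the range 128-159."""
    return [b for b in name.encode("utf-8") if 128 <= b <= 159]
```

A lowercase Latin-1 letter such as U+00E9 encodes to a lead byte
followed by a continuation byte in the 160-191 range, so the list is
empty. Many other characters, including common kanji, encode with
continuation bytes in the 128-159 range and would be damaged.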

---Dan