Re: [idn] Dots, and a path to working IDNs



> > Incidentally, for those who care, on a Chinese/Japanese IME, a dot can
> > either be U+3002 or U+002E depending on whether it is full or half
> > width, and in a Korean IME, a dot can either be U+FF9E or U+002E.
> 
> This is part of what I was referring to. Why aren't Keith and Patrik
> screaming that we need dotprep in thousands of programs? Don't they care
> about millions of confused users who have typed the wrong dot?

My guess is that this can be done in nameprep.  Perhaps it's not defined
in the current version of nameprep, but this could be fixed.  I'm not
screaming about this partly because I see us as currently arguing about
more fundamental architectural issues, and partly because I have
confidence that we will be willing to address this issue once we agree
on how to proceed at an architectural level.

That is, I don't expect that every detail of the protocol needs to be
worked out before we settle on an architecture.  

> Reports from Japan suggest that, in fact, keyboard interfaces already
> have adequate support for typing domain names. Users type the good dot,
> the canonical dot, the ASCII dot. A domain name with a bad dot simply
> doesn't work; the user fixes it.

As far as I'm concerned that's a perfectly reasonable solution if it works.
But there are some caveats.  For instance, it is one thing if typing the 
"bad" dots causes the lookup to fail; quite another if typing the "bad" 
dots causes a lookup for a completely different domain; and still another 
if the "bad" dots work differently in different clients.
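
To make the last two cases concrete, here is a minimal sketch in Python.
Both clients are hypothetical, and the choice of U+3002 as the "bad" dot
is only an assumption taken from the example quoted above; nothing here
comes from any IDN draft.

```python
# Two hypothetical clients that disagree about what separates labels.
IDEOGRAPHIC_FULL_STOP = "\u3002"  # assumed "bad" dot for this example

def split_labels_strict(name):
    """Client A: only the ASCII dot (U+002E) separates labels."""
    return name.split(".")

def split_labels_lenient(name):
    """Client B: also treats U+3002 as a label separator."""
    return name.replace(IDEOGRAPHIC_FULL_STOP, ".").split(".")

typed = "www\u3002example.com"

# Client A sees two labels, the first of which contains the "bad" dot.
print(split_labels_strict(typed))
# Client B sees three labels, so it looks up a different name.
print(split_labels_lenient(typed))
```

The same typed string turns into two different queries, which is the
situation I would like the protocol to rule out.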

If we don't specify how the "bad" dots are handled as part of the IDN 
protocol, we are essentially deferring this to the registries.  Given their 
history and the competitive environment in which they exist, it's not clear 
whether they will strive to minimize bad effects.  At the same time, some 
of this will be left to the registries no matter what - we clearly cannot 
fix all potential IDN ambiguities in the protocol.

> Maybe users will keep trying to put bad dots into domain names for some
> reason. Maybe we _do_ want to put dotprep into thousands of programs.
> Okay; we'll do that if and when the need for it is clear. 

I think the question (for this and many other issues) is: to what extent 
can we risk requiring hosts or applications or servers to be upgraded 
when we discover some failure in our design?  The flip side of this 
question is: to what extent can we get this right the first time, and 
how long should we delay the standard trying to get it right?  I think that 
nameprep goes a long way toward solving the problem, and that it's a
useful approach, but it's not quite there yet.

> However, in
> the meantime, we can still use dots!
> 
> We can handle UTF-8 IDNs the same way:
> 
>    * People will register good UTF-8 IDNs. Lowercase, ASCII dots, no
>      confusing characters. We can distribute name-checking software so
>      that DNS administrators can make sure their IDNs are good.

If we can precisely define what is "good" beforehand (well enough to 
write the name-checking software), couldn't we just as easily define 
nameprep in such a way that it would either convert "bad" names to 
"good" names (for obvious translations) or reject "bad" names that
couldn't be reliably translated to good ones?
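
A rough sketch of what I mean by "convert or reject", in Python.  This is
not the nameprep draft: the function name, the exception, the set of dot
variants, and the rejection rule are all assumptions made up for the
illustration, and a real table would be much larger.

```python
import unicodedata

# Assumed for illustration: code points that are "obviously" meant as a dot.
DOT_VARIANTS = {"\u3002": ".", "\uff0e": ".", "\uff61": "."}

class BadName(ValueError):
    """Raised when a name cannot be reliably translated to a good one."""

def prep_name(name):
    # Obvious translations: map dot variants to U+002E, then lowercase.
    mapped = "".join(DOT_VARIANTS.get(ch, ch) for ch in name).lower()
    # Reject what we cannot translate reliably (here: whitespace,
    # control characters, unassigned and private-use code points).
    for ch in mapped:
        if ch.isspace() or unicodedata.category(ch) in ("Cc", "Cn", "Co"):
            raise BadName("cannot translate U+%04X" % ord(ch))
    return mapped

print(prep_name("WWW\u3002Example\uff0eCOM"))
```

The point is only that the same definition of "good" that would drive a
name checker can drive a routine that repairs what it can and refuses
the rest.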

>    * It's the responsibility of the keyboard interface to help users
>      type good IDNs. Bad IDNs simply won't work.

Quite honestly, I don't think that this will fly.  It strikes me as far
more difficult to get platforms to create a special input mode for
IDNs, and to get applications to use that input mode, than it is to
get applications to accept IDNs using a normal text interface.
It's especially difficult to retrofit existing applications to use
new input modes.  And as somebody pointed out, we will also need to 
check names that originate from other sources besides the keyboard.

We could quite reasonably define a name-check API routine that would 
return good/bad for a given domain name, and that would be far easier 
than defining a new input mode.  But this could also just be
done by the nameprep routine.
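
As a sketch of why the check falls out of the prep routine (Python; the
names prep and name_is_good are hypothetical, and this toy prep stands in
for whatever nameprep ends up being): a name is "good" exactly when
preparing it leaves it unchanged.

```python
# Assumed for illustration: dot look-alikes that the toy prep maps to ".".
DOT_VARIANTS = "\u3002\uff0e\uff61"

def prep(name):
    """Toy stand-in for nameprep: return the good form, or None to reject."""
    if any(ch.isspace() for ch in name):
        return None  # cannot be reliably translated
    for dot in DOT_VARIANTS:
        name = name.replace(dot, ".")
    return name.lower()

def name_is_good(name):
    """Name-check routine built on prep: good means already prepared."""
    return prep(name) == name

for candidate in ["example.com", "Example.com", "example\u3002com", "exa mple.com"]:
    print(repr(candidate), name_is_good(candidate))
```

Only the first candidate passes; the others are either repairable by
prep or rejected by it, so a separate checker adds nothing that the prep
routine does not already know.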

>    * Maybe users will keep putting bad characters into domain names for
>      some reason. Maybe we really _do_ want slow nameprep, with
>      thousands of programs converting bad IDNs to good IDNs. Okay; we'll
>      do that if and when the need for it is clear. However, in the
>      meantime, we can still use IDNs!

This is going to be a massive upgrade.  We would like to avoid having
millions of programs upgraded only to find out that their IDN support is 
buggy and that (for instance) people can't reliably use certain IDNs 
with certain clients, and that there will be yet another massive upgrade
required to fix this.  Good design is generally cheaper than good
maintenance.

> This way we get useful IDNs as soon as possible. We take advantage of
> all the existing UTF-8 support. If it turns out that the user interface
> isn't good enough, that we need to suffer the massive costs of upgrading
> and redeploying thousands of programs, then we'll do that. The existing
> IDNs will continue to work.

We already know of instances where the UTF-8 support isn't good enough.
Surely you would not have us ignore what we've learned from experience 
with these prototypes?

> There are two obstacles to having these IDNs work instantaneously. One
> is that many versions of gethostbyname() reject 8-bit characters. The
> other is that sendmail destroys bytes 128-159 (which don't show up in
> lowercase European characters in UTF-8, but which do show up in other
> characters). Both of these problems should be fixed as soon as possible.

I strongly disagree with this analysis, and it seems that many others do 
also.  But we've been through this argument before, and I don't see the 
point of rehashing it.

Keith