Re: [idn] Dots, and a path to working IDNs

To: "D. J. Bernstein" <djb@cr.yp.to>
Subject: Re: [idn] Dots, and a path to working IDNs
From: Keith Moore <moore@cs.utk.edu>
Date: Tue, 29 May 2001 21:28:26 -0400
cc: idn@ops.ietf.org
Delivery-date: Tue, 29 May 2001 18:28:57 -0700
Envelope-to: idn-data@psg.com

> Keith Moore writes:
> > For instance, it is one thing if typing the "bad" dots causes the
> > lookup to fail; quite another if typing the "bad" dots causes a lookup
> > for a completely different domain; and still another if the "bad" dots
> > work differently in different clients.
> 
> I agree that having two different names, one using a good dot and one
> using a bad dot, would be confusing for users. It would also mean that,
> if we ever decided to move to ``bad dots are converted to good,'' we'd
> break some working names. Disaster!
> 
> The current situation, however, is that the bad dot isn't used. So
> neither problem happens. There's only one name. 

Right.  So one way to prevent this ambiguity is to encourage applications 
to refuse to accept domain names containing "bad" dots.  That way, there
won't be any market for them. (Another way is to encourage DNS servers
to complain when they find bad dots in their zone files.)
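As a rough illustration of the first option, here is a minimal sketch in
Python of an application-side check that refuses names containing "bad"
dots. The set of code points and the function name are assumptions made
for the example; the thread does not enumerate which characters count as
"bad" dots.

```python
# Hypothetical set of "bad" dots, chosen for illustration only.
BAD_DOTS = {
    "\u3002",  # IDEOGRAPHIC FULL STOP
    "\uff0e",  # FULLWIDTH FULL STOP
    "\uff61",  # HALFWIDTH IDEOGRAPHIC FULL STOP
}


def check_domain_name(name: str) -> str:
    """Return name unchanged, or raise ValueError if it contains a "bad" dot.

    The name is refused outright rather than converted, so a "bad" dot
    can never silently turn into a lookup for a different domain.
    """
    for ch in name:
        if ch in BAD_DOTS:
            raise ValueError(
                f"refusing domain name: U+{ord(ch):04X} is not an accepted dot"
            )
    return name
```

With this check, "example.com" passes through untouched, while a name
typed with U+3002 in place of the ASCII dot fails before any lookup is
attempted. The same failure in every client is the point of the exercise.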

> My proposal is to do the same for IDNs. We can allow good UTF-8 IDNs,
> and prohibit bad UTF-8 IDNs. Everything necessary to make this work is
> something that we want to do anyway. Nothing has been lost if it turns
> out that the users also need bad->good conversion.

I think that's right - the crucial question being: what is the mechanism
for "prohibiting" bad IDNs?

(note that this is completely orthogonal to whether UTF-8 or ACE is used)

> > we would like to avoid having millions of programs upgraded only to
> > find out that their IDN support is buggy and that (for instance)
> > people can't reliably use certain IDNs with certain clients, and that
> > there will be yet another massive upgrade
> 
> Aha---you're missing a basic point. My plan is compatible with bad->good
> conversion: bad IDNs will be prohibited on the wire. If a programmer has
> time to do all the work necessary for bad->good conversion, then he can
> do that and deploy it. Nothing in my plan slows him down.

I think the difficulty is ensuring that we've identified all or most
"bad" IDNs before we deploy the code that prohibits them.  Again, we
have this difficulty regardless of the encoding that is used on the wire.

> What I'm trying to do is make IDNs work as soon as possible. 

I think we share that goal, but we favor different strategies for getting 
there.  One difference may be in how we define what it means for IDNs
to "work".  Another difference may be in our assumptions about when and to 
what extent people will be willing to upgrade their existing software.

> We already
> have widespread support for UTF-8 in existing programs. We can take
> advantage of this---as many people already have---to get UTF-8 IDNs.

But there are risks and costs associated with going down this path -
if the results are inconsistent (either from one application to another
or from one operating environment to another) or unpredictable, if they 
won't work well for some important languages, or if they preclude or 
delay deployment of a better solution that does work well for most languages.

> Yes, it's possible that simple UTF-8 IDNs won't be as good as UTF-8 IDNs
> with bad->good conversion in thousands of programs. But they're clearly
> much better than no IDNs at all. 

I'm not sure that this is "clearly" the case.  When we know that there 
are significant problems with the simple approach for some languages, or
if we're pretty sure that the simple approach won't work well for very
long, it's sort of irresponsible for us to encourage people to follow
that approach - even if a few vendors have (irresponsibly?) shipped code
that assumes that approach.

Put another way, an IETF WG's job is not to endorse quick fixes, but to 
figure out what will work well in the medium to long term.  That doesn't 
mean that vendors cannot deploy short-term fixes, but we believe that
for a solution to be worth the massive deployment effort, it needs to
work well for a long time and for the vast majority of users.  We cannot
accept limitations in an IETF solution that a vendor might accept in a
short-term solution.  If we cannot do better than the short-term approach, 
we shouldn't be wasting our time - the vendor community doesn't need 
the IETF to bless something that already works well.  It's precisely because 
the quick and easy solutions do not work well that we are engaged in this 
effort.

> > it strikes me as far more difficult to get platforms to create a
> > special input mode for IDNs,
> 
> The worst case is exactly what Adam has been proposing for ACE: a
> separate little IDN tool, added on to the OS, that reads input from the
> user and prints a good IDN.

I didn't see Adam's proposal as anything more than a workaround for expert
users of UNIX.  The vast majority of users out there would not use such
a tool regardless of whether it emitted ACE or UTF-8.

> > we already know of instances where the UTF-8 support isn't good enough.
> 
> What are you talking about?

We already know that different systems generate different representations
of the same characters, even when the users try to generate the same 
characters, and that this isn't fixed by the currently-deployed software 
that just sends UTF-8. 
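
A minimal sketch of the kind of mismatch I mean, in Python. The example
name and the helper same_name() are hypothetical, and the use of NFC for
the comparison is an assumption made for illustration, not a proposal
for which normalization form to adopt.

```python
import unicodedata

# Two ways a system might represent the "same" name as typed by a user.
precomposed = "caf\u00e9.example"   # e-acute as one code point, U+00E9
decomposed = "cafe\u0301.example"   # "e" followed by U+0301 COMBINING ACUTE ACCENT

# Software that just sends UTF-8 puts different octets on the wire.
wire_a = precomposed.encode("utf-8")
wire_b = decomposed.encode("utf-8")


def same_name(a: str, b: str) -> bool:
    """Compare two names after normalizing both to NFC (assumed form)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
```

Here wire_a and wire_b differ, so a byte-for-byte comparison treats them
as two names, while same_name() reports them as equal. Something has to
do that extra step, and today's deployed UTF-8 software does not.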

Keith
