Re: [idn] An experiment with UTF-8 domain names
- To: "D. J. Bernstein" <djb@cr.yp.to>
- Subject: Re: [idn] An experiment with UTF-8 domain names
- From: Keith Moore <moore@cs.utk.edu>
- Date: Fri, 05 Jan 2001 15:21:24 -0500
- cc: idn@ops.ietf.org
- Delivery-date: Fri, 05 Jan 2001 12:30:13 -0800
- Envelope-to: idn-data@psg.com
> > They have to be changed because of the nameprep stage,
>
> Ah, yes, the stage where af.mi1 is converted to af.mil.
>
> I know this will come as a shock to you, but I've never implemented that
> stage. The registries don't accept af.mi1 as an alias for af.mil;
> neither do I.
>
> Users who see af.mil on a business card have to type af.mil, not af.mi1,
> into the computer. If they don't know how to do that, too bad.
that's not the problem. the problem is that there are multiple, legal,
on-the-wire representations for af.mil which produce the same glyphs,
and different systems will generate different representations when
the same keys are pressed.
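A minimal sketch of the point above, assuming Python 3 and its standard `unicodedata` module. The label "café" is a made-up example, not a name from this thread, and the helper names `wire_bytes` and `same_after_nfc` are hypothetical. It shows two code point sequences that normally render identically but produce different UTF-8 octets unless both sides normalize first.

```python
import unicodedata

# Hypothetical label, as two different input systems might generate it.
precomposed = "caf\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT


def wire_bytes(label: str) -> bytes:
    """Return the UTF-8 octets that would be sent for this label."""
    return label.encode("utf-8")


def same_after_nfc(a: str, b: str) -> bool:
    """Compare two labels after Unicode NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(wire_bytes(precomposed))                  # octets for the precomposed form
print(wire_bytes(decomposed))                   # octets for the decomposed form
print(same_after_nfc(precomposed, decomposed))  # whether NFC unifies the two
```

Without a normalization step somewhere between the keyboard and the lookup, a byte-for-byte comparison treats these as two different names.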
> Keith Moore writes:
> > you're testing the wrong things.
>
> Setting up domain names in DNS, arranging for them to receive mail,
> sending mail to them, and reading the mail, are the ``wrong things''?
yes. you're testing the things which were most likely to work, and
ignoring the things which are least likely to work. there's nothing
wrong with testing everything, but claims of victory that are based
on partial success of the tests that are chosen to be most likely
to work are meaningless... sort of like testing the resilience of
O-rings in warm temperatures, observing that they fail some of the
time, and then expecting them to work just as well when it's cold.
> Do you also think that putting up web pages and reading them are the
> ``wrong things''?
yes. try putting up web pages with links to URLs containing IDNs
which are encoded in UTF-8 (using various URL prefixes and various
protocols with servers on a variety of platforms) and seeing whether
those links work in various browsers. Then try putting URLs
containing IDNs into text files, mailing them around, and using
cut-and-paste to enter them into a browser's "get URL" dialog box.
Then try printing URLs containing IDNs on business cards, and typing
them into browsers' dialog boxes.
now devise similar tests for all other applications that use DNS
names.
of course ACE approaches will fail some of these tests also, but
the tests have to be realistic if you want meaningful results.
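As a rough illustration of what separates the two families of proposals on the wire, the sketch below encodes one hypothetical name both ways. It assumes Python 3 and uses its built-in `idna` codec, which implements the later IDNA 2003 / Punycode encoding rather than any of the ACE drafts under discussion in this thread. The name `bücher.example` and the helper `is_seven_bit` are illustrative only.

```python
# Hypothetical internationalized name, used only for illustration.
name = "b\u00fccher.example"

# Raw UTF-8 on the wire: contains octets with the high bit set.
utf8_form = name.encode("utf-8")

# ASCII-compatible encoding via Python's built-in codec (IDNA 2003 / Punycode,
# an assumption standing in for the ACE drafts of the time).
ace_form = name.encode("idna")


def is_seven_bit(data: bytes) -> bool:
    """True if every octet fits in 7 bits, i.e. the data is plain ASCII."""
    return all(octet < 0x80 for octet in data)


print(utf8_form, is_seven_bit(utf8_form))
print(ace_form, is_seven_bit(ace_form))
```

The ACE form survives any component that only passes 7-bit ASCII, at the price of being unreadable wherever it is displayed undecoded. The UTF-8 form has the opposite trade-off. Neither property says anything about the cut-and-paste or business-card tests, which still have to be run.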
> > we also know that the vast majority of applications cannot properly
> > input or display UTF-8 strings, and that some applications will either
> > break or improperly handle UTF-8 strings when they appear in domain names.
>
> Several of us keep asking for details of what exactly will break. The
> anti-UTF-8 people keep refusing to provide those details. Why? Could it
> be because the failures aren't as widespread as they claim?
even with a small sample size, when all of your trials produce
failures, there's a good chance that it won't work on a large scale.
and especially when the cost of failure is a serious disruption of
critical services, it makes sense to be conservative.
(and your trial produced a failure, you just refuse to recognize it)
> Yes, sendmail corrupts UTF-8 IDNs. Perhaps other programs have trouble
> too. How difficult will it be to fix them? Why should we believe that
> painful MIME-style IDNs will be less expensive?
the real cost of upgrades isn't the effort to fix the code for
current releases, it's:
- identifying the nature of the problems caused and tracing them
to their source
- getting millions of individuals to upgrade their software
- supporting older systems and old releases of software that are still
in use [*]
- dealing with the disruption until everything gets upgraded
[*] upgrades tend to have a domino effect. to upgrade an application you
might well need a newer version of the OS, which might in turn require
more memory, which might justify the cost of entirely new hardware, which
then requires that all of the other tools on that system be re-installed,
which tends to imply that other apps get updated also... so the barrier
to upgrading even a single component can be significant
a strategy for minimal disruption is:
- affect as few components as possible (since the effort required to
deal with breakage is on a per-component basis rather than a
per-line-of-code basis)
- put most of the burden of upgrading on those who benefit most
it appears that ACE schemes fare better at this strategy than UTF-8 schemes.
> All I'm asking for is an accurate evaluation of the costs and benefits
> of each IDN proposal.
I'm sure the list would appreciate it if you did more experiments
and reported the results, as long as you were looking at realistic
scenarios. but even with such results about which cases work and
which ones fail, "costs and benefits" are highly subjective.
bottom line is that each WG participant will have to evaluate the
costs and benefits on his or her own, and each participant will
have to invest the amount of effort that he or she believes is
sufficient to reach a conclusion.
we have to reach rough consensus on the design decisions, we don't
have to reach rough consensus on the motivations.
> > most UNIX users today are not using UTF-8 as a local charset.
>
> XFree86 4.0 uses the UTF-8 version of xterm. It's easy to configure
> Emacs and vim and less to treat text as UTF-8. This is going to replace
> 8859-1 as the default; UNIX users are tired of dealing with deficient
> character sets.
>
> Is this upgrade free? No. People will be stuck with 8859-1 text files
> for years, and will have to go through extra effort to view them until
> they're converted to UTF-8 on disk.
I think it's the other way around. people will not give up their
favorite tools en masse in favor of unfamiliar tools that support UTF-8,
especially when their existing files are in other formats. And changing
to a new xterm won't automatically make the old tools work (actually
it will probably break some tools that expect each character to take one
octet).
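A small sketch of the one-character-per-octet assumption failing, assuming Python 3. The string "naïve" and the helper `truncate_octets` are hypothetical stand-ins for a tool that counts or cuts text by octet.

```python
# Hypothetical text containing one non-ASCII character.
text = "na\u00efve"
encoded = text.encode("utf-8")

# A tool that assumes one octet per character miscounts the length.
char_count = len(text)      # number of characters
octet_count = len(encoded)  # number of octets after UTF-8 encoding


def truncate_octets(data: bytes, limit: int) -> bytes:
    """Cut at a fixed octet offset, as an octet-oriented tool might."""
    return data[:limit]


# Cutting after 3 octets splits the two-octet sequence for U+00EF in half.
fragment = truncate_octets(encoded, 3)
try:
    fragment.decode("utf-8")
    split_ok = True
except UnicodeDecodeError:
    split_ok = False

print(char_count, octet_count, split_ok)
```

Column alignment, field-width limits, and fixed-offset truncation are the kinds of logic that depend on this assumption.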
and this process won't be driven by UNIX users anyway.
> But choosing another IDN proposal isn't going to reduce this cost.
you haven't said anything to support this argument.
Keith
> > illegal message header
>
> Irrelevant. The format was extended. Calling a successful experiment
> unsuccessful because it wasn't IETF-sanctioned is silly.
perhaps. but calling an experiment successful when the proffered
solution (a) didn't get deployed and (b) didn't solve the problem,
is even sillier. fortunately, your ersatz header doesn't actually
break things that ignore it - unlike MUAs that generate UTF-8 in
message headers.