
Re: [idn] Why we can go directly to UTF-8

To: Keith Moore <moore@cs.utk.edu>, idn@ops.ietf.org
Subject: Re: [idn] Why we can go directly to UTF-8
From: Martin Duerst <duerst@w3.org>
Date: Fri, 25 May 2001 10:50:22 +0900
Delivery-date: Thu, 24 May 2001 19:01:56 -0700
Envelope-to: idn-data@psg.com

At 00:09 01/05/24 -0400, Keith Moore wrote:
>It's really quite simple.
>
>If we use UTF-8 names, each component in a signal path that handles
>an IDN has to be upgraded before the application will work with IDNs.

Each component *may* have to be upgraded. Some may not need an
upgrade. For many, 8-bit transparency will do just fine.
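As a rough illustration of what "8-bit transparency" means here: a UTF-8 label contains octets above 0x7F. A component that simply passes octets through untouched never notices. A component that enforces the traditional letter-digit-hyphen (LDH) host name rule would reject the label. This is a minimal sketch, not part of any proposal. The helper names `is_ldh_label` and `has_8bit_octets` are hypothetical, and the kanji label is only an example.

```python
import re

# Traditional LDH host name syntax (RFC 952 as relaxed by RFC 1123), with the
# 63-octet label limit of RFC 1035: letters, digits and hyphens, where the
# first and last characters must be a letter or digit.
LDH_LABEL = re.compile(rb"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")

def is_ldh_label(label: bytes) -> bool:
    """Hypothetical check that a strict, non-upgraded component might apply."""
    return LDH_LABEL.fullmatch(label) is not None

def has_8bit_octets(label: bytes) -> bool:
    """True if the label uses octets above 0x7F, i.e. needs 8-bit transparency."""
    return any(b > 0x7F for b in label)

ascii_label = "toshiba".encode("ascii")
utf8_label = "東芝".encode("utf-8")  # illustrative Japanese label (kanji)

for label in (ascii_label, utf8_label):
    print(label, "LDH:", is_ldh_label(label), "8-bit:", has_8bit_octets(label))
```

A component that only stores and forwards octets passes both labels through unchanged. Only a component that validates syntax, like the one sketched above, needs to be upgraded. An ACE label consists only of letters, digits and hyphens, so it would pass the same check.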

>For email, this means every UA, MTA, message store, mail filter,
>mailing list, etc that uses the addresses in the header or
>envelope of a message.  For the web, this means every web browser,
>proxy, cache, and origin server that makes use of domain names
>in the request or response (header or payload).

'every' is too general. It's just those in the relevant paths.
For the average web case, it's just the browser and the server.
Proxies can be changed if necessary.

>For both cases,
>it means that every DNS query library, resolver, cache, and server
>involved in the lookups supports UTF-8 also (unless you believe
>that the existing ones will already support UTF-8 without protocol
>extensions, which is far from a given).  There's little incentive
>to upgrade because so many other components need to be upgraded
>before you can get reliable operation.

I very much doubt your last sentence. Companies that are interested
in being found with an idn will obviously upgrade their DNS and
web servers (if necessary). Users interested in using idns will
upgrade their browsers (if necessary). If people will be using
idns as frequently as we all think, the missing bits will be
filled in quite quickly.

>If we use ASCII compatible names, each component in a signal path
>that handles a domain name can upgrade independently, and things
>will keep working - they just won't display the name as nicely if
>they're not updated.    And only the components that interface with
>users need to be upgraded before the users see a benefit.

This is an end-to-end problem, and the end is the user, not some
system. A system that gives (people like you) the impression that
it works, but displays ASCII garbage, is a total failure.

*Nobody* who has a choice between, say, 'toshiba.co.jp' and
some garbage like 'xyz--ttnhpur83g4prhoaunh3.co.jp' (rather
than something like TOUSIBA.co.jp, imagine upper-case as kanji)
will ever want to use the latter. It's completely useless for
humans.
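To show concretely what an ACE name looks like compared with the native form, the sketch below uses Python's built-in `idna` codec. That codec implements the IDNA rules that were eventually standardized (RFC 3490, with Punycode from RFC 3492 and the `xn--` prefix). Those documents were published after this message. The `xyz--` prefix above is only a placeholder, and the exact ACE output at the time depended on which proposal was under discussion. The name `東芝.co.jp` is an assumed example standing in for "TOUSIBA.co.jp".

```python
# Python's built-in "idna" codec implements IDNA 2003 (RFC 3490) using Punycode.
native = "東芝.co.jp"          # what a Japanese user would want to see
ace = native.encode("idna")    # ASCII-only form a non-upgraded component would show
utf8 = native.encode("utf-8")  # UTF-8 form: the raw octets of the native name

print(ace)                     # an opaque ASCII label starting with b"xn--"
print(ace.decode("idna"))      # an upgraded user interface can convert it back
print(utf8)
```

The ACE form round-trips to the native name only in software that knows how to decode it. Everywhere else, the user sees the `xn--` string.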

>It's easier to get real IDN support into the various components
>using ASCII compatible names because fewer components need to be
>upgraded.  And the incentives for adoption are greater with ASCII
>names because the benefit of upgrading will be seen sooner.

No, what will happen is that the problems will be seen sooner,
and people will complain. Using UTF-8 will help to make sure
that clients are upgraded before idns are used, and that's
crucial for making sure that things work reasonably well
(for the humans who are the purpose of this exercise, not
only for some machines) from the start.

>Users won't care about whether the applications protocols represent
>IDNs in ACE or UTF-8.  But they will care about whether their
>applications support IDNs.  ACE lets them do so far more quickly.

Users care very much whether they see things in their script or they
see some ACE garbage. A half-way solution is not a solution, and ACE
will expose a half-way solution for a long time. Very ugly.

Of course, to you ASCII garbage most probably looks better than
Arabic, Chinese, Japanese, Korean, Hebrew, or other scripts.
Given that you have spent decades reading English, and maybe
only seconds or minutes looking at the other scripts, this is
nothing to be blamed for, and it's something that is in many
ways very difficult to become aware of, to get on top of,
and to judge from an independent viewpoint. I know
these kinds of things very well, because even after learning
and reading Japanese for close to 20 years, if I see an
announcement in both Japanese and English, my eyes move
to the English first, even if it's printed very small
and the Japanese is much bigger. Reading is an activity
that we spend more time learning and doing than anything
else, so it's no wonder that it happens completely
automatically, without our being aware of it.

So again, I would like to remind you: If users see ACE garbage,
this is a protocol failure in an end-to-end system, not something
we can gloss over lightly.

Regards,   Martin.
