
Re: [idn] An experiment with UTF-8 domain names



At 01/01/05 15:21 -0500, Keith Moore wrote:
> > Do you also think that putting up web pages and reading them are the
> > ``wrong things''?
>
>yes. try putting up web pages with links to URLs containing IDNs
>which are encoded in UTF-8 (using various URL prefixes and various
>protocols with servers on a variety of platforms) and seeing whether
>those links work in various browsers.  Then try putting URLs
>containing IDNs into text files, mailing them around, and using
>cut-and-paste to enter them into a browser's "get URL" dialog box.
>Then try printing URLs containing IDNs on business cards, and typing
>them into browsers' dialog boxes.

Keith, are you assuming that we are doing all this work so
that people can put 'ace--blah' into their URIs, mail them
around, put them in their 'location:' fields, and so on?
It's clear that this would work with ACE, but that's not
the point of our work, is it? If that's an argument for
using ACE instead of UTF-8, it's the wrong one.

Please note that mailing truly internationalized URIs
or domain names around already works quite nicely; there
is absolutely no need for any standards upgrades, although
some implementations still leave room for improvement.
The same applies to printing and typing. The only
point where work is needed is where the domain name or
URI is taken as characters (independent of how they are
encoded, just as the user sees them on the screen)
and then resolved.
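
To make the distinction concrete (a small demonstration, not
part of any proposal; 'küche.example' is a made-up name, and
this assumes a UTF-8 terminal with iconv and od available):
the user sees the same characters either way, but the octets
a resolver receives depend on the encoding, which is why the
resolution step is the one place that has to care:

    $ printf 'küche.example' | od -An -tx1
     6b c3 bc 63 68 65 2e 65 78 61 6d 70 6c 65
    $ printf 'küche.example' | iconv -f UTF-8 -t ISO-8859-1 | od -An -tx1
     6b fc 63 68 65 2e 65 78 61 6d 70 6c 65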



>a strategy for minimal disruption is:
>
>- affect as few components as possible  (since the effort required to
>   deal with breakage is on a per-component basis rather than a
>   per-line-of-code basis)

If we count by component, I doubt that ACE is better than
UTF-8. ACE needs a lot of special handling, scripts that
have to be fixed, and so on, in order to work.


>- put most of the burden of upgrading on those who benefit most

I'm not sure I agree with this. First of all, it implies that
internationalized domain names have to be usable everywhere,
immediately. I don't think that's really the demand we are
facing.


> > > most UNIX users today are not using UTF-8 as a local charset.
> >
> > XFree86 4.0 uses the UTF-8 version of xterm. It's easy to configure
> > Emacs and vim and less to treat text as UTF-8. This is going to replace
> > 8859-1 as the default; UNIX users are tired of dealing with deficient
> > character sets.
> >
> > Is this upgrade free? No. People will be stuck with 8859-1 text files
> > for years, and will have to go through extra effort to view them until
> > they're converted to UTF-8 on disk.
>
>I think it's the other way around.  People will not give up their
>favorite tools en masse in favor of unfamiliar tools that support UTF-8,

They don't have to. On Unix, what you will generally do is
create an alias or a wrapper shell script for your editor,
which either sets the locale so that the editor works with
UTF-8, or converts the file to an encoding the editor can
handle and back again afterwards; see the sketch below.
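
Here is a minimal sketch of the conversion variant (assuming
iconv and mktemp are available and the editor only handles
ISO-8859-1; 'edit-latin1' is a made-up name, and there is
deliberately no error handling):

    #!/bin/sh
    # edit-latin1: convert a UTF-8 file to ISO-8859-1,
    # edit it, then convert it back in place
    tmp=`mktemp` || exit 1
    iconv -f UTF-8 -t ISO-8859-1 "$1" > "$tmp"
    ${EDITOR:-vi} "$tmp"
    iconv -f ISO-8859-1 -t UTF-8 "$tmp" > "$1"
    rm -f "$tmp"

The locale variant is even simpler: an alias or a one-line
wrapper that runs the editor with LC_ALL set to a UTF-8
locale.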

Do you expect editors that can handle ACE transparently to become widespread?
Do you think it is sufficient to handle ACE as ASCII?


>especially when their existing files are in other formats.  And changing
>to a new xterm won't automatically make the old tools work (actually
>it will probably break some tools that expect each character takes one
>octet).

First, UTF-8 is *extremely* well designed to keep such tools
(shell scripts, filter programs, ...) working in most cases:
no byte of a multibyte UTF-8 sequence is ever a valid ASCII
byte, so anything that searches for or splits on ASCII
delimiters is unaffected. If you can give a concrete example
of such breakage in these kinds of programs, I would
appreciate it.
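
A quick demonstration of what keeps working (assuming a UTF-8
locale; the words and field values are arbitrary):

    $ printf 'küche:1\nkirche:2\n' | cut -d: -f1
    küche
    kirche
    $ printf 'küche:1\nkirche:2\n' | grep 'küche'
    küche:1

The ':' and newline delimiters are plain ASCII bytes that can
never occur inside the encoding of 'ü', so cut and grep work
byte-by-byte and still get the right answer. What does break
is code that equates one octet with one character or display
column, e.g. cut -c used as if it counted characters.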


Regards,   Martin.