Re: [idn] San Diego Meeting Notes
- To: idn@ops.ietf.org
- Subject: Re: [idn] San Diego Meeting Notes
- From: "D. J. Bernstein" <djb@cr.yp.to>
- Date: 23 Jan 2001 23:32:43 -0000
- Delivery-date: Tue, 23 Jan 2001 15:34:06 -0800
- Envelope-to: idn-data@psg.com
- Mail-Followup-To: idn@ops.ietf.org
I object to ACE because the initial switch involves massive unnecessary
costs. The costs are discussed in http://cr.yp.to/proto/idn.html. See
below for a transliterated copy of idn.html.
I also object to the idea of specifying a short-term plan without a
long-term plan. How can we rationally evaluate the costs of ACE if we
don't also evaluate the likely costs of a future ACE-to-UTF-8 switch?
I also object to the characterization of ACE as an ``application-only''
solution. Is an MTA an ``application''? How about a DNS server? Unless
we can settle on a clear definition of ``application,'' we shouldn't
even be using the word, let alone making decisions that involve it.
As for nameprep: When are we going to see some working software? I have
a bunch of examples that I'd like to try; it's difficult to evaluate
nameprep without software.
---Dan
D. J. Bernstein
Protocols
Internationalized domain names
We want to let people use domain names like abg.com (``alpha beta
gamma dot com''). They should be able to register the name, set up
computers under the name, connect to those computers by name, set up
web pages under the name, set up links to those web pages, browse
those web pages given the name or a link, send email from an address
under the name, receive email at that address, etc.
The big question is how these domain names will be encoded. This web
page describes two proposals, and tallies the costs of each proposal
for UNIX. (The costs for Windows are of similar types but generally
smaller; Microsoft has supported Unicode for years.)
Important note: An uppercase ABG.com (``Alpha Beta Gamma dot com'') is
guaranteed to cause confusion: when the uppercase Alpha is printed
properly, it looks just like an uppercase A. Some other strings are
also guaranteed to cause confusion. There's a complicated definition
of good names among all Unicode strings. Good names won't be confused
with each other. Registrars won't allow registration of bad names.
Base costs
Users need to be able to see common Unicode characters. The necessary
fonts are available, as is a version of xterm that displays the
characters given UTF-8 input. These are all included with the current
version of XFree86, so they are being deployed as part of regular OS
upgrades.
Users will sometimes need to type strange addresses from business
cards. Finding an unusual character in a huge font display is
difficult, so I expect business cards to provide more information,
such as Unicode numbers in small type. Keyboard interfaces will have
to improve to accept this information. (The ISO 14755 method is to hold
Shift-Ctrl while typing 222E for character 222E.)
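As a rough illustration of what an input method has to do with a Unicode
number printed on a business card, here is a minimal Python sketch. The
function name char_from_code is hypothetical, and the sketch only covers
the conversion step, not the keyboard handling.

```python
import unicodedata


def char_from_code(hex_digits: str) -> str:
    """Turn a printed Unicode number such as '222E' into the character."""
    return chr(int(hex_digits, 16))


if __name__ == "__main__":
    ch = char_from_code("222E")
    # Show the official character name and the UTF-8 bytes that a
    # UTF-8 terminal or configuration file would actually receive.
    print(unicodedata.name(ch), ch.encode("utf-8").hex())
```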
Costs of ACE with slow nameprep
What it means. Domain names are encoded as 7-bit strings in the
following contexts:
* DNS registration forms.
* DNS queries and responses.
* Mail message header fields: From, To, Received, etc.
* POP USER commands. (POP usernames typically include domain names.)
* Various parts of IMAP.
* HTTP fields: Host, etc.
* URLs.
Domain names are encoded as UTF-8 strings in the following contexts:
* The argument to gethostbyname.
* h_name and h_aliases.
* /etc/hosts and many other network configuration files.
* /etc/named.boot and zone files.
* Command lines for ndc, nsupdate, etc.
* BIND log files.
* /service/dnscache/root/servers.
* dnscache log files.
* /service/tinydns/root/data.
* Command lines for add-host, add-ns, etc.
* tinydns log files.
* /etc/resolv.conf, $LOCALDOMAIN, etc.
* Command lines for dig, host, etc.
* Output of dig, host, etc.
* Command lines for telnet, ssh, etc.
* /etc/hosts.allow, .ssh/known_hosts, etc.
* httpd.conf.
* /public/file.
* More HTTP server configuration files.
* lynx.cfg.
* Command line for lynx.
* .fetchmailrc.
* Pine interface: message displays, command line, etc.
* Mutt interface: message displays, command line, etc.
* More mail clients.
A domain name encoded as a UTF-8 string is permitted to be a bad name
if it looks just like a good name. It is interpreted as that good
name.
Making it work. BIND, tinydns, and other DNS servers need to be
upgraded. Domain names in configuration files need to be converted
from possibly bad to good, and from UTF-8 to 7-bit. 7-bit domain names
in queries need to be converted to UTF-8 for logs.
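The two conversions named above can be sketched in Python. This is only
a stand-in: it uses Python's built-in "idna" codec, which implements a
later ACE (Punycode with its own nameprep) rather than any specific ACE
proposal under discussion here. The function names utf8_to_ace and
ace_to_utf8 are hypothetical, and the codec's nameprep does not attempt
the lookalike handling described earlier.

```python
def utf8_to_ace(name_utf8: bytes) -> bytes:
    """Convert a UTF-8 domain name to a 7-bit form (configuration -> wire)."""
    return name_utf8.decode("utf-8").encode("idna")


def ace_to_utf8(name_7bit: bytes) -> bytes:
    """Convert a 7-bit domain name back to UTF-8 (wire -> log file)."""
    return name_7bit.decode("idna").encode("utf-8")


if __name__ == "__main__":
    # Greek alpha, beta, gamma, written with escapes to keep this file ASCII.
    configured = "\u03b1\u03b2\u03b3.com".encode("utf-8")
    on_the_wire = utf8_to_ace(configured)
    print(on_the_wire)
    print(ace_to_utf8(on_the_wire) == configured)
```

Every program in the list would need both directions: one at the point
where a configured name is sent out, one wherever a received name is
logged or displayed.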
The gethostbyname DNS client library needs to be upgraded. The input
domain name needs to be converted from possibly bad to good, and from
UTF-8 to 7-bit, before it is sent as a DNS query. The output domain
names in h_name and h_aliases need to be converted from 7-bit to
UTF-8.
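A minimal sketch of the shape of such an upgraded library call, in
Python rather than C. Everything here is an assumption made for
illustration: resolve_utf8 is a hypothetical wrapper, lookup_7bit stands
for the existing 7-bit resolver, and Python's "idna" codec (a later ACE)
stands in for both the bad-to-good step and the 7-bit encoding.

```python
from typing import Callable, List, Tuple

# A 7-bit resolver: takes a 7-bit name, returns (h_name, h_aliases).
Lookup = Callable[[bytes], Tuple[bytes, List[bytes]]]


def resolve_utf8(name_utf8: bytes, lookup_7bit: Lookup) -> Tuple[bytes, List[bytes]]:
    """Accept a UTF-8 name, query in 7-bit form, return UTF-8 results."""
    # Input side: UTF-8 -> 7-bit before the DNS query is sent.
    query = name_utf8.decode("utf-8").encode("idna")
    h_name, h_aliases = lookup_7bit(query)

    # Output side: 7-bit -> UTF-8 for h_name and each entry of h_aliases.
    def to_utf8(name_7bit: bytes) -> bytes:
        return name_7bit.decode("idna").encode("utf-8")

    return to_utf8(h_name), [to_utf8(alias) for alias in h_aliases]


if __name__ == "__main__":
    # A fake resolver so the sketch runs without network access.
    def fake_lookup(query: bytes) -> Tuple[bytes, List[bytes]]:
        return query, [b"www." + query]

    print(resolve_utf8("\u03b1\u03b2\u03b3.com".encode("utf-8"), fake_lookup))
```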
The tcpclient networking tool needs to be upgraded. The input domain
name needs to be converted from possibly bad to good, and from UTF-8
to 7-bit, before it is sent as a DNS query.
telnetd, sshd, tcpserver, etc. need to be upgraded. Domain names in
configuration files need to be converted from possibly bad to good,
and from UTF-8 to 7-bit. 7-bit domain names need to be converted to
UTF-8 for logs.
Many low-level networking tools need to be upgraded. Domain names in
configuration files need to be converted from possibly bad to good,
and from UTF-8 to 7-bit. 7-bit domain names need to be converted to
UTF-8 for logs.
Pine, Mutt, and other text-mode mail clients need to be upgraded.
7-bit domain names need to be converted to UTF-8 when messages are
displayed. UTF-8 needs to be spaced properly. Addresses in
configuration files (for example, ``flag messages to djb@cr.yp.to'')
need to be converted from possibly bad to good, and from UTF-8 to
7-bit.
fetchmail and other POP clients need to be upgraded. Domain names
embedded in POP usernames need to be converted from possibly bad to
good, and from UTF-8 to 7-bit.
Netscape Mail and other graphical mail clients need to be upgraded.
7-bit domain names need to be displayed as Unicode glyphs. Addresses
in configuration files need to be converted from possibly bad to good,
and from UTF-8 to 7-bit.
Apache, publicfile, and other HTTP servers need to be upgraded. Domain
names in configuration files need to be converted from possibly bad to
good, and from UTF-8 to 7-bit. 7-bit domain names need to be converted
to UTF-8 for logs.
Lynx and other text-mode browsers need to be upgraded. Domain names in
configuration files need to be converted from possibly bad to good,
and from UTF-8 to 7-bit. 7-bit domain names need to be converted to
UTF-8 for internationalized URL displays.
Netscape and other graphical browsers need to be upgraded. 7-bit
domain names need to be displayed as Unicode glyphs. Domain names in
configuration files need to be converted from possibly bad to good,
and from UTF-8 to 7-bit.
Costs of UTF-8 with fast nameprep
What it means. Domain names are encoded as UTF-8 strings in all of the
above contexts.
Bad names are not allowed to appear. (Exception: Users can send bad
names in DNS registration forms; the registrar will send back a
rejection notice showing the closest good name.)
Making it work. Sendmail needs to be upgraded. Current versions
discard bytes \200 through \237 in mail message headers.
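To see why this matters, here is a small Python model of the behavior
described above. It is not Sendmail's code; strip_high_control_bytes is
a hypothetical name for a filter that drops bytes \200 through \237
(0x80 through 0x9F), which is enough to show the effect on UTF-8.

```python
def strip_high_control_bytes(header: bytes) -> bytes:
    """Drop every byte in the range 0x80..0x9F (octal 200..237)."""
    return bytes(b for b in header if not 0x80 <= b <= 0x9F)


def survives(text: str) -> bool:
    """True if the UTF-8 encoding of text passes through unchanged."""
    encoded = text.encode("utf-8")
    return strip_high_control_bytes(encoded) == encoded


if __name__ == "__main__":
    # Greek alpha (U+03B1) encodes as CE B1 and is untouched.
    print(survives("\u03b1"))
    # Cyrillic er (U+0440) encodes as D1 80; losing the 80 byte leaves
    # a lone lead byte, which is no longer valid UTF-8.
    print(survives("\u0440"))
```

Only some characters are affected, so the damage is intermittent rather
than total, which makes it harder to notice.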
The gethostbyname DNS client library needs to be upgraded. Many
current installations, in violation of RFC 2181, reject DNS answers
that contain unusual characters. (However, some versions will work
correctly with ``options allow_special all'' or ``options no-check-names''
in /etc/resolv.conf.)
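The kind of check at issue can be sketched as follows. This is an
assumed letters-digits-hyphen rule written in Python for illustration,
not the code of any particular resolver, and passes_strict_check is a
hypothetical name. RFC 2181 says that any binary string can serve as a
DNS label, so a check like this rejects answers that the DNS itself
permits.

```python
import re

# One label made only of ASCII letters, digits, and interior hyphens.
_LDH_LABEL = re.compile(rb"[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?")


def passes_strict_check(name: bytes) -> bool:
    """True if every label of name matches the letters-digits-hyphen rule."""
    labels = name.rstrip(b".").split(b".")
    return all(_LDH_LABEL.fullmatch(label) for label in labels)


if __name__ == "__main__":
    print(passes_strict_check(b"cr.yp.to"))
    # A UTF-8 name contains bytes above 0x7F and fails the check.
    print(passes_strict_check("\u03b1\u03b2\u03b3.com".encode("utf-8")))
```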
Pine, Mutt, and other text-mode mail clients need to be upgraded.
UTF-8 needs to be spaced properly.
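The spacing problem is that the number of bytes, the number of
characters, and the number of screen columns are three different
things. A rough Python sketch, assuming that wide and fullwidth East
Asian characters take two columns and combining marks take none; real
terminals follow more detailed rules, and display_columns is a
hypothetical helper.

```python
import unicodedata


def display_columns(text: str) -> int:
    """Estimate how many terminal columns text occupies."""
    columns = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks are drawn over the previous character
        wide = unicodedata.east_asian_width(ch) in ("W", "F")
        columns += 2 if wide else 1
    return columns


if __name__ == "__main__":
    for sample in ("abc", "\u03b1\u03b2\u03b3", "\u65e5\u672c", "e\u0301"):
        # Print bytes, characters, and estimated columns side by side.
        print(len(sample.encode("utf-8")), len(sample), display_columns(sample))
```

A mail client that pads or truncates by byte count will misalign any
line containing these names.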
There's one report that an obsolete version of the Netscape mailer
crashes under Solaris when it reads UTF-8 messages. I need verifiable
details.
Are there more programs that need to be upgraded? Let me know.