[idn] An experiment with UTF-8 domain names



I decided to create a domain named S.cr.yp.to, but with the S changed to
a contour-integral sign. That's Unicode U+222E, which is the three bytes
\342\210\256 (octal) in UTF-8.
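
A minimal sketch, assuming Python 3, for checking those byte values. The
variable names are illustrative only and are not part of the experiment:

```python
# Encode U+222E (CONTOUR INTEGRAL) as UTF-8 and show each byte in octal.
sign = "\u222e"
utf8_bytes = sign.encode("utf-8")

# Format every byte as a backslash followed by three octal digits.
octal_escapes = "".join("\\%03o" % b for b in utf8_bytes)

print(len(utf8_bytes))   # 3
print(octal_escapes)     # \342\210\256
```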

(Some readers say that their MUAs can't display UTF-8 text correctly, so
I've changed contour-integral back to S in the following text.)

I'm using UNIX, with the UTF-8 version of xterm. Here's what I did:

   # cd /etc/tinydns/root
   # echo @S.cr.yp.to:131.193.178.181:mx.contour.cr.yp.to >> data
   # make

This created two records: an MX record for S.cr.yp.to pointing to
mx.contour.cr.yp.to, and an A record for mx.contour.cr.yp.to with
address 131.193.178.181.

   # dnsmx S.cr.yp.to
   0 mx.contour.cr.yp.to

This did a DNS lookup through my local cache, dnscache, and obtained the
result without trouble. Can your cache do this? If not, why not?
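
One reason a cache need not have trouble: on the wire (RFC 1035), a domain
name is a sequence of length-prefixed labels, and each label is just a
string of bytes. The following is a rough sketch in Python 3, not code from
djbdns; encode_name is a hypothetical helper, and it deliberately ignores
the root name and name compression:

```python
def encode_name(name: bytes) -> bytes:
    """Encode a dotted name as length-prefixed DNS labels (RFC 1035 wire format)."""
    wire = b""
    for label in name.rstrip(b".").split(b"."):
        # A label is 1 to 63 arbitrary bytes; nothing here requires ASCII.
        if not 1 <= len(label) <= 63:
            raise ValueError("each label must be 1 to 63 bytes")
        wire += bytes([len(label)]) + label
    wire += b"\x00"  # the zero-length root label terminates the name
    if len(wire) > 255:
        raise ValueError("encoded name exceeds 255 bytes")
    return wire

# The contour-integral label is simply a 3-byte label.
name = "\u222e.cr.yp.to".encode("utf-8")
print(encode_name(name))  # b'\x03\xe2\x88\xae\x02cr\x02yp\x02to\x00'
```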

   # cd /var/qmail/control
   # echo S.cr.yp.to:contour >> virtualdomains
   # echo S.cr.yp.to >> rcpthosts
   # svc -h /service/qmail

This arranged for qmail to accept mail for (e.g.) postmaster@S.cr.yp.to
and deliver it to contour-postmaster on the local host.
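
The mapping can be pictured as a byte-for-byte table lookup. This is a
simplified sketch in Python 3, not qmail's implementation: both function
names are hypothetical, and only plain domain:prepend lines are handled
(no wildcards, no user@domain entries).

```python
def parse_virtualdomains(text: bytes) -> dict:
    """Build a {domain: prepend} table from lines of the form domain:prepend."""
    table = {}
    for line in text.splitlines():
        if b":" not in line:
            continue
        domain, prepend = line.split(b":", 1)
        # bytes.lower() folds ASCII letters only; UTF-8 bytes pass through untouched.
        table[domain.lower()] = prepend
    return table

def local_recipient(address: bytes, table: dict):
    """Return prepend-local for a virtual domain, or None if the domain is unknown."""
    local, _, domain = address.rpartition(b"@")
    prepend = table.get(domain.lower())
    if prepend is None:
        return None
    return prepend + b"-" + local

control = "\u222e.cr.yp.to:contour\n".encode("utf-8")
table = parse_virtualdomains(control)
address = "postmaster@\u222e.cr.yp.to".encode("utf-8")
print(local_recipient(address, table))  # b'contour-postmaster'
```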

   # ( echo To: postmaster@S.cr.yp.to; echo Testing ) | qmail-inject

This sent a message to postmaster@S.cr.yp.to. I read the message using
the UTF-8 version of less, and it was displayed comprehensibly, with the
To line shown as follows:

   To: postmaster@"S".cr.yp.to

qmail-inject doesn't mind weird characters, such as control characters
and 8-bit characters, in atoms. It converts the atoms to quoted strings.
Of course, RFC 822 doesn't allow quoted strings, never mind 8-bit
characters, in domain names, but these are easy protocol extensions.
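
The quoting rule can be sketched as follows. This is an illustration in
Python 3 of the idea only, not qmail-inject's actual code; is_atom,
quote_label and quote_domain are hypothetical names, and handling of a bare
CR inside a label is omitted.

```python
# RFC 822 "specials": bytes that may not appear in an atom.
SPECIALS = set(b'()<>@,;:\\".[]')

def is_atom(label: bytes) -> bool:
    """True if every byte is printable ASCII, not a space, and not a special."""
    return len(label) > 0 and all(32 < b < 127 and b not in SPECIALS for b in label)

def quote_label(label: bytes) -> bytes:
    """Leave valid atoms alone; wrap anything else in a quoted string."""
    if is_atom(label):
        return label
    escaped = label.replace(b"\\", b"\\\\").replace(b'"', b'\\"')
    return b'"' + escaped + b'"'

def quote_domain(domain: bytes) -> bytes:
    """Apply quote_label to each dot-separated label of a domain."""
    return b".".join(quote_label(label) for label in domain.split(b"."))

domain = "\u222e.cr.yp.to".encode("utf-8")
print(b"postmaster@" + quote_domain(domain))  # b'postmaster@"\xe2\x88\xae".cr.yp.to'
```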

I then tried sending a message to postmaster@S.cr.yp.to from another
machine. My SMTP client, qmail-remote, quoted the S the same way that
qmail-inject did. The message was received and read without trouble.

What's wrong with handling S this way? The answer seems to be that some
other programs don't work. What are those programs? What exactly do they
do wrong? How hard is it to fix them? Why should we believe that the
other IDN proposals will require less effort?

Keith Moore writes:
> all apps will have to be modified (many will require significant
> modification) if they want to deal meaningfully with IDNs.

False. As demonstrated above, qmail and djbdns already work with UTF-8
domain names. They're both widely deployed. Apparently Microsoft also
has some clients and servers that work with UTF-8 domain names.

Changing those programs has a cost. What is the benefit?

Brian W. Spolarich writes:
> Using 8-bit data will break some applications.
> Using 7-bit data (presumably) will not.

False. If the user's S.cr.yp.to has to be encoded inside DNS and mail
messages as ace-blah.cr.yp.to, then qmail will be faced with S.cr.yp.to
in (e.g.) /var/qmail/control/virtualdomains, and ace-blah.cr.yp.to in
SMTP. This simply won't work unless the software is changed.

If, on the other hand, S.cr.yp.to is used as is, then the software will
work fine.
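
The mismatch is easy to reproduce. In this sketch (Python 3), Punycode
stands in for whichever ASCII-compatible encoding a proposal might choose,
and the "ace-" prefix is hypothetical, mirroring the ace-blah placeholder
above; to_ace is likewise an invented helper.

```python
ACE_PREFIX = b"ace-"  # hypothetical prefix; a real proposal would define its own

def to_ace(label: str) -> bytes:
    """Return ASCII labels unchanged; encode anything else as prefix + Punycode."""
    if label.isascii():
        return label.encode("ascii")
    return ACE_PREFIX + label.encode("punycode")

configured = "\u222e.cr.yp.to"

# What the user's UTF-8 editor wrote into the control file.
rcpthosts = {configured.encode("utf-8")}

# What would arrive in SMTP under an ACE proposal, and under direct UTF-8.
on_the_wire_ace = b".".join(to_ace(label) for label in configured.split("."))
on_the_wire_utf8 = configured.encode("utf-8")

print(on_the_wire_ace in rcpthosts)   # False: unmodified software rejects it
print(on_the_wire_utf8 in rcpthosts)  # True: a plain byte comparison succeeds
```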

James Seng/Personal writes:
> Patching sendmail might be trivial for a good programmer like yourself.
> How fast do you think you can get everyone to use your patch and would
> unpatch software fallback safely?

All the proposals require a sendmail patch. To tell sendmail to accept
mail for S.foo.dom, the user adds S.foo.dom to a file with his UTF-8
editor; sendmail mishandles the \210 if it isn't patched.

The patch required for direct use of UTF-8 is by far the simplest. No,
deployment isn't free, but the other proposals don't change this fact.

---Dan

P.S. I'm a subscriber to this mailing list. I don't want to receive
extra copies of messages sent to the list. I've set Mail-Followup-To
accordingly.

P.P.S. You may have noticed an unusual From line on this message. The
problem is that the software running this mailing list can't deal with
the concept of sublist subscribers; it forwards my messages to Seng.
Seng eventually approved two of my messages, editing the Date field and
removing the Received lines to hide the delay. He has refused to approve
this one unless I take special actions to fool the list software. So I'm
using his address in From, and my address in Reply-To.