Re: [idn] IDNs in email message bodies

To: Patrik Fältström <paf@cisco.com>
Subject: Re: [idn] IDNs in email message bodies
From: "Eric A. Hall" <ehall@ehsco.com>
Date: Mon, 26 Mar 2001 11:37:27 -0800
CC: idn working group <idn@ops.ietf.org>
Delivery-date: Mon, 26 Mar 2001 11:38:01 -0800
Envelope-to: idn-data@psg.com
Organization: EHS Company

> >Should we be answering these questions or should the question be
> >raised in a document and left for the respective WGs to answer?
> 
> That is the current plan I have as AD for the area with the most
> protocols.
> 
> IAB do though have a workshop on internationalization later in April,
> and I know this more meta/process issue will be discussed.

I think we need to split this issue into its conceptual entities.

First issue is user-generated IDNs: Users are going to *TYPE* email
addresses and URLs in their preferred charset. Applications and systems
cannot make intelligent conversions to ACE (reading mail in pine via
xterm, we can't expect that xterm is going to automatically convert the
string "www.XXXXX.jp" into ACE [especially if there is no "http://"
prefix]). So on that side we need to reinforce that names must be
converted to UTF-8 before they are fed to nameprep, but how they are
handled locally is up to the local system and beyond the scope of any IETF
protocol.
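
To make the conversion order concrete, here is a minimal sketch in Python.
It assumes the ACE is the Punycode-based "xn--" form that Python's built-in
"idna" codec implements (nameprep plus ASCII encoding); that is only a
stand-in for whichever ACE this WG settles on. The function name `to_ace`
and the sample name are hypothetical.

```python
def to_ace(raw: bytes, local_charset: str) -> str:
    """Convert a hostname typed in a local charset into an ACE string.

    Assumption: Python's "idna" codec stands in for the ACE algorithm.
    """
    # Step 1: leave the local charset behind. The local system owns this step.
    text = raw.decode(local_charset)
    # Step 2: make the UTF-8 form explicit; this is what gets fed to nameprep.
    utf8 = text.encode("utf-8")
    # Step 3: nameprep + ACE encoding, applied label by label by the codec.
    return utf8.decode("utf-8").encode("idna").decode("ascii")


# The same name typed on a Latin-1 terminal and on a UTF-8 terminal
# ends up as the same ACE string.
print(to_ace(b"b\xfccher.example", "latin-1"))          # xn--bcher-kva.example
print(to_ace("bücher.example".encode("utf-8"), "utf-8"))  # xn--bcher-kva.example
```

The point of the sketch is only that the charset-specific step happens
first and locally; everything after it sees a single canonical input.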

Second issue is machine-generated IDNs, such as those that appear in
message headers, PTR lookups, HELO banners, PATH names, HTTP redirects,
and so forth.

The "default" result of a PTR lookup is going to be a raw ACE string.

HELO, PATH, HTTP sequences and other "pushed" names can be ACE, or they
can use an RFC-2047 style encoding syntax. There are three issues:

 1) If they use a 2047 encoding, that opens a lot of questions.
    For example, we have no way of knowing what charset the
    end-viewer will be using, so it is not practical to force one.
    Although we could mandate that an ACE-specific 2047 encoding
    "must use UTF-8", if the user who is viewing the data is using
    some other local encoding, it may not be possible for them to
    mix UTF-8 with their local encoding. Network analyzers don't
    generally support multiple charsets, for example, and throwing
    UTF-8 into that may cause more problems than it resolves.

 2) Furthermore, 2047 encoding ("?=") of a domain name will break
    many applications and protocols which are written to *ONLY*
    use RFC-952 hostnames.

 3) The whole motivation for ACE is that it is compatible with
    legacy ASCII encodings so that it won't break stuff. Trying to
    change this into some kind of forward-compatibility will break
    stuff, which goes against the very goals of ACE. 
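
Point 2 can be illustrated with a rough sketch. The checker below accepts
only letter-digit-hyphen labels in the spirit of RFC 952 (with the RFC 1123
relaxation that a label may start with a digit); the name `is_ldh_hostname`
is hypothetical, and the ACE string assumes the Punycode-style "xn--" form
produced by Python's "idna" codec rather than any ACE this WG has chosen.

```python
import re
from email.header import Header

# One hostname label: 1-63 chars, letters/digits/hyphens,
# no leading or trailing hyphen.
_LDH_LABEL = re.compile(r"[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_ldh_hostname(name: str) -> bool:
    """Return True if every label is plain letter-digit-hyphen ASCII."""
    if name.endswith("."):
        name = name[:-1]  # tolerate a single trailing root dot
    return all(_LDH_LABEL.fullmatch(label) is not None
               for label in name.split("."))


# An ACE name looks like any other legacy hostname.
ace_name = "bücher.example".encode("idna").decode("ascii")

# An RFC-2047 encoded-word carries "=?" ... "?=" delimiters.
encoded_word_name = Header("bücher", "utf-8").encode() + ".example"

print(is_ldh_hostname(ace_name))           # True
print(is_ldh_hostname(encoded_word_name))  # False
```

Any parser that validates hostnames along these lines will accept the ACE
name untouched and reject the encoded-word form outright.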

For those reasons I think we HAVE to mandate that machine-generated IDNs
must only provide ACE-encoded names whenever their hostname is written.

This may be a problem for some applications that get their hostname from a
local system call, so the system calls will likely have to provide an ACE
value and a "rich" value.

If that occurs then it may be possible to amend the above rule with a
modifier of "except where the protocol explicitly specifies that the
identifier is eight-bit aware".
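
As a sketch of what such a dual-valued call might look like (purely
illustrative: `HostnameForms`, `hostname_forms` and `name_for_protocol`
are hypothetical names, not any existing system API, and Python's "idna"
codec again stands in for the eventual ACE):

```python
from typing import NamedTuple


class HostnameForms(NamedTuple):
    ace: str   # safe to write anywhere a legacy hostname is expected
    rich: str  # native-character form, for eight-bit-aware contexts only


def hostname_forms(name: str) -> HostnameForms:
    """Return both forms of a hostname (assumed ACE: Python's idna codec)."""
    ace = name.encode("idna").decode("ascii")
    # Derive the rich form from the ACE so the two always agree.
    rich = ace.encode("ascii").decode("idna")
    return HostnameForms(ace=ace, rich=rich)


def name_for_protocol(forms: HostnameForms, eight_bit_aware: bool = False) -> str:
    """Default to ACE; hand out the rich form only on explicit opt-in."""
    return forms.rich if eight_bit_aware else forms.ace


forms = hostname_forms("bücher.example")
print(name_for_protocol(forms))                        # xn--bcher-kva.example
print(name_for_protocol(forms, eight_bit_aware=True))  # bücher.example
```

Making ACE the default and the rich value an explicit opt-in mirrors the
rule and its exception as stated above.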

That is really ugly stuff, I agree. But given the constraints that we are
under by trying to maintain strict backwards compatibility (vis-a-vis ACE)
I don't see any other approach.

-- 
Eric A. Hall                                        http://www.ehsco.com/
Internet Core Protocols          http://www.oreilly.com/catalog/coreprot/
