Re: [idn] IDNs in email message bodies
- To: Patrik Fältström <paf@cisco.com>
- Subject: Re: [idn] IDNs in email message bodies
- From: "Eric A. Hall" <ehall@ehsco.com>
- Date: Mon, 26 Mar 2001 11:37:27 -0800
- CC: idn working group <idn@ops.ietf.org>
- Delivery-date: Mon, 26 Mar 2001 11:38:01 -0800
- Envelope-to: idn-data@psg.com
- Organization: EHS Company
> >Should we be answering these questions or should the question be
> >raised in a document and left for the respective WGs to answer?
>
> That is the current plan I have as AD for the area with the most
> protocols.
>
> IAB do though have a workshop on internationalization later in April,
> and I know this more meta/process issue will be discussed.
I think we need to split this issue into its conceptual entities.

First issue is user-generated IDNs: Users are going to *TYPE* email
addresses and URLs in their preferred charset. Applications and systems
cannot make intelligent conversions to ACE (when reading mail in pine via
xterm, we can't expect xterm to automatically convert the string
"www.XXXXX.jp" into ACE [especially if there is no "http://" prefix]).
So on that side we need to reinforce that names must be converted to
UTF-8 before they are fed to nameprep, but how they are handled locally
is up to the local system and beyond the scope of any IETF protocol.
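To make the local-charset -> Unicode -> nameprep -> ACE pipeline concrete,
here is a minimal Python sketch. It rests on an assumption that is not part
of this message: it uses Python's built-in "idna" codec, which implements
the nameprep-plus-Punycode scheme that was standardized after this
discussion, as a stand-in for whichever ACE the WG ends up choosing. The
function name typed_name_to_ace and the sample name are hypothetical.

```python
def typed_name_to_ace(raw: bytes, local_charset: str) -> str:
    """Convert a name typed in some local charset into an ACE string."""
    # Step 1: local charset -> Unicode. This is the local, system-specific
    # part that is outside the scope of any protocol.
    text = raw.decode(local_charset)
    # Step 2: the stdlib "idna" codec applies nameprep to each non-ASCII
    # label and then ASCII-encodes it (Punycode, used here as the ACE).
    return text.encode("idna").decode("ascii")


# The same name typed on two systems with different local charsets.
from_latin1 = typed_name_to_ace("www.bücher.example".encode("latin-1"), "latin-1")
from_utf8 = typed_name_to_ace("www.bücher.example".encode("utf-8"), "utf-8")

print(from_latin1)               # www.xn--bcher-kva.example
print(from_latin1 == from_utf8)  # True
```

The point of the sketch is only that the ACE form is the same no matter
which charset the user typed in, provided the local system gets the name
into Unicode first.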
Second issue is machine-generated IDNs, such as those that appear in
message headers, PTR lookups, HELO banners, PATH names, HTTP redirects,
and so forth.
The "default" action for PTR lookups is going to be raw ACE strings.
HELO, PATH, HTTP sequences and other "pushed" names can be ACE, or they
can use an RFC-2047 style encoding syntax. There are three issues:
1) If they use a 2047 encoding, that opens a lot of questions.
   For example, we have no way of knowing what charset the
   end-viewer will be using, so it is not practical to force one.
   Although we could mandate that an ACE-specific 2047 encoding
   "must use UTF-8", if the user who is viewing the data is using
   some other local encoding, it may not be possible for them to
   mix UTF-8 with their local encoding. Network analyzers don't
   generally support multiple charsets, for example, and throwing
   UTF-8 into that may cause more problems than it resolves.

2) Furthermore, 2047 encoding ("?=") of a domain name will break
   many applications and protocols which are written to *ONLY*
   use RFC-952 hostnames.

3) The whole motivation for ACE is that it is compatible with
   legacy ASCII encodings so that it won't break stuff. Trying to
   change this into some kind of forward-compatibility will break
   stuff, which goes against the very goals of ACE.
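Point 2 above can be illustrated with a short Python sketch: a legacy
application that only accepts letter-digit-hyphen hostnames will take an
ACE name but reject the same name as an RFC-2047 encoded-word. The
assumptions are mine: the regular expression is a simplified RFC-952 check
(with the RFC-1123 relaxation that a label may start with a digit),
is_legacy_hostname is a hypothetical helper, and Python's "idna" codec
again stands in for the ACE.

```python
import re
from email.header import Header

# One label: letters, digits and hyphens, no leading or trailing hyphen.
_LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?"
_HOSTNAME = re.compile(rf"{_LABEL}(?:\.{_LABEL})*")


def is_legacy_hostname(name: str) -> bool:
    """Simplified check of the kind an ASCII-only application might do."""
    return len(name) <= 253 and _HOSTNAME.fullmatch(name) is not None


name = "bücher.example"

# ACE form: plain ASCII, so legacy checks still pass.
ace_form = name.encode("idna").decode("ascii")

# RFC-2047 encoded-word form: "=?charset?encoding?text?=".
encoded_word = Header(name, "utf-8").encode()

print(ace_form, is_legacy_hostname(ace_form))
print(encoded_word, is_legacy_hostname(encoded_word))
```

The "=" and "?" delimiters of the encoded-word are enough to fail the
hostname check, whatever charset is carried inside it.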
For those reasons I think we HAVE to mandate that machine-generated IDNs
must only provide ACE-encoded names whenever their hostname is written.
This may be a problem for some applications that get their hostname from a
local system call, so the system calls will likely have to provide an ACE
value and a "rich" value.
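As a rough sketch of what such a call could hand back, the Python below
returns both forms from a single configured name. Everything here is
hypothetical: HostName and host_name_pair are illustrative names, not an
existing or proposed API, and Python's "idna" codec is assumed as the ACE.

```python
from typing import NamedTuple


class HostName(NamedTuple):
    ace: str   # ASCII-only form, safe to write into legacy protocols
    rich: str  # Unicode form, for protocols that are eight-bit aware


def host_name_pair(configured: str) -> HostName:
    """Return the ACE and "rich" forms of a configured hostname.

    Accepts either form as input; the rich value is derived from the
    ACE value so that the two always describe the same name.
    """
    ace = configured.encode("idna").decode("ascii")
    rich = ace.encode("ascii").decode("idna")
    return HostName(ace=ace, rich=rich)


pair = host_name_pair("bücher.example")
print(pair.ace)   # xn--bcher-kva.example
print(pair.rich)  # bücher.example
```

An application writing a HELO banner or a header would use pair.ace by
default and reach for pair.rich only where the protocol allows it.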
If that occurs then it may be possible to amend the above rule with a
modifier of "except where the protocol explicitly specifies that the
identifier is eight-bit aware".
That is really ugly stuff, I agree. But given the constraints that we are
under by trying to maintain strict backwards compatibility (vis-a-vis ACE)
I don't see any other approach.
--
Eric A. Hall http://www.ehsco.com/
Internet Core Protocols http://www.oreilly.com/catalog/coreprot/