
Re: [idn] IDNs in email message bodies



Adding a new TES will break things.
If Maynard means to 'convert' (a sensitive word) the mail data itself (i.e. the
RFC 822 message), then that would violate many rules.
However, I agree that ACE should be 'converted' for display, and strictly
for display only.
This is analogous to mail clients detecting mail addresses and URLs and
hyperlinking them.
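
A minimal sketch of what "display only" could look like in a mail client. It
assumes the "xn--" ACE prefix and Punycode encoding that were only standardized
later (RFC 3490), as implemented by Python's built-in "idna" codec; no ACE had
been chosen at the time of this thread. The names ACE_LABEL and
render_for_display are hypothetical.

```python
import re

# Assumption: ACE labels carry the "xn--" prefix of the later IDNA standard.
ACE_LABEL = re.compile(r"\bxn--[a-z0-9-]+", re.IGNORECASE)


def _decode_label(match):
    """Decode one ACE label; fall back to the text as sent on any error."""
    label = match.group(0)
    try:
        return label.lower().encode("ascii").decode("idna")
    except UnicodeError:
        return label


def render_for_display(body: str) -> str:
    """Return a display copy of the body; the stored message is not touched."""
    return ACE_LABEL.sub(_decode_label, body)


stored = "Please visit http://xn--bcher-kva.example/ for details."
shown = render_for_display(stored)
print(shown)   # what the user sees
print(stored)  # what stays on disk, byte for byte as it was sent
```

The stored text is never rewritten, so anything computed over it (a signature,
for example) stays valid; only the rendered copy differs.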

wil.

----- Original Message -----
From: "Eric A. Hall" <ehall@ehsco.com>
Cc: "idn working group" <idn@ops.ietf.org>
Sent: Monday, March 26, 2001 6:33 AM
Subject: Re: [idn] IDNs in email message bodies


>
> James Seng wrote:
>
> > What the ppt is focusing on are the issues when IDN names appear in
> > RFC 821 SMTP commands, RFC 822 headers (different depending on whether it
> > is From or Subject) or bodies. How can we work with ACE wrt the
> > encoding+TES already used in the body etc.
>
> Whether an IDN appears in a header or in the body is irrelevant. We can't
> change the data.
>
> First of all, users are likely to use whatever characters are available
> from their charset in the message text, or they may change charsets if
> they have to do additional encodings, but for the most part people will
> use whatever characters (and encodings) they already have. Somebody
> advertising a Norsk domain name will write and encode it with iso-8859-1
> or iso-8859-16 or whatever. Similarly, a Japanese IDN will likely be
> written and encoded in iso-2022-jp (or an alternative, maybe unicode, if
> they need more characters). But for the most part, the domain names from
> email addresses and URLs will be encoded in native characters. Users will
> not be converting IDNs to ACE and then writing ACE strings into email
> messages. Sorry.
>
> Secondly, it is not possible to rewrite message data since that
> corrupts message signatures (Eudora bug, spell-checking a message after it
> was signed invalidated the signature). That means IDNs can't be screwed
> with, meaning they must be preserved as encoded by the sender.
>
> If we accept both of those preconditions, then we also come to the
> conclusion that there isn't a whole lot of difference we can make here,
> other than to request that the client application (or the resolver) use
> ACE conversion when they are presented with an IDN. If some additional
> decoding is required -- such as converting an iso-2022-jp sequence into
> utf8 first -- then that's something that needs to be pointed out. But we
> can't really expect that users aren't going to type IDNs in their rich
> format when they generate a message which also contains plain text from
> some local charset.
>
> Other areas of concern -- like passing Han IDNs when those characters
> aren't available (all languages are problematic for charset=us-ascii) --
> have the same problem and answer really. We can encourage implementations
> (and users) to default to UTF8, and then leave it up to them to do the
> proper conversion whenever a DNS lookup is issued. If they can't or won't
> type the IDN in its native format then they can type it in as an IDN.
> Chances are they will try to copy-and-paste, and hopefully a mailer or a
> browser's input box will be smart enough to deal with it.
>
> But what we cannot do is create a new encoding syntax and expect that it
> will be used only for domain names. As pointed out above, it won't happen,
> because changing data breaks messages, and because users are only going to
> type in the charset that they have/know.
>
> --
> Eric A. Hall                                        http://www.ehsco.com/
> Internet Core Protocols          http://www.oreilly.com/catalog/coreprot/
>
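
The lookup-time conversion described in the quoted message (decode the sender's
charset first, then apply ACE only when a DNS query is issued) can be sketched
roughly as follows. This assumes the Punycode-based ACE that was standardized
after this thread (RFC 3490), via Python's built-in "idna" codec; the function
name ace_for_lookup and the sample names are hypothetical.

```python
def ace_for_lookup(raw: bytes, charset: str) -> str:
    """Turn a domain name, as encoded by the sender, into ACE for a DNS query."""
    # Step 1: decode the bytes using the charset declared for the message
    # (e.g. iso-8859-1 or iso-2022-jp) to get Unicode text.
    name = raw.decode(charset)
    # Step 2: convert to ACE. Assumption: the "idna" codec (RFC 3490) stands
    # in for whatever ACE the working group ends up choosing.
    return name.encode("idna").decode("ascii")


# The message data stays exactly as the sender encoded it.
latin1_name = "b\u00fccher.example".encode("iso-8859-1")
japanese_name = "\u65e5\u672c\u8a9e.example".encode("iso-2022-jp")

print(ace_for_lookup(latin1_name, "iso-8859-1"))
print(ace_for_lookup(japanese_name, "iso-2022-jp"))
```

The conversion happens in the client or resolver at query time, so nothing in
the message itself is rewritten.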