
Re: Fw: [idn] IDNs in email message bodies




> When someone replies to a message, the MUA extracts the email address
> from the From: field, and hands this address to the MTA (which is
> often built into the MUA), which then separates the domain from the
> local part and does a DNS lookup on the domain.  If the MUA and MTA
> are not IDN-aware, and the address in the From: field is a non-ACE
> IDN, then this lookup will fail.

But this specific scenario won't fail because of ACE; it will fail because the
mailer does not conform to the standards that were already in place (ASCII
only in domain names). In this case, the original sender (to whom the reply
is being directed) should not have specified their address as a
charset-encoded IDN. It should instead have been written as a charset@ASCII
sequence, and the sender's mailer should have rejected anything that wasn't
@ASCII.

Going forward, it seems that the best way to handle these particular
fields is to promote ACE-encoded domain names to the same level as charset
encoding (although the two of course use different mechanisms).

For example, if a user enters <JPuserid>@<JPdomain.dom>, that should be
encoded as <iso-2022-jp>@<ACE-encoding> on message SEND, with the mailer
doing an ACE conversion on the domain part at the same time as it does an
RFC 2047 conversion on the user part.
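A minimal Python sketch of that send-side step, under stated assumptions. The
ACE scheme was still being debated when this was written, so the sketch
substitutes the IDNA 2003 ToASCII operation (Nameprep plus Punycode with the
"xn--" prefix) that Python's built-in "idna" codec provides. The RFC 2047
encoded-word in the local part follows the proposal above and is not
something standard address syntax permits. encode_address and its helpers
are hypothetical names.

```python
import base64


def encode_local_part(local: str, charset: str = "iso-2022-jp") -> str:
    """Wrap a non-ASCII local part in an RFC 2047 "B" encoded-word.

    Hypothetical helper: it never splits long values, so it assumes the
    result stays within RFC 2047's 75-character encoded-word limit.
    """
    if local.isascii():
        return local  # plain ASCII local parts are left untouched
    payload = base64.b64encode(local.encode(charset)).decode("ascii")
    return f"=?{charset}?B?{payload}?="


def encode_domain(domain: str) -> str:
    """Convert the domain to its ASCII form, label by label.

    Stand-in for ACE: Python's "idna" codec applies IDNA 2003
    (Nameprep + Punycode, "xn--" prefix) and passes ASCII labels through.
    """
    return domain.encode("idna").decode("ascii")


def encode_address(local: str, domain: str, charset: str = "iso-2022-jp") -> str:
    """Encode both halves in one pass, at the moment the message is sent."""
    return f"{encode_local_part(local, charset)}@{encode_domain(domain)}"


print(encode_address("山田", "例え.jp"))
```

Doing both conversions in the same function mirrors the point above: the
mailer touches the user part and the domain part at the same moment, so
neither half is ever sent in raw form.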

Applications that display messages to users should know that they can
reverse the encoded data for display, just as they would with RFC 2047
encoded headers, but they have to generate correctly encoded data when they
compose outgoing headers.
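The display-side reversal can be sketched the same way. This assumes the
wire form carries an RFC 2047 encoded-word in the local part and an
IDNA 2003 ("xn--") domain, as Python's "idna" codec understands it; the
function names are hypothetical. The stored address is never modified, only
the string shown to the user.

```python
import email.header


def decode_local_part(local: str) -> str:
    """Reverse RFC 2047 encoded-words, as a reader already does for headers."""
    pieces = []
    for chunk, charset in email.header.decode_header(local):
        if isinstance(chunk, bytes):
            # Encoded-words come back as bytes plus their declared charset;
            # unencoded runs in a mixed value come back as bytes with None.
            pieces.append(chunk.decode(charset or "ascii"))
        else:
            pieces.append(chunk)  # no encoded-words at all: a plain str
    return "".join(pieces)


def decode_domain(domain: str) -> str:
    """Turn "xn--" labels back into Unicode; other ASCII labels pass through."""
    return domain.encode("ascii").decode("idna")


def display_address(wire_address: str) -> str:
    """Render an encoded address for display without altering the original."""
    local, _, domain = wire_address.rpartition("@")
    return f"{decode_local_part(local)}@{decode_domain(domain)}"
```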

Newer apps that are aware of these rules will have to do this if they
want to present a rich experience to the user. Legacy apps should not
allow users to enter these domain names into the relevant fields, or they
should barf if such a name is encountered on load/send. If they accept
illegal domain names, that is their problem; they shouldn't be breaking
the rule they already know about, which is @ASCII only.
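For a legacy mailer, "barfing" can be as simple as refusing any address
whose domain part is not plain ASCII. The sketch below is hypothetical and
checks only the domain (the @ASCII rule); an ACE-encoded domain is already
ASCII and therefore passes.

```python
def check_legacy_address(address: str) -> str:
    """Reject an address whose domain part is not plain ASCII.

    A non-IDN-aware mailer knows only the @ASCII rule, so it refuses the
    address outright instead of handing an illegal name to DNS or SMTP.
    """
    local, sep, domain = address.rpartition("@")
    if not sep or not domain:
        raise ValueError(f"no domain part in {address!r}")
    if not domain.isascii():
        raise ValueError(f"non-ASCII domain name rejected: {domain!r}")
    return address
```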

More problematic is data from outside the mailer's configuration, such as a
hand-coded Reply-To header field in the message, since this data bypasses
the mailer's boundary checks. Not much can be said about that, other than to
point out that it can happen and that such data should generally be treated
as message data and therefore covered by the charset header. Any conversion
this data needs must be applied when the data is used to send a message; it
should not be converted to ACE on behalf of the user before that point.
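One way to honor that timing is to keep the stored headers exactly as
entered and produce an ACE-converted copy only when building the outgoing
message. This is a hypothetical sketch: it assumes each listed field holds a
single address with an ASCII local part, and again uses Python's "idna" codec
(IDNA 2003) in place of ACE.

```python
def ace_domain(address: str) -> str:
    """Convert only the domain part of one address to its ASCII form."""
    local, _, domain = address.rpartition("@")
    return f"{local}@{domain.encode('idna').decode('ascii')}"


def headers_for_send(stored: dict, address_fields=("Reply-To",)) -> dict:
    """Build outgoing headers at send time; the stored copy is never rewritten."""
    outgoing = dict(stored)  # shallow copy, so the user's data stays as typed
    for name in address_fields:
        if name in outgoing:
            outgoing[name] = ace_domain(outgoing[name])
    return outgoing
```

Because the conversion runs on a copy, re-editing the draft later still shows
the user what they typed, not the ACE form.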

-- 
Eric A. Hall                                        http://www.ehsco.com/
Internet Core Protocols          http://www.oreilly.com/catalog/coreprot/