
Re: [idn] UTF-8 as the long-term IDN solution

To: Dan Oscarsson <Dan.Oscarsson@trab.se>
Subject: Re: [idn] UTF-8 as the long-term IDN solution
From: Keith Moore <moore@cs.utk.edu>
Date: Tue, 29 May 2001 09:15:54 -0400
cc: djb@cr.yp.to, idn@ops.ietf.org, James@Seng.cc
Delivery-date: Tue, 29 May 2001 06:18:28 -0700
Envelope-to: idn-data@psg.com

> Just think about:
> In my e-mail I get a line:
> To: text QUOTED-PRINTABLE-TEXT <USER-NAME-COMPATIBLE-TEXT@ACE-NAME.ACE-NAME.com>
> 
> To handle this I must parse the line, identify each part that has its
> own encoding, have a decoder for each type, and decode them.
> If we used just UTF-8, to convert into the local character set you would just
> do it for the entire line without having to parse out each part and decode them
> separately.

Actually, even if we use just UTF-8, an MUA still has to parse the field, because:

- MUAs need to be able to read old messages (dating back to the 1980s at least),
  and for a sufficiently large customer base those old messages contain a mixture
  of ASCII, RFC 2047 encoded-words, and raw text in other character sets with no
  charset label.

  I have code that attempts to handle ASCII, RFC 2047, ISO-2022-*, UTF-8, and
  a default 8-bit character set (if none of the above, assume something like 
  iso-8859-1) in a message header.  It's not terribly difficult, but you do have
  to bust up the header field into tokens before displaying it.

- There are other reasons completely unrelated to I18N that you might want to 
  parse the field.  If you're formatting the field for a different display width, 
  you can make the display more pleasing if you wrap long lines between addresses
  rather than at arbitrary token boundaries.  Also, it looks *much* better to remove
  redundant phrases and comments left there by brain-damaged mailers.  To take a 
  fairly common example:

  	To: "'moore@cs.utk.edu'" <moore@cs.utk.edu>

  displays much better as 

  	To: moore@cs.utk.edu

  Finally, if you are composing a "reply to all" it is a good idea to remove
  duplicate addresses that appear in (for example) both the From and To fields,
  so that they only appear once in the list of reply recipients.

  I suspect most MUAs already do have an address parser - they just don't 
  necessarily use it for display.
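
The points above can be sketched roughly in Python. This is an illustration only, not the code mentioned in the message: the function names (decode_field, display_form, reply_all_recipients) are made up for the example, iso2022_jp stands in for the whole ISO-2022-* family, iso-8859-1 is the assumed 8-bit default, and comparing addresses case-insensitively is a simplification.

```python
from email.header import decode_header
from email.utils import formataddr, getaddresses


def decode_field(raw: bytes, default: str = "iso-8859-1") -> str:
    """Best-effort decode of a raw header field body (hypothetical helper)."""
    if b"\x1b" in raw:
        # ESC suggests ISO-2022 escape sequences; only iso2022_jp is tried here.
        try:
            return raw.decode("iso2022_jp")
        except UnicodeDecodeError:
            pass
    try:
        text = raw.decode("ascii")
    except UnicodeDecodeError:
        # 8-bit data: try UTF-8 first, then fall back to the assumed default.
        try:
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            return raw.decode(default)
    # Pure ASCII: it may still carry RFC 2047 encoded-words.
    parts = []
    for chunk, charset in decode_header(text):
        if isinstance(chunk, bytes):
            try:
                chunk = chunk.decode(charset or "ascii", errors="replace")
            except LookupError:
                # Unknown charset label: fall back to the assumed default.
                chunk = chunk.decode(default, errors="replace")
        parts.append(chunk)
    return "".join(parts)


def display_form(name: str, addr: str) -> str:
    """Drop a display name that merely repeats the address."""
    bare = name.strip("'\" ")
    if not bare or bare.lower() == addr.lower():
        return addr
    return formataddr((name, addr))


def reply_all_recipients(*fields: str) -> list:
    """Merge address fields, keeping each address once, in first-seen order."""
    seen = set()
    recipients = []
    for name, addr in getaddresses(list(fields)):
        key = addr.lower()  # simplification: local parts may be case-sensitive
        if addr and key not in seen:
            seen.add(key)
            recipients.append(display_form(name, addr))
    return recipients
```

For example, passing the From and To field bodies to reply_all_recipients() gives a single list in which moore@cs.utk.edu appears once, without the redundant quoted phrase.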
 
Keith
