Re: [idn] UTF-8 as the long-term IDN solution
> Just think about:
> In my e-mail I get a line:
> To: text QUOTED-PRINTABLE-TEXT <USER-NAME-COMPATABLE-TEXT@ACE-NAME.ACE-NAME.com>
>
> To handle this I must parse the line, identify each part that has its
> own encoding, have a decoder for each type, and decode them.
> If we used just UTF-8, to convert into the local character set, you just
> do it for the entire line without having to parse out each part and decode them
> separately.
Actually, even if we use just UTF-8, an MUA still has to parse the field, because:
- MUAs need to be able to read old messages (dating back to the 1980s at least),
and for a sufficiently large customer base those old messages contain a mixture
of ASCII, RFC 2047 encoded-words, and unlabeled 8-bit text in other character sets.
I have code that attempts to handle ASCII, RFC 2047, ISO-2022-*, UTF-8, and
a default 8-bit character set (if none of the above, assume something like
iso-8859-1) in a message header. It's not terribly difficult, but you do have
to bust up the header field into tokens before displaying it.
- There are other reasons completely unrelated to I18N that you might want to
parse the field. If you're formatting the field for a different display width,
you can make the display more pleasing if you wrap long lines between addresses
rather than at arbitrary token boundaries. Also, it looks *much* better to remove
redundant phrases and comments left there by brain-damaged mailers. To take a
fairly common example:
To: "'moore@cs.utk.edu'" <moore@cs.utk.edu>
displays much better as
To: moore@cs.utk.edu
Finally, if you are composing a "reply to all" it is a good idea to remove
duplicate addresses that appear in (for example) both the From and To fields,
so that they only appear once in the list of reply recipients.
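The non-I18N uses of an address parser described above can be sketched roughly
as follows. This is only an illustration built on Python's standard
email.utils module, not any particular MUA's code. The helper names
(simplify, reply_all_recipients, wrap_field) are made up for this example, and
comparing addresses case-insensitively is an assumption (strictly, only the
domain part is case-insensitive).

```python
from email.utils import formataddr, getaddresses


def simplify(name, addr):
    """Drop a display phrase that merely repeats the address itself."""
    bare = name.strip().strip("'\"").strip()
    if not bare or bare.lower() == addr.lower():
        return addr
    return formataddr((name, addr))


def reply_all_recipients(*fields):
    """Merge several address fields, keeping each address only once.

    Assumption: addresses are compared case-insensitively.
    """
    fields = [f for f in fields if f and f.strip()]  # skip empty fields
    seen, result = set(), []
    for name, addr in getaddresses(fields):
        key = addr.lower()
        if not addr or key in seen:
            continue  # unparsable entry or duplicate
        seen.add(key)
        result.append(simplify(name, addr))
    return result


def wrap_field(label, addrs, width=72):
    """Format 'Label: a, b, c', breaking lines only between addresses."""
    indent = " " * (len(label) + 1)
    lines, current, on_line = [], label + ":", 0
    for i, addr in enumerate(addrs):
        piece = addr + ("," if i < len(addrs) - 1 else "")
        if on_line and len(current) + 1 + len(piece) > width:
            lines.append(current)
            current, on_line = indent, 0
        current += " " + piece
        on_line += 1
    lines.append(current)
    return "\n".join(lines)
```

With the example field above, simplify("'moore@cs.utk.edu'", "moore@cs.utk.edu")
gives back just moore@cs.utk.edu, and passing the From and To fields together
to reply_all_recipients yields each address once.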
I suspect most MUAs already do have an address parser - they just don't
necessarily use it for display.
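For the mixed-encoding case mentioned earlier (ASCII, RFC 2047, ISO-2022-*,
UTF-8, and a default 8-bit character set), here is a minimal sketch of the
"bust it into tokens, then decode each token" approach. It is not the code
referred to above, just an illustration using Python's standard library. The
function names are hypothetical, only ISO-2022-JP is tried for the ISO-2022
family, and the detection order (escape sequence, then UTF-8, then the
default) is an assumption.

```python
import re
from email.errors import HeaderParseError
from email.header import decode_header

# An RFC 2047 encoded-word: =?charset?B-or-Q?encoded-text?=
ENCODED_WORD = re.compile(rb"(=\?[^?\s]+\?[bBqQ]\?[^?\s]*\?=)")


def decode_raw(data, default="iso-8859-1"):
    """Guess the charset of unlabeled bytes (heuristic, by assumption)."""
    if b"\x1b" in data:  # an escape sequence suggests ISO-2022-*
        try:
            return data.decode("iso2022_jp")
        except UnicodeDecodeError:
            pass
    try:
        return data.decode("utf-8")  # also covers plain ASCII
    except UnicodeDecodeError:
        return data.decode(default, errors="replace")


def decode_encoded_word(word, default="iso-8859-1"):
    """Decode one encoded-word, falling back to guessing on any error."""
    try:
        parts = decode_header(word.decode("ascii"))
    except (UnicodeDecodeError, HeaderParseError):
        return decode_raw(word, default)
    out = []
    for payload, charset in parts:
        if isinstance(payload, str):
            out.append(payload)
            continue
        try:
            out.append(payload.decode(charset or default, errors="replace"))
        except LookupError:  # unknown charset label
            out.append(decode_raw(payload, default))
    return "".join(out)


def decode_field(raw, default="iso-8859-1"):
    """Split a raw header field into tokens and decode each one."""
    tokens = ENCODED_WORD.split(raw)  # odd indexes are encoded-words
    out = []
    for i, tok in enumerate(tokens):
        if i % 2:
            out.append(decode_encoded_word(tok, default))
        elif tok.strip() or i in (0, len(tokens) - 1):
            out.append(decode_raw(tok, default))
        # else: whitespace between two encoded-words is dropped (RFC 2047)
    return "".join(out)
```

The input is the field as raw bytes, since old messages may contain 8-bit
data that is not valid in any single charset. A real MUA would probably want
a configurable default and a longer list of ISO-2022 variants.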
Keith