
Re: [idn] UTF-8 / RACE



> Furthermore, a computer can't always recognize what is a domain name
> and what is not. I think it's pretty darn ugly that the same domain
> e g in the header of a mail message and in the body of the same
> message should be displayed in two completely different ways.

I agree that it's ugly.  However, this is a natural result of the fact
that header information is structured according to one standard
(RFC 822, now RFC 2822), while textual message bodies are encoded
with any of a variety of character encoding standards.

Currently, message headers are specified to be ASCII (with non-ASCII
portions encoded per RFC 2047), though in reality a variety of
character sets are used without labelling.  Textual body parts may be in
any of several different character encodings, and these are sometimes
labelled correctly.  UTF-8 is allowed as one of the encodings for
textual message bodies, but it is not widely used.
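
As a rough illustration of the header/body split described above, here
is a minimal Python sketch using the standard library's email package.
The display name and body text are hypothetical values chosen only for
the example, and the choice of ISO-8859-1 for the body is an assumption
standing in for "some charset other than UTF-8".

```python
from email.header import Header, decode_header
from email.mime.text import MIMEText

# Hypothetical non-ASCII display name, used only for illustration.
name = "Bj\u00f6rn"

# Header side: the non-ASCII text is wrapped in an RFC 2047
# encoded-word, so what goes on the wire is still pure ASCII.
encoded = Header(name, charset="utf-8").encode()

# A receiving agent reverses the encoded-word to recover the text.
raw, charset = decode_header(encoded)[0]
decoded = raw.decode(charset)

# Body side: the text part carries its own charset label, which need
# not have anything to do with how the header was encoded.
body = MIMEText("Gr\u00fc\u00dfe", "plain", "iso-8859-1")
body_charset = body.get_content_charset()

print(encoded)
print(decoded)
print(body_charset)
```

The same message thus carries one piece of non-ASCII text as an ASCII
encoded-word and another as labelled 8-bit text, which is the mismatch
being discussed.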

In order to reduce this "ugliness" of having different encodings
for message header and message body, it would be necessary not only
to extend message headers to allow UTF-8 but also for UTF-8 to become
a popular method of encoding textual message bodies.  The latter
seems unlikely to happen for a great many years, because even if many
user agents support the ability to display UTF-8, the number of user
agents that can display other common character sets will still be greater.
Thus, for many languages, it will be safer to send text in some
character set other than UTF-8.
 
> ACE has it's advantages but the display problem is not a reason to
> choose ACE. I think it's obvious that it should eventually be phased
> out. Once again, in the long perspective we need a common encoding
> like ASCII: The Unified Encoding that is used for every single piece
> of text that is transmitted on the Internet.

I am not sure that we will find that Holy Grail anytime soon.
Even if we adopt UTF-8 we will still have to deal with various ways
of encoding "rich" text.

But this is all beside the point.  The IDN WG cannot legislate the 
encodings that are used by other applications; and it cannot legislate
that existing applications change their behavior.  It can only recommend
how to solve the I18N problem for domain names.  If the solution that
IDN recommends doesn't work well for some applications, it will not get
adopted for those applications - even if the recommended solution gets
approved as a standard.

It's very important to be realistic about what can be achieved.


I support the effort to craft a compromise around the idea of
ACE now - UTF-8 eventually.  But we need to realize that we cannot
mandate UTF-8 for existing applications within any particular
time frame.  
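
To make the "ACE now - UTF-8 eventually" contrast concrete, here is a
small Python sketch of the two on-the-wire forms of one name.  Note the
assumptions: Python's standard library has no RACE codec, so the
built-in "idna" codec (which produces a Punycode-based ACE, a different
ACE from the RACE proposal in the subject line) is used as a stand-in,
and the domain name itself is a made-up example.

```python
# Hypothetical internationalized name, used only for illustration.
name = "b\u00fccher.example"

# ACE form: every label is mapped into plain ASCII, so existing
# ASCII-only applications can carry it without any change.
# (Stand-in: Python's "idna" codec, not RACE.)
ace_form = name.encode("idna")

# UTF-8 form: the name is carried as raw multi-byte UTF-8, which only
# works where every application on the path is 8-bit clean and agrees
# on the encoding.
utf8_form = name.encode("utf-8")

# An ACE-aware application converts back to the readable form for
# display; an unaware one shows the ASCII string as-is.
displayed = ace_form.decode("idna")

print(ace_form)
print(utf8_form)
print(displayed)
```

An application that has not been updated still handles the ACE form
correctly, just unattractively, whereas the UTF-8 form depends on that
application changing its behavior.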

Keith