
Re: [idn] URL encoding in html page



On Fri, Mar 29, 2002 at 12:40:41PM +0900, Bruce Thomson wrote:
> > The question is really why 8/16/32 bit Unicode is better than 5bit (ACE)?
> 
> ACE and UTF-8 are just compression algorithms that squeeze larger
> Unicode characters. ACE is more efficient than UTF, although more
> complex.
> 
> But the claim to fame that UTF-8 has is that it is a standard that
> idn can reference, and it is coming into widespread use elsewhere.
> 
> So moving to UTF-8 long term seems like such an obvious choice
> that it surprises me that it even gets debated.

I think UTF-8 is the way forward too, and I actually think this is
also IESG policy, viz. RFC 2277 and RFC 2130. UTF-8 (RFC 2279) is
the only standards-track RFC on charsets for the same reason. Quoting
from RFC 2277, the IESG policy on character sets and language:

   "Protocols MUST be able to use the UTF-8 charset, which consists of
   the ISO 10646 coded character set combined with the UTF-8 character
   encoding scheme, as defined in [10646] Annex R (published in
   Amendment 2), for all text."

I don't mind having an ASCII fallback, as we also have one in email,
but using the other encoding forms of ISO 10646 is discouraged by the
IESG, as they do not want to see many encodings for the same
character set, with the interoperability problems that can follow.
So UCS-2, UCS-4, UTF-16, UTF-16-LE, UTF-16-BE, etc. are discouraged.
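
To illustrate the interoperability concern, here is a minimal sketch
(assuming Python 3 and its standard codecs; the label is a made-up
example, not taken from any draft) showing that one and the same
ISO 10646 string yields a different byte sequence in each encoding
form, so a receiver that guesses the wrong form cannot recover it:

```python
# One hypothetical non-ASCII label, written with escapes so the
# source itself stays pure ASCII: "bl" + U+00E5 + "b" + U+00E6 + "r".
label = "bl\u00e5b\u00e6r"

# Several encoding forms of the same coded character set.
# "utf-32-be" is Python's codec name for big-endian UCS-4.
encodings = ["utf-8", "utf-16-le", "utf-16-be", "utf-32-be"]
encoded = {name: label.encode(name) for name in encodings}

for name, data in encoded.items():
    print(f"{name:10} {len(data):2d} bytes  {data.hex()}")

# Pure ASCII text is byte-for-byte unchanged in UTF-8, which is what
# makes an ASCII fallback straightforward.
ascii_label = "example"
ascii_unchanged = ascii_label.encode("utf-8") == ascii_label.encode("ascii")
```

Every form decodes back to the same characters when the receiver
knows which one was used; the trouble only starts when the two ends
disagree, which is the case for settling on a single form.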

Best regards
Keld Simonsen