Re: [idn] URL encoding in html page
Bruce Thomson <bthomson@fm-net.ne.jp> wrote:
> But to conserve file space, it would probably be best to allow
> intermixing of 128-bit characters with ASCII text. UTF-8 continues
> to be the way to do this, since it is just a compression scheme that
> does not really depend on the fact that Unicode is currently
> limited to 32 bits. It could just as easily be extended to work
> with much larger character sets.
This is not even close to true. UTF-8 is very much dependent on the
32-bit architecture of Unicode, and in fact is constrained to 31-bit
code points. A quick check of the "10xxxxxx 10xxxxxx..." chart in RFC
2279, or in the Unicode Standard or ISO/IEC 10646, will confirm that.
And the word "currently," as used to refer to either the 21-bit or the
32-bit limit of Unicode/10646, is being used way too cavalierly.
Unicode is not going to be expanded beyond U+10FFFD, and nobody can
think of a non-whimsical reason why it should be.
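
For anyone who would rather compute the limit than read it off the chart, here is a rough sketch in Python. It assumes the RFC 2279 layout: a one-byte form "0xxxxxxx", and for 2 to 6 bytes a lead byte of n one-bits followed by a zero, then n-1 continuation bytes of the form "10xxxxxx". The function names payload_bits and max_code_point are made up for this illustration and do not come from any library.

```python
# Sketch of the RFC 2279 UTF-8 byte layout (function names are hypothetical).
# Lead byte for an n-byte sequence (n = 2..6): n one-bits, a zero bit,
# then 7 - n payload bits. Each continuation byte "10xxxxxx" adds 6 bits.

def payload_bits(n_bytes: int) -> int:
    """Number of code point bits an n-byte RFC 2279 sequence can carry."""
    if not 1 <= n_bytes <= 6:
        raise ValueError("RFC 2279 defines sequences of 1 to 6 bytes only")
    if n_bytes == 1:
        return 7  # 0xxxxxxx
    return (7 - n_bytes) + 6 * (n_bytes - 1)


def max_code_point(n_bytes: int) -> int:
    """Largest value representable in an n-byte sequence."""
    return (1 << payload_bits(n_bytes)) - 1


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, payload_bits(n), hex(max_code_point(n)))
```

The six-byte form tops out at 31 bits, which is the constraint mentioned above, and U+10FFFD already fits within the 21 bits of the four-byte form. Going past 31 bits would mean a different lead-byte scheme, not a simple extension of this one.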
-Doug Ewell
Fullerton, California