
Re: [idn] UTF-8 / RACE

To: "Sherin Alsoudani" <sherinalsoudani@hotmail.com>,<amc@cs.berkeley.edu>, <idn@ops.ietf.org>,"Adonis El Fakih" <adonis@ayna.com>
Subject: Re: [idn] UTF-8 / RACE
From: "James Seng/Personal" <James@Seng.cc>
Date: Mon, 28 May 2001 15:32:08 +0800
Delivery-date: Mon, 28 May 2001 00:33:14 -0700
Envelope-to: idn-data@psg.com

> From the system's point of view, using UTF-8 is the same as using cp1256.
> They are both 8-bit encoding schemes, and most applications people use
> to publish on the net use Unicode in the backend.

Are we talking about 8-bit vs 7-bit now? If we are, can I raise my hand
to say 8-bit is not sufficient for I18N?

CP1256 and UTF-8 are both 8-bit encodings, no doubt. I could equally
claim that BIG5 and UTF-8 are both 8-bit, so a system which works for
BIG5 *should* work for UTF-8 (or be easily converted). Unfortunately,
that is not true.
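One concrete reason the two are not interchangeable, as a rough Python sketch (the helper name is hypothetical, and this illustrates the encodings rather than any particular system): in BIG5 the second byte of a double-byte character can fall in the ASCII range (0x40 to 0x7E), so byte-oriented code that scans for ASCII delimiters such as the backslash can misfire. In UTF-8, every byte of a multi-byte sequence is 0x80 or above.

```python
def ascii_bytes_inside_multibyte(text: str, encoding: str) -> list[int]:
    """Hypothetical helper: collect ASCII-range bytes (< 0x80) that occur
    inside characters encoded with more than one byte."""
    found = []
    for ch in text:
        encoded = ch.encode(encoding)
        if len(encoded) > 1:
            # A byte below 0x80 here could be mistaken for an ASCII character
            found.extend(b for b in encoded if b < 0x80)
    return found


# Characters commonly cited for having a 0x5C (backslash) trail byte in BIG5
sample = "許功蓋"
print("big5 :", ascii_bytes_inside_multibyte(sample, "big5"))
print("utf-8:", ascii_bytes_inside_multibyte(sample, "utf-8"))
```

Code written for BIG5 usually already works around this, and those workarounds are exactly the kind of thing that does not carry over unchanged to UTF-8.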

While an 8-bit application is already 8-bit clean, which makes things
easier, this does not mean the other infrastructure the application
relies upon is able to deal with 8-bit characters. Sure, we can upgrade
that infrastructure and go through the pain. The question here is
whether the WG wants to do this, and I am hearing mixed messages from
the members.
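To illustrate what "not 8-bit clean" means for the infrastructure in between, here is a minimal Python sketch that assumes a hypothetical relay that only carries 7-bit data and clears the high bit of every byte. An ASCII label passes through intact; the UTF-8 form of an Arabic label arrives as unrelated ASCII bytes.

```python
def seven_bit_relay(data: bytes) -> bytes:
    """Hypothetical non-8-bit-clean hop: clears bit 7 of every byte."""
    return bytes(b & 0x7F for b in data)


for label in ["example", "مثال"]:
    sent = label.encode("utf-8")
    received = seven_bit_relay(sent)
    # errors="replace" keeps decoding from raising on damaged input
    print(repr(label), "intact:", received == sent,
          "->", repr(received.decode("utf-8", errors="replace")))
```

The application at either end can be perfectly 8-bit clean and the data is still lost, which is the point about the dependent infrastructure.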

> So I am assuming this is what Sherin is referring to. It is easier to
> convert applications that use an 8-bit encoding to UTF-8 than to adopt
> RACE, while RACE will be good for applications limited to 7-bit.
>
> I have successfully converted some apps we use locally to work with
> UTF-8, and we had to account for the double bytes compared to one when
> working with cp1256 or ISO 8859-6, and it was not a lot of work. I
> actually did this about two years ago.
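The "double bytes compared to one" point can be checked directly with a short Python sketch (the sample word is chosen only for illustration): Arabic letters from the U+0600 block take one byte each in cp1256 and ISO 8859-6 but two bytes each in UTF-8, so code that assumes one byte per character, such as buffer sizing or byte-based truncation, has to be adjusted.

```python
word = "سلام"  # four Arabic letters from the U+0600 block

for enc in ("cp1256", "iso8859-6", "utf-8"):
    encoded = word.encode(enc)
    print(f"{enc:10} chars={len(word)} bytes={len(encoded)}")

# Truncating UTF-8 by bytes can split a character in half;
# errors="replace" shows the broken tail as U+FFFD
cut = word.encode("utf-8")[:3]
print(repr(cut.decode("utf-8", errors="replace")))
```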

Actually, comparing UTF-8 strings is much more complex than handling a
double-byte compare. This is why Unicode Normalization and its various
forms exist. It matters especially for Arabic, where there are
presentation forms outside U+0600 to U+06FF (the Arabic Presentation
Forms-A and -B blocks, U+FB50 to U+FDFF and U+FE70 to U+FEFF) which you
can only compare through (compatibility) normalization. Bit-wise
comparison won't work in these cases.
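A small Python sketch of the Arabic case, using the standard `unicodedata` module: the isolated lam-alef ligature U+FEFB and the two letters LAM (U+0644) + ALEF (U+0627) differ code point by code point, and canonical normalization (NFC) does not unify them either; only compatibility normalization (NFKC) does. The choice of this particular character is just an example.

```python
import unicodedata

ligature = "\uFEFB"       # ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM
letters = "\u0644\u0627"  # ARABIC LETTER LAM + ARABIC LETTER ALEF

# Plain comparison of code points (equivalently, of the UTF-8 bytes)
print("raw :", ligature == letters)
# Canonical normalization leaves the presentation form alone
print("NFC :", unicodedata.normalize("NFC", ligature) == letters)
# Compatibility normalization maps it back to the base letters
print("NFKC:", unicodedata.normalize("NFKC", ligature) == letters)
```

Whether an IDN scheme should apply compatibility folding like this is a separate design question; the sketch only shows why a bit-wise compare is not enough.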

And if you like double-byte comparison, then try some of the Chinese
double-byte encodings, especially the industry-standard BIG5. You could
weep as you code.

-James Seng
