[idn] Debunking the UTF-8 myth





Please forgive a minor distraction, but it occurs to me that the ACE vs. 
UTF-8 debate has UTF-8 positioned inaccurately.

ACE is purported to be ugly because it translates clean data into an ASCII 
form.

By contrast, UTF-8 is purported to be better because it is 8-bit clean.

Indeed, it IS 8-bit clean, but the underlying Unicode code space needs more 
than 8 bits.

In other words, UTF-8 plays havoc with encoding rules, in order to work 
with an 8-bit model.  As such, some characters take 8 bits, but some take 
24, no?
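
To make the variable width concrete, here is a minimal sketch using Python's built-in UTF-8 codec.  The sample characters and the helper name utf8_bits are my own choices for illustration, not anything taken from the drafts under discussion.

```python
# Sketch: how many bits UTF-8 spends on one character.
# The helper name and the sample characters are illustrative assumptions.

def utf8_bits(ch: str) -> int:
    """Return the number of bits UTF-8 uses to encode a single character."""
    return len(ch.encode("utf-8")) * 8

samples = {
    "a": "U+0061, plain ASCII",
    "\u00e9": "U+00E9, Latin small e with acute",
    "\u20ac": "U+20AC, euro sign",
    "\U0001d11e": "U+1D11E, musical G clef",
}

for ch, description in samples.items():
    print(f"{description}: {utf8_bits(ch)} bits")
```

Under these assumptions the four samples come out at 8, 16, 24, and 32 bits respectively, so the width depends on where the character sits in the code space.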

In other words, UTF-8 also requires a special encoding pass, albeit one 
that maps the data to 8-bit chunks rather than ASCII chunks.  But it 
appears to be no more pure or direct than ASCII.

At the same time, UTF-8 requires more of the infrastructure to be upgraded 
before there is much utility for iDN.  ACE requires less.

Please do provide a counter-explanation, folks.  I'll be honestly delighted 
to be wrong about this assessment, as long as the 'wrong' is in technical 
terms from a system-wide perspective.

d/

----------
Dave Crocker  <mailto:dcrocker@brandenburg.com>
Brandenburg InternetWorking  <http://www.brandenburg.com>
tel +1.408.246.8253;  fax +1.408.273.6464