[idn] Debunking the UTF-8 myth
Please forgive a minor distraction, but it occurs to me that the ACE vs.
UTF-8 debate has UTF-8 positioned inaccurately.
ACE is purported to be ugly because it translates clean data into an ASCII
form.
By contrast, UTF-8 is purported to be better because it is 8-bit clean.
Indeed, it IS 8-bit clean, but the underlying Unicode (ISO 10646) character
table needs more than 8 bits per character.
In other words, UTF-8 plays havoc with encoding rules, in order to work
with an 8-bit model. As such, some characters take 8 bits, but some take
24, no?
That is, UTF-8 also requires a special encoding pass, albeit one
that maps the data to 8-bit chunks rather than ASCII chunks. But it
appears to be no more pure or direct than the ASCII encoding.
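
A minimal sketch of the point that both forms are encoding passes over the same code points. It assumes Python's built-in "idna" codec (the Punycode-based ToASCII of RFC 3490) as a stand-in for "ACE", since no specific ACE scheme is named here; the helper names utf8_bits and to_ace are hypothetical.

```python
# Illustration only. The "idna" codec is an assumed stand-in for ACE;
# utf8_bits and to_ace are hypothetical helper names.

def utf8_bits(ch: str) -> int:
    """Return how many bits the UTF-8 encoding of one character occupies."""
    return 8 * len(ch.encode("utf-8"))


def to_ace(label: str) -> bytes:
    """Encode a single label into an ASCII-compatible form."""
    return label.encode("idna")


# The same kind of input (one code point) yields different UTF-8 widths.
for ch in ("a", "\u00e9", "\u20ac"):  # 'a', e-acute, euro sign
    print(f"U+{ord(ch):04X} -> {utf8_bits(ch)} bits: {ch.encode('utf-8')!r}")

label = "b\u00fccher"  # "buecher" spelled with u-umlaut
print("UTF-8:", label.encode("utf-8"))  # b'b\xc3\xbccher'
print("ACE:  ", to_ace(label))          # b'xn--bcher-kva'
```

Neither output is the raw code-point sequence: UTF-8 spreads U+00FC over two octets, while the ACE form rewrites the whole label into letters, digits and hyphens.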
At the same time, UTF-8 requires more of the infrastructure to be upgraded
before there is much utility for iDN. ACE requires less.
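
The infrastructure claim can be sketched with a hypothetical legacy hostname check that accepts only letters, digits and hyphens. The pattern LDH_LABEL and the function legacy_accepts are assumptions made for illustration, not any particular resolver's code, and the "idna" codec again stands in for ACE.

```python
import re

# Hypothetical legacy rule: a label is 1 to 63 octets of letters, digits
# and hyphens, and may not begin or end with a hyphen.
LDH_LABEL = re.compile(rb"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def legacy_accepts(label: bytes) -> bool:
    """Return True if an unmodified ASCII-only checker would pass the label."""
    return LDH_LABEL.fullmatch(label) is not None


name = "b\u00fccher"  # u-umlaut in the second position

print(legacy_accepts(name.encode("idna")))   # ACE form passes unchanged code
print(legacy_accepts(name.encode("utf-8")))  # UTF-8 octets >= 0x80 are rejected
```

Under these assumptions the ACE form passes through the old check untouched, while the UTF-8 form needs the check itself to be upgraded.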
Please do provide a counter-explanation, folks. I'll be honestly delighted
to be wrong about this assessment, as long as the 'wrong' is in technical
terms from a system-wide perspective.
d/
----------
Dave Crocker <mailto:dcrocker@brandenburg.com>
Brandenburg InternetWorking <http://www.brandenburg.com>
tel +1.408.246.8253; fax +1.408.273.6464