
Re: [idn] call for comments for REORDERING



Soobok Lee <lsb@postel.co.kr> wrote:

> Even For GROUPS of those rare cases, we get Always SHORTER labels than
> usual.

As has already been pointed out, this is impossible.  Here is one class
of labels that are made longer by reordering:  Labels that use lots of
"uncommon" code points from the bottom 2K (or whatever) of the block.
Without reordering, these code points are already close together, but
with reordering they will be scattered all over the block, resulting in
a longer ACE.  Of course such labels are supposed to be very rare.
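
To illustrate why scattering hurts, here is a toy cost model. It is not the
AMC-ACE-Z algorithm: it simply assumes that each code point is written as its
distance from the previous one in base-32 digits, and the code points below
are hypothetical values chosen only for illustration.

```python
def digits_needed(value, base=32):
    """Number of base-`base` digits needed to write a non-negative integer."""
    count = 1
    while value >= base:
        value //= base
        count += 1
    return count


def toy_cost(code_points, base=32):
    """Toy cost model (an assumption, not AMC-ACE-Z): total digits needed to
    write each code point's distance from the one before it."""
    total = 0
    previous = code_points[0]
    for cp in code_points[1:]:
        total += digits_needed(abs(cp - previous), base)
        previous = cp
    return total


# Hypothetical label whose code points sit close together in the block.
clustered = [0x9F00, 0x9F05, 0x9F12, 0x9F20, 0x9F08]
# The same number of code points after a hypothetical reordering spreads them out.
scattered = [0x4E10, 0x7A05, 0x5312, 0x9F20, 0x6008]

print(toy_cost(clustered), toy_cost(scattered))
```

Under this model the clustered label costs one digit per step, while the
scattered one costs three digits per step.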

Any lossless compression scheme that makes some inputs shorter must make
others longer.  The goal is to make the more common ones a lot shorter,
and the very rare ones only slightly longer.
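
The counting argument behind this can be sketched in a few lines.  The
function names are mine, and the alphabet sizes are just examples: for any
alphabet of two or more symbols, there are fewer strings shorter than n than
there are strings of length exactly n, so a reversible encoding cannot shrink
all of them.

```python
def count_strings(alphabet_size, length):
    """How many distinct strings of exactly `length` symbols exist."""
    return alphabet_size ** length


def count_shorter(alphabet_size, length):
    """How many distinct strings have fewer than `length` symbols
    (including the empty string)."""
    return sum(alphabet_size ** k for k in range(length))


# Binary alphabet, length 8: 256 possible inputs but only 255 shorter outputs,
# so at least one input cannot be mapped to a shorter string without a collision.
print(count_strings(2, 8), count_shorter(2, 8))

# The same holds for a 36-symbol alphabet (letters and digits, as an example).
print(count_strings(36, 5) > count_shorter(36, 5))
```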

> That ACE encoding overhead is so big that it cannot be compensated by
> the superiority in the information capacity of a han letter compared
> to a latin letter.

That's not obvious.  Here are the AMC-ACE-Z encodings of the same
sentence in English and Chinese (from the Examples section of the
AMC-ACE-Z spec):

              English: WhyCantTheyJustSpeakChinese
 Chinese (simplified): ihqwcrb4cv8a8dqg056pqjye
Chinese (traditional): ihqwctvzc91f659drss3x8bo0yb

Of course this is only one data point, but my hunch is that it's not
atypical.
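
For anyone who wants to check the lengths rather than eyeball them, a quick
sketch (the dictionary keys are just labels I made up; the strings are the
three quoted above):

```python
examples = {
    "English": "WhyCantTheyJustSpeakChinese",
    "Chinese (simplified)": "ihqwcrb4cv8a8dqg056pqjye",
    "Chinese (traditional)": "ihqwctvzc91f659drss3x8bo0yb",
}

# Length in characters of each string as it would appear in a label.
lengths = {name: len(text) for name, text in examples.items()}

for name, length in lengths.items():
    print(f"{name:>22}: {length}")
```

That gives 27 characters for the English, 24 for the simplified Chinese, and
27 for the traditional Chinese.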

Paul Hoffman / IMC <phoffman@imc.org> wrote:

> The reordering proposal and its proponent are based on the faulty
> premise that there will be a need for very long domain names for the
> scripts that have been reordered.

Actually, he has said many times that his goal is not only to allow long
domain names, but also to make average domain names easier for humans to
cope with when they are forced to see the ACE.  (And others have argued
that neither goal is important enough to justify the complexity.  I'd
like to know if the CJK NICs have opinions on the matter.)

AMC