
Re: [idn] ACE37,AMCW and LDUDE



----- Original Message -----
From: "Edmon" <edmon@neteka.com>
To: "Keith Moore" <moore@cs.utk.edu>; "Soobok Lee" <lsb@postel.co.kr>


> > I would *really* like to avoid complex reordering of codepoints, because
> > it would require fairly large tables (that would have to be implemented
> > on some machines of modest capability) and would be error-prone.
> >
> I agree.  The code block shifting mechanism in ACE37 however is very simple
> and yields great benefit for CJK ideographs.
>

Code block shifting (0x4E00:8FFFF ----> 0x0000:0x52FF)
does reduce the XOR diff value, but only for a small subset of the
code points: those in the first block, 0x4E00 ~ 0x5E00.

For example, 0x4e00 and 0x5000 (XOR diff 0x1e00)
are mapped to 0x0000 and 0x0200 (XOR diff 0x0200).


But for 0x4e00 and 0x6e00, the XOR diff value is 0x2000 in either
case, with or without shifting!
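
The arithmetic above can be checked with a minimal sketch. It assumes the
shift is a plain subtraction of 0x4E00, which matches the two examples but
is not taken from the ACE37 draft itself. The names shift, xor_diff, nibbles
and compare are hypothetical helpers, and the nibble count is only a rough
proxy for how many output characters a diff might cost, not the actual
DUDE encoding.

```python
BASE = 0x4E00  # assumed start of the shifted CJK block


def shift(cp):
    """Assumed mapping: move code points at or above BASE down to zero.

    Code points below BASE are left unchanged here for simplicity; a real
    scheme would have to relocate them to keep the mapping one-to-one.
    """
    return cp - BASE if cp >= BASE else cp


def xor_diff(a, b):
    """XOR difference between two code points."""
    return a ^ b


def nibbles(value):
    """Number of 4-bit groups needed to hold value (at least one)."""
    return max(1, (value.bit_length() + 3) // 4)


def compare(a, b):
    """Return (XOR diff before shifting, XOR diff after shifting)."""
    return xor_diff(a, b), xor_diff(shift(a), shift(b))


for a, b in [(0x4E00, 0x5000), (0x4E00, 0x6E00)]:
    before, after = compare(a, b)
    print(f"{a:#06x} {b:#06x}: "
          f"before={before:#06x} ({nibbles(before)} nibbles), "
          f"after={after:#06x} ({nibbles(after)} nibbles)")
```

Under these assumptions the first pair drops from a 4-nibble diff to a
3-nibble diff, while the second pair stays at 4 nibbles either way.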


> > I could see having a very simple reordering of codepoints in order
> > to increase the efficiency of DUDE for ideographic languages.  But my
> > (admittedly biased) sense is that we're fast approaching the point
> > of diminishing returns.  Just how important is it to provide for names
> > that are one or two ideographs longer?
>
> It is more than one or two... in the worst case scenario for CJK, ACE37
> outperforms DUDE by 5 characters!

For 99% of CJK domains, LDUDE allows 30~58% longer CJK domains than DUDE.

In other words, LDUDE compresses 30~50% better than DUDE for average-length
domains. This is the more important point.
I will report back once I complete testing with ACE37 + REORDERING.


> And there is an undeniable demand from the CJK community to up the 14
> character limit to at least, I guess close to the original 23 character
> limit.
>
> >
> > Keith
>
> Edmon
>
>
>