
[idn] ACE37,AMCW and LDUDE



I compared ACE37 with my LDUDE (reordered DUDE).

LDUDE's reordering tables are tuned, based on character and adjacency
statistics, to produce as many single-quintet XOR differences as
possible.

Applying the same reordering to ACE37 would improve ACE37 as well.

However, for reordering to be MOST successful with ACE37,
ACE37 would have to support single-quintet XOR differences when
encoding, which it does not support now.
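
To make "single-quintet XOR difference" concrete, here is a minimal
sketch. It assumes that each code point is XORed with its predecessor
and that every base-32 quintet carries 5 payload bits. Real encodings
may reserve bits as delimiters, so payload_bits is a parameter. The
function names are hypothetical and do not come from any draft.

```python
def xor_diffs(code_points):
    """XOR each code point with its predecessor (the first has none)."""
    return [a ^ b for a, b in zip(code_points, code_points[1:])]


def quintets_needed(value, payload_bits=5):
    """Base-32 digits needed for value, assuming payload_bits usable bits each."""
    # Ceiling division on the bit length; zero still needs one digit.
    return max(1, -(-value.bit_length() // payload_bits))


# Four consecutive katakana letters taken from JP1 below.
katakana = [0x30CD, 0x30C3, 0x30C8, 0x30EF]
diffs = xor_diffs(katakana)
sizes = [quintets_needed(d) for d in diffs]
```

Under these assumptions the first two differences (0x0E and 0x0B) fit
in one quintet, while the third (0x27) needs two. Reordering tries to
move frequently adjacent letters close enough that the one-quintet
case dominates.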

For Han, reordering already statistically guarantees XOR differences
below 0x1000, by mapping the 4096 most frequent Han letters into a
4096-code block starting at 0x5000. The 4096 most frequent Han letters
(out of about 23000) have a cumulative frequency of 99.9% in typical
Chinese texts and business names (58% for the top 256).
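
The bound follows because any two codes inside 0x5000..0x5FFF share
the same high bits, so their XOR is below 0x1000. A minimal sketch,
assuming a frequency-ranked list of code points is available. The
ranking used here is a toy example and not real frequency data, and
the names are hypothetical. Only the forward mapping of the top
letters is shown; a complete table would also have to relocate the
letters displaced from the target block.

```python
BLOCK_START = 0x5000
BLOCK_SIZE = 0x1000  # 4096 codes


def build_reorder_table(ranked_code_points):
    """Map the most frequent code points onto the block at BLOCK_START."""
    top = ranked_code_points[:BLOCK_SIZE]
    return {cp: BLOCK_START + rank for rank, cp in enumerate(top)}


# Toy ranking (NOT real statistics): the six Han letters of JP1.
toy_ranking = [0x793E, 0x56E3, 0x6CD5, 0x4EBA, 0x65E5, 0x672C]
table = build_reorder_table(toy_ranking)

before = [a ^ b for a, b in zip(toy_ranking, toy_ranking[1:])]
mapped = [table[cp] for cp in toy_ranking]
after = [a ^ b for a, b in zip(mapped, mapped[1:])]
```

In this toy case the first original difference is 0x2FDD, while every
difference after mapping stays below 0x1000.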

LDUDE compresses the following 25-letter Japanese string down to a
47-char ACE label, 10 chars shorter than the AMCW and ACE37 labels.


JP1) Japanese String 1: ( 25 letters )
      U+793E U+56E3 U+6CD5 U+4EBA U+65E5 U+672C U+30CD U+30C3 U+30C8
      U+30EF U+30FC U+30AF U+30A4 U+30F3 U+30D5 U+30A9 U+30E1 U+30FC
      U+30B7 U+30E7 U+30F3 U+30BB U+30F3 U+30BF U+30FC

      DUDE-02 : z3xQu97Pv4vGuuyRu5xRu6Jxz8BQMuHtDxDMxHuGzNwItPwMxAtE\
                wIwIwNwD  (60 chars)
      LDUDE   : xs8Nu2Cu4RvMGBysxGyCKtHtQCPFtAyPyKtPBGPyAyAyFyR
                ( 47 chars, 21.6% shorter)
      AMCW    : z3vQ28DDyxs5KB9fCjnvs6P6DI8R9N4RE9D7F4J8B9N5H8H9D5M9\
                D5R9N ( 57 chars )
      LAMCW   : xs2NwsQu4B3KNPvs6M4JD5E4KIFA5A7P5H4KMPA6A4A6F4K
                ( 47 chars )

      ACE37   : i9urut6hm8jfaqv0m9dv1wewbx7wjyjwbynx6zsy8wtybygwky8y8ycy3
                ( 57 chars )

JP3) Japanese String 3: ( 17 letters )
      U+6771 U+4EAC U+90FD U+60C5 U+5831 U+30B5 U+30FC U+30D3 U+30B9
      U+7523 U+696D U+5065 U+5EB7 U+4FDD U+967A U+7D44 U+5408

      DUDE-02 : yztBu37P78xB9svIv29Ey22EwJuRyKwx3Kt6wQv3sI87CttyK734\
                H85vQu3wN (61 chars)
      LDUDE   : xttHxPvtFu9CDyssAyEyHyRys9PxQ4KHGEu4CuwJ
                ( 40 chars, 34.4% shorter)
      AMCW    : z3vQ28DDyxs5KB9fCjnvs6P6DI8R9N4RE9D7F4J8B9N5H8H9D5M9\
                D5R9N ( 57 chars )
      LAMCW   : xs2NwsQu4B3KNPvs6M4JD5E4KIFA5A7P5H4KMPA6A4A6F4K
                ( 47 chars )

      ACE37   : drhaetvihk1o67ka44y9xfzahcqv2e6883micbaud7apuqac
                ( 48 chars )
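
The lengths and savings of the DUDE-02 and LDUDE labels above can be
rechecked with a short script. This is only a convenience sketch. The
helper name is hypothetical, and the percentage is truncated rather
than rounded to one decimal, which is an assumption chosen because it
reproduces the quoted figures.

```python
import math


def percent_shorter(baseline_len, new_len):
    """Saving relative to the baseline, truncated to one decimal place."""
    return math.floor((baseline_len - new_len) / baseline_len * 1000) / 10


# Labels copied from the JP1 and JP3 listings (continuation lines joined).
JP1_DUDE02 = ("z3xQu97Pv4vGuuyRu5xRu6Jxz8BQMuHtDxDMxHuGzNwItPwMxAtE"
              "wIwIwNwD")
JP1_LDUDE = "xs8Nu2Cu4RvMGBysxGyCKtHtQCPFtAyPyKtPBGPyAyAyFyR"
JP3_DUDE02 = ("yztBu37P78xB9svIv29Ey22EwJuRyKwx3Kt6wQv3sI87CttyK734"
              "H85vQu3wN")
JP3_LDUDE = "xttHxPvtFu9CDyssAyEyHyRys9PxQ4KHGEu4CuwJ"

jp1_saving = percent_shorter(len(JP1_DUDE02), len(JP1_LDUDE))
jp3_saving = percent_shorter(len(JP3_DUDE02), len(JP3_LDUDE))
```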


Soobok Lee, lsb@postel.co.kr


> A New Internet-Draft is available from the on-line Internet-Drafts
> directories.
>
>
> Title : ACE Utilizing All 37 Alphanumeric Characters (ACE37)
> Author(s) : E. Chung, D. Leung
> Filename : draft-chung-idn-ace37-00.txt
> Pages : 17
> Date : 05-Jul-01
>
> ACE37 is a combination of DUDE-02, AMC-W/V and LACE.  ACE37 utilizes
> the simple one pass algorithm of DUDE, the character block
> considerations of AMC-W/V and the Base-32 compression of LACE.  It
> also fully utilizes entire LDH set currently allowed in the DNS (A-
> z, 0-9 and '-') within its character repertoire to optimize
> performance and compression.
>
> A URL for this Internet-Draft is:
> http://www.ietf.org/internet-drafts/draft-chung-idn-ace37-00.txt
>