
Re: [idn] reordering strawpoll




...
> The working group co-chairs would like to conduct a strawpoll to
> guage the consensus of the working group on reordering
> http://www.ietf.org/internet-drafts/draft-ietf-idn-lsb-ace-02.txt

From http://www.ietf.org/internet-drafts/draft-ietf-idn-lsb-ace-02.txt:

>    As such examples shows, most ACE algorithms are designed to favor 
>    latin and small script blocks over very large blocks like han and 
>    hangeul. For CJK people (in China,Hongkong,Macao,Japan,South/North
>    Korea and Taiwan), that disadvantage results in longer ACE labels
>    and less room for free-form long names. It is clear that there 
>    must be some improvements to ACEs to compensate this unfair 
>    disadvantage.

It's a bit strange that this comes from quarters where the representation of
text is already quite "compact".  A single Han ideograph expresses "more" than
a single letter in other scripts, and a single Hangul syllable character
packs from 2 to 6 letters into one character.  Hangul is fundamentally an
alphabetic script, with 17 consonant letters and 11 vowel letters, plus some
variant (and historic) letters.
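
To make the "letters per syllable" point concrete, here is a minimal sketch
in Python.  It assumes only the standard unicodedata module, and the helper
name decompose is made up for this illustration.  Canonical decomposition
(NFD) splits a precomposed Hangul syllable into 2 or 3 conjoining jamo.  It
does not split compound jamo into their component letters, so it
undercounts compared with the "2 to 6 letters" figure above.

```python
import unicodedata

def decompose(syllable):
    """Return the conjoining jamo that NFD yields for one Hangul syllable."""
    return list(unicodedata.normalize("NFD", syllable))

# U+AC00, U+D55C and U+B2ED are precomposed Hangul syllables.
for syllable in ("\uAC00", "\uD55C", "\uB2ED"):
    jamo = decompose(syllable)
    # Print the code point, the jamo count, and each jamo's Unicode name.
    print("U+%04X" % ord(syllable), len(jamo),
          [unicodedata.name(j) for j in jamo])
```

U+B2ED decomposes into three jamo here, but its final jamo is a compound of
two consonant letters, so counted by letters it carries four.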

<ironic-mode>

I think it's an urgent matter to follow the example of Hangul for all alphabetic
scripts and encode tens of thousands of syllables in the Latin, Greek, etc.
scripts, so that we can compensate this unfair disadvantage that Latin, etc.
have compared to Hangul (and Han, though that is not alphabetic).

And then, of course, we have to have a reordering stage, where the
most common syllables are ordered so that an ACE-encoded string
also gets as short as possible.

</ironic-mode>

Of course, which languages should be considered when selecting the
Latin or Cyrillic syllables to encode, and which languages' statistics
should be used for the reordering, will be hotly debated!  ;-)

            On the ironic side
            /kent k