
Re: [idn] call for comments for REORDERING




----- Original Message ----- 
From: "Martin Duerst" <duerst@w3.org>
To: <DougEwell2@cs.com>; <idn@ops.ietf.org>
Cc: <lsb@postel.co.kr>; <jseng@pobox.org.sg>
Sent: Friday, October 19, 2001 4:05 PM
Subject: Re: [idn] call for comments for REORDERING


> At 02:05 01/10/19 -0400, DougEwell2@cs.com wrote:
> >In a message dated 2001-10-18 21:33:55 Pacific Daylight Time,
> >lsb@postel.co.kr writes:
> >
> > >  1) saturations in TLD namespaces would require longer names for which
> > >      REORDERING is designed to give greater benefits/compression ratio.
> >
> >Is it not the case that logographic/ideographic writing systems such as Han
> >and the syllable-oriented Unicode encoding of Hangul, with their large
> >numbers of characters, convey more information per character than alphabetic
> >scripts?
> 
> Very much so, of course.

Han and Hangeul have huge character repertoires (20992 and 11172 characters, respectively), so encoding one of their characters takes more digits and bits than encoding a Latin letter. The sizes of Latin alphabets and their variants do not exceed 30.
The resulting ACE encoding overhead is so large that the greater information capacity of a Han character, compared to a Latin letter, cannot compensate for it.
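To put rough numbers on the "more digits and bits" point, here is a small sketch. It assumes every character in a repertoire is equally likely, which real text is not, and it assumes an ACE output alphabet of 36 symbols (the case-insensitive letters and digits of host names, ignoring the hyphen and any ACE prefix). The 36-symbol alphabet and the helper names are illustrative assumptions, not part of any specific ACE proposal.

```python
import math

# Repertoire sizes quoted above; 30 is the stated upper bound for Latin alphabets.
REPERTOIRES = {
    "Han (CJK Unified Ideographs)": 20992,
    "Hangul syllables": 11172,
    "Latin letters (upper bound)": 30,
}

# Assumption: ACE digits come from a 36-symbol alphabet (a-z, 0-9).
ACE_ALPHABET_SIZE = 36

def bits_per_char(repertoire_size: int) -> float:
    """Bits needed for one character if all characters are equally likely."""
    return math.log2(repertoire_size)

def ace_digits_per_char(repertoire_size: int, alphabet: int = ACE_ALPHABET_SIZE) -> float:
    """Lower bound on ACE digits per character for an ideal, context-free coder."""
    return math.log(repertoire_size) / math.log(alphabet)

if __name__ == "__main__":
    for name, size in REPERTOIRES.items():
        print(f"{name:30s} {bits_per_char(size):5.2f} bits  "
              f"{ace_digits_per_char(size):4.2f} ACE digits/char")
```

Under these assumptions a Han character carries about 14.4 bits and needs roughly 2.8 ACE digits even with ideal coding, while a letter from a 30-letter alphabet fits in less than one digit.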

In UCS-2, a Latin or Han label of length N needs 2*N octets.
But ACE and UTF-8, which favor the Latin script, require roughly 3.0*N octets for long Han labels. REORDERING reduces that requirement to about 2.2*N, close to that of UCS-2.
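The UCS-2 and UTF-8 figures can be checked with standard encoders. The sketch below only counts octets for those two encodings; it does not implement REORDERING or any ACE, so the 2.2*N figure is not reproduced here. The sample labels are hypothetical, and UTF-16-BE is used as a stand-in for UCS-2, which matches it for BMP characters like these.

```python
def octets(label: str, encoding: str) -> int:
    """Number of octets the label occupies in the given encoding."""
    return len(label.encode(encoding))

# Hypothetical sample labels, six or seven characters each.
SAMPLES = {
    "Latin": "example",
    "Hangul": "한국어도메인",
    "Han": "中文域名测试",
}

if __name__ == "__main__":
    for script, label in SAMPLES.items():
        n = len(label)
        ucs2 = octets(label, "utf-16-be")  # equals UCS-2 for BMP-only text
        utf8 = octets(label, "utf-8")
        print(f"{script:6s} N={n}  UCS-2={ucs2} ({ucs2 / n:.1f}*N)  "
              f"UTF-8={utf8} ({utf8 / n:.1f}*N)")
```

For the Hangul and Han samples this gives 2.0*N octets in UCS-2 and 3.0*N in UTF-8, while the Latin sample is 2.0*N in UCS-2 and 1.0*N in UTF-8, which is the asymmetry described above.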

Soobok Lee