
Re: [idn] reordering strawpoll




----- Original Message ----- 
From: "David Hopwood" <david.hopwood@zetnet.co.uk>
To: "Soobok Lee" <lsb@postel.co.kr>; <idn@ops.ietf.org>
Sent: Tuesday, November 13, 2001 5:36 PM
Subject: Re: [idn] reordering strawpoll


> -----BEGIN PGP SIGNED MESSAGE-----
> 
> Soobok Lee wrote:
> > From: <DougEwell2@cs.com>
> > > In a message dated 2001-11-12 14:41:29 Pacific Standard Time,
> > > lsb@postel.co.kr writes:
> > >
> > > >  If you encode each Hangul syllabic in 3 jamos in utf8,
> > > >  it need 3 octets * 3 = 9 octets, while 3 basic latin letter need 3 octets
> > > >  in utf8. 3 times more space!  if there were any real "compaction" on
> > > >  hangul syllable code points, that may be just the bare minimum.
> > >
> > > But one paragraph earlier, Soobok stated that each hangul character is
> > > roughly equivalent to (i.e. carries roughly as much information as) 2.2 to
> > > 2.7 Latin letters.  So the 9 octets of UTF-8 actually encode the equivalent
> > > of 6.6 to 8.1 Latin letters, which means Hangul encoding is 10% to 27% less
> > > efficient than Latin encoding.  Representing it as two-thirds (67%) less
> > > efficient is obviously misleading.  Such claims only detract attention away
> > > from any merit the reordering plan may have.
> > 
> > my analogy cited above was for *UTF8*.
> 
> However, your argument that it is important to reduce the *average* length
> of encoded names, certainly doesn't apply to UTF-8 (even if it's accepted
> that it applies to ACE, which I don't accept).

Yes.
That argument was only about justifying the addition of a Hangul syllable code block
alongside the Hangul jamo (alphabet) block: a 9 octets -> 3 octets "compaction".


> 
> Users will never see (much less type in) UTF-8 octet string encodings
> except in obscure debugging situations.

But without the Hangul syllable block, users would suffer
3 times the resource consumption for each Unicode Hangul syllable.
Just 6 Hangul syllables of 3 jamos each (6 * 3 * 3 = 54 octets) would use up most of the 63-octet UTF-8 label limit!

That's why the Hangul jamo block alone is not enough to encode Hangul efficiently.
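
The octet counts above can be checked with a small sketch. It assumes Python's
standard unicodedata module and uses NFD normalization to split a precomposed
syllable into its conjoining jamos; the helper name utf8_len, the constant
LABEL_LIMIT, and the sample syllable U+D55C are chosen here only for illustration.

```python
import unicodedata

LABEL_LIMIT = 63  # octet limit per label, as discussed in this thread


def utf8_len(text: str) -> int:
    """Return the number of octets in the UTF-8 encoding of text."""
    return len(text.encode("utf-8"))


# U+D55C is a precomposed Hangul syllable made of three jamos
# (initial consonant, vowel, final consonant).
syllable = "\ud55c"

# NFD splits the syllable into its conjoining jamos:
# U+1112, U+1161, U+11AB.
jamos = unicodedata.normalize("NFD", syllable)

precomposed_octets = utf8_len(syllable)  # one code point in U+0800..U+FFFF
decomposed_octets = utf8_len(jamos)      # three such code points

print("jamo code points:", [f"U+{ord(c):04X}" for c in jamos])
print("precomposed octets:", precomposed_octets)
print("decomposed octets:", decomposed_octets)

# Six three-jamo syllables in decomposed form, as in the message above.
six_decomposed = utf8_len(unicodedata.normalize("NFD", syllable * 6))
print("six decomposed syllables:", six_decomposed, "of", LABEL_LIMIT, "octets")
```

This only covers syllables with a final consonant; a syllable without one
decomposes into 2 jamos (6 octets), so real names would fall somewhere in between.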

Soobok Lee


> The 63-octet limitation on label
> length when using UTF-8 appears to be sufficient, AFAICS (even for Indic
> scripts, which are less efficient than Hangul in UTF-8).
> 
> - -- 
> David Hopwood <david.hopwood@zetnet.co.uk>
> 
> Home page & PGP public key: http://www.users.zetnet.co.uk/hopwood/
> RSA 2048-bit; fingerprint 71 8E A6 23 0E D3 4C E5  0F 69 8C D4 FA 66 15 01
> Nothing in this message is intended to be legally binding. If I revoke a
> public key but refuse to specify why, it is because the private key has been
> seized under the Regulation of Investigatory Powers Act; see www.fipr.org/rip
> 
> 
> -----BEGIN PGP SIGNATURE-----
> Version: 2.6.3i
> Charset: noconv
> 
> iQEVAwUBO/DbMjkCAxeYt5gVAQGXbwgAouMXtfu/AZi+OBm0R2CwjHc+2UFPMYk+
> qK1GktXy1WDSLlx+EV3brdlHxaQsE51ryfd2eBoHjNpdXujkG44JFgNcqI4UgV6r
> fHIfM6zYJqNOZaQlq2o7HmOxr32WKjgtIRwds7src9rXZ6pZGHsx3V1dIyZvg69X
> U2lzG7Jh7mEuDjQakrybwi+43ZN/Fb3J7xd7AGT/knPPGh1xZN0mZYDayTbUjMD4
> mIiKYri6DVcNL/mlgx3mIuaCVXPiluxCSZqT8jSCSkovWsmCpzT3e0F2/B6YwxHu
> fTilBfNOYjS7njLg+5uhllKxMk8qBlivnlYdPG34AFv26hoRn9yCaw==
> =nQjs
> -----END PGP SIGNATURE-----
>