
Re: [idn] reordering strawpoll



-----BEGIN PGP SIGNED MESSAGE-----

Soobok Lee wrote:
> From: <DougEwell2@cs.com>
> > In a message dated 2001-11-12 14:41:29 Pacific Standard Time,
> > lsb@postel.co.kr writes:
> >
> > >  If you encode each Hangul syllabic in 3 jamos in utf8,
> > >  it need 3 octets * 3 = 9 octets, while 3 basic latin letter need 3 octets
> > >  in utf8. 3 times more space!  if there were any real "compaction" on
> > >  hangul syllable code points, that may be just the bare minimum.
> >
> > But one paragraph earlier, Soobok stated that each hangul character is
> > roughly equivalent to (i.e. carries roughly as much information as) 2.2 to
> > 2.7 Latin letters.  So the 9 octets of UTF-8 actually encode the equivalent
> > of 6.6 to 8.1 Latin letters, which means Hangul encoding is 10% to 27% less
> > efficient than Latin encoding.  Representing it as two-thirds (67%) less
> > efficient is obviously misleading.  Such claims only detract attention away
> > from any merit the reordering plan may have.
> 
> my analogy cited above was for *UTF8*.

However, your argument that it is important to reduce the *average* length
of encoded names certainly doesn't apply to UTF-8 (even if one accepts
that it applies to ACE, which I don't).

Users will never see (much less type in) UTF-8 octet string encodings
except in obscure debugging situations. The 63-octet limitation on label
length when using UTF-8 appears to be sufficient, AFAICS (even for Indic
scripts, which are less efficient than Hangul in UTF-8).
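
For anyone who wants to check the octet counts being argued over, here is a
minimal Python sketch. It is only an illustration, not part of any draft: the
helper names utf8_len and fits_in_label are made up for this example, and the
sample syllable U+D55C is an arbitrary choice. It compares a precomposed
Hangul syllable with its three-jamo (NFD) form and tests both against the
63-octet label limit.

```python
import unicodedata

MAX_LABEL_OCTETS = 63  # the label length limit mentioned above


def utf8_len(text: str) -> int:
    """Number of octets in the UTF-8 encoding of text (hypothetical helper)."""
    return len(text.encode("utf-8"))


def fits_in_label(text: str) -> bool:
    """True if the UTF-8 form of text fits in one 63-octet label."""
    return utf8_len(text) <= MAX_LABEL_OCTETS


# One precomposed Hangul syllable (U+D55C), chosen arbitrarily.
syllable = "\ud55c"
# The same syllable decomposed into conjoining jamo: U+1112 U+1161 U+11AB.
jamo = unicodedata.normalize("NFD", syllable)

print("precomposed syllable:", utf8_len(syllable), "octets")
print("jamo sequence:", len(jamo), "code points,", utf8_len(jamo), "octets")
print("three Latin letters:", utf8_len("abc"), "octets")

# How many syllables fit in one label under each representation.
print("precomposed syllables per label:", MAX_LABEL_OCTETS // utf8_len(syllable))
print("decomposed syllables per label:", MAX_LABEL_OCTETS // utf8_len(jamo))
```

The precomposed syllable takes 3 octets and the jamo sequence takes 9, so a
63-octet label holds 21 syllables in the first form and 7 in the second.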

- -- 
David Hopwood <david.hopwood@zetnet.co.uk>

Home page & PGP public key: http://www.users.zetnet.co.uk/hopwood/
RSA 2048-bit; fingerprint 71 8E A6 23 0E D3 4C E5  0F 69 8C D4 FA 66 15 01
Nothing in this message is intended to be legally binding. If I revoke a
public key but refuse to specify why, it is because the private key has been
seized under the Regulation of Investigatory Powers Act; see www.fipr.org/rip


-----BEGIN PGP SIGNATURE-----
Version: 2.6.3i
Charset: noconv

iQEVAwUBO/DbMjkCAxeYt5gVAQGXbwgAouMXtfu/AZi+OBm0R2CwjHc+2UFPMYk+
qK1GktXy1WDSLlx+EV3brdlHxaQsE51ryfd2eBoHjNpdXujkG44JFgNcqI4UgV6r
fHIfM6zYJqNOZaQlq2o7HmOxr32WKjgtIRwds7src9rXZ6pZGHsx3V1dIyZvg69X
U2lzG7Jh7mEuDjQakrybwi+43ZN/Fb3J7xd7AGT/knPPGh1xZN0mZYDayTbUjMD4
mIiKYri6DVcNL/mlgx3mIuaCVXPiluxCSZqT8jSCSkovWsmCpzT3e0F2/B6YwxHu
fTilBfNOYjS7njLg+5uhllKxMk8qBlivnlYdPG34AFv26hoRn9yCaw==
=nQjs
-----END PGP SIGNATURE-----