Re: [idn] call for comments for REORDERING
----- Original Message -----
From: "David Hopwood" <david.hopwood@zetnet.co.uk>
To: "Soobok Lee" <lsb@postel.co.kr>; <idn@ops.ietf.org>
Sent: Saturday, October 20, 2001 10:41 AM
Subject: Re: [idn] call for comments for REORDERING
> -----BEGIN PGP SIGNED MESSAGE-----
>
> Soobok Lee wrote:
> > From: "Martin Duerst" <duerst@w3.org>
> > > >
> > > >1) saturations in TLD namespaces would require longer names for which
> > > > REORDERING is designed to give greater benefits/compression ratio.
> > >
> > > No. What James referred to is that saturation tends to fill up the
> > > short name slots, and thus flatten the probability distribution.
> > > I.e. if somebody doesn't get the name they wanted, the chance is
> > > that they go for something like xq.com, because it's easy to
> > > remember because it's short. Neither x nor q are very frequent
> > > letters.
> >
> > Han/Hangul characters carry meaning, while the letters of the Latin
> > alphabet denote phonemes.
>
> ?? Unless I'm very confused about Hangul, it is at least as much
> phonetically-based as Latin. Hangul Jamo are letters of an alphabet,
> which happen to be arranged in square cells corresponding to syllables,
> instead of linearly.
You are only partly correct: Hangul is phonetic, but that is not the whole picture.
If you ever read a Hangul-to-Hangul dictionary, you will easily find that
more than 70 to 80% of modern Hangul vocabulary came from 1:1 mappings of
Chinese words, much as most English and French words came from Latin ones.
Therefore, one Hangul character carries a similar amount of information
to its Chinese character counterpart.
Hangul and Han characters each carry about as much information as 2 Latin characters.
>
> Moreover, each Hangul syllable (encoded as a single character when
> NFC/NFKC-normalised), normally represents 3 Jamo. That should be taken
> into account when assessing whether Hangul is encoded compactly enough.
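To make the syllable/Jamo relationship concrete, here is a minimal sketch using Python's standard unicodedata module. The sample syllable U+D55C is my own choice for illustration, not one taken from this thread. The sketch decomposes the precomposed syllable into its conjoining Jamo with NFD and compares the code point and UTF-8 octet counts of the two forms.

```python
import unicodedata

# U+D55C HANGUL SYLLABLE HAN, chosen only as an illustrative sample.
syllable = "\ud55c"

# NFD splits a precomposed syllable into its conjoining Jamo;
# NFC puts them back together into a single code point.
jamo = unicodedata.normalize("NFD", syllable)
recomposed = unicodedata.normalize("NFC", jamo)

for ch in jamo:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

print("code points:", len(syllable), "vs", len(jamo))
print("UTF-8 octets:", len(syllable.encode("utf-8")),
      "vs", len(jamo.encode("utf-8")))
```

A syllable without a final consonant decomposes into 2 Jamo rather than 3, which is why the quoted text says "normally".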
>
> > Therefore your analogy between Latin and Han domains
> > may be false. Chinese people would rather register
> > digit-added variants of desired domains that are already taken in a
> > saturated ML.com than choose nonsensical, irrelevant, rare Han characters.
> >
> > Later, I will provide some evidence that SC and TC each have only
> > a small subset of frequently used characters.
>
> That's not in dispute. The argument is about whether the complexity of
> reordering is worth the additional compression. IMHO it isn't -
> AMC-Z (or UTF-8) encodings are sufficiently compact that the 63-octet
> and 255-octet limits are not a serious problem for any language or script,
> and the savings for average names are marginal.
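For anyone who wants to check the octet limits on real labels, here is a rough sketch. It assumes Python's built-in "punycode" codec as a stand-in for AMC-Z (the draft encodings discussed on this list may differ in detail), and the helper names octets and fits_label are hypothetical. The count excludes any ACE prefix that would be added to the label on the wire.

```python
MAX_LABEL_OCTETS = 63  # per-label limit referred to above

def octets(label: str, encoding: str = "utf-8") -> int:
    """Number of octets the label occupies in the given encoding."""
    return len(label.encode(encoding))

def fits_label(label: str, encoding: str = "utf-8") -> bool:
    """True if the encoded label stays within the 63-octet limit."""
    return octets(label, encoding) <= MAX_LABEL_OCTETS

# Two Hangul syllables (U+D55C U+AD6D), an illustrative sample only.
label = "\ud55c\uad6d"

for enc in ("utf-8", "punycode"):
    print(enc, octets(label, enc), fits_label(label, enc))
```

Running this over a list of candidate names for each script would show how many of them actually approach the limit under each encoding.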
>
> - --
> David Hopwood <david.hopwood@zetnet.co.uk>
>
> Home page & PGP public key: http://www.users.zetnet.co.uk/hopwood/
> RSA 2048-bit; fingerprint 71 8E A6 23 0E D3 4C E5 0F 69 8C D4 FA 66 15 01
> Nothing in this message is intended to be legally binding. If I revoke a
> public key but refuse to specify why, it is because the private key has been
> seized under the Regulation of Investigatory Powers Act; see www.fipr.org/rip
>
>
> -----BEGIN PGP SIGNATURE-----
> Version: 2.6.3i
> Charset: noconv
>
> iQEVAwUBO9DO2zkCAxeYt5gVAQF/7AgAzp3KB/kPA2XAxb43hCSbrLBOxavd4WSq
> DYfvw2UuwloLkEZB+tkkoOPucW/ElLmaYjuYMKt6nea2LZthLpTWDc8a8ENXqM34
> Z+aP8nqN9XzeMTPisebpCcTE7PZYWdi87a0grmL0KFBzYG0PsxAB905Yvf12oU4U
> u3da6Ku37YJeYK0jNi4/qhoAUZ8gyz+gW4MWWxCmuAIrvmIkaf/d4lX4Tu+75mg2
> VcS3ezCGbOt3Wf0GIfUl869BBRbPB7bScBX0EjP/C+sQpCVR6gVs6SKDS9zY/W6k
> XImrf7IuLg57za70dy5YiCgNBYOvlNa4Xgi3d+DFoW7jntmj4MEUYw==
> =4Lmr
> -----END PGP SIGNATURE-----
>
>