Re: [idn] summary of reordering discussion
Thanks for the comments, David. I will include your modifications in
my report.
Just one clarification:
> > The algorithm would probably be frozen for all time.
>
> I agree that these are not problems if the reordering is actually
> and *definitely* frozen for all time, but Soobok Lee was arguing that
> it could be updated for new scripts.
Soobok, in an effort to please both the people who complained that there
was no way to make updates AND the people who thought updates were
way too difficult, basically said that it could be done either way.
My analysis assumed that the "freeze it" group would prevail (for the
reasons I gave). In that case there is no problem.
Best regards,
Bruce
----- Original Message -----
From: "David Hopwood" <david.hopwood@zetnet.co.uk>
To: "Bruce Thomson" <bthomson@fm-net.ne.jp>; <idn@ops.ietf.org>
Sent: Tuesday, November 13, 2001 8:49 AM
Subject: Re: [idn] summary of reordering discussion
> -----BEGIN PGP SIGNED MESSAGE-----
>
> Bruce Thomson wrote (in reordering.txt):
> > 3) Reduced encoding lengths will make it less likely that an
> > encoded name will be too long.
> >
> > True, but perhaps not of practical significance.
> >
> > The maximum length of a Hangeul domain name is limited by the
> > restriction that a DNS label name be 63 characters or less.
> > Therefore, without re-ordering and assuming a label of the
> > form:
> >
> > www.bq--<ACE-encoded Hangeul>.com.
> >
> > we can calculate the maximum number of Hangeul that can be
> > used in a domain name as:
> >
> > maxHangeul = (maxLabel - prefLength - suffixLength) / encodeRatio
> >            = (63 - 8 - 5) / 3.04
> >            = 16
>
> The 63-octet limit applies to individual labels (the 255-octet limit
> on a full name is not usually the limiting factor in practice).
> Therefore the calculation should be:
>
> maxHangeul = (maxLabel - acePrefixLength) / encodeRatio
>            = (63 - 4) / 3.04
>            =~ 19 (rounded down)
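
For anyone who wants to check the arithmetic, here is a minimal sketch of
the per-label calculation. The function and constant names are my own; the
3.04 ratio and the "bq--" prefix are simply the figures quoted above, not
values I have re-derived.

```python
import math

MAX_LABEL_OCTETS = 63   # DNS limit on a single label
ACE_PREFIX = "bq--"     # ACE prefix used in the example above
ENCODE_RATIO = 3.04     # encoded octets per Hangeul syllable, as quoted above


def max_hangeul(max_label=MAX_LABEL_OCTETS,
                overhead=len(ACE_PREFIX),
                ratio=ENCODE_RATIO):
    """Return how many syllables fit in one label, rounded down."""
    return math.floor((max_label - overhead) / ratio)


# Only the ACE prefix counts against the label (the corrected calculation).
per_label = max_hangeul()

# Counting "www.bq--" (8) and ".com." (5) against the label, as in the
# original report, gives the smaller figure.
original_report = max_hangeul(overhead=8 + 5)
```

The only difference between the two results is which overhead is charged
against the 63-octet label.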
>
> > - Disadvantages
> >
> > 1) Might cause problems if new code points were added later.
> >
> > Not a problem.
> >
> > The algorithm only re-orders existing code points, and will not cause
> > conflicts with new definitions.
> >
> > 2) Would not be able to be improved if new scripts were added or
> > sub-optimalities discovered.
> >
> > Not a problem.
> >
> > The algorithm would probably be frozen for all time.
>
> I agree that these are not problems if the reordering is actually
> and *definitely* frozen for all time, but Soobok Lee was arguing that
> it could be updated for new scripts.
>
> > 5) Too much complexity for not enough benefits.
> >
> > Controversial.
> >
> > We have to decide if the benefits are worth the effort.
>
> This section is supposed to be listing the disadvantages. The benefits
> have already been listed. There's a lot more to be said about complexity
> than just a single line without any supporting argument:
>
>
> 5) Reordering increases complexity and reduces efficiency.
>
> True (what is debatable is the degree of complexity and inefficiency).
>
> The draft includes 4096 mappings for Han, 1024 for Hangeul, and
> 528 for Group 2 and 3 scripts.
>
> The mappings for Han and Hangeul are sparse: since it
> is undesirable to use tables that cover the whole of the Han and
> Hangeul syllables blocks, implementing them will require something
> like a binary search or hashtable. (The reference code in the draft
> uses linear search, but that has unacceptable performance for
> production code.)
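
To make the implementation cost concrete, a sparse mapping of this kind
could be looked up with a binary search along the following lines. This is
only a sketch under my own assumptions: the table entries are invented
placeholders, not values from the reordering draft, and the names SOURCES,
TARGETS and reorder are hypothetical.

```python
from bisect import bisect_left

# Placeholder entries for illustration only; NOT taken from the draft.
# SOURCES must be sorted so that binary search works.
SOURCES = [0x4E00, 0x4E8C, 0x4EBA, 0xAC00, 0xD55C]
TARGETS = [0x4E01, 0x4E02, 0x4E03, 0xAC01, 0xAC02]


def reorder(code_point):
    """Map a code point through the sparse table in O(log n) time.

    Code points that have no table entry are returned unchanged.
    """
    i = bisect_left(SOURCES, code_point)
    if i < len(SOURCES) and SOURCES[i] == code_point:
        return TARGETS[i]
    return code_point
```

With 4096 entries this takes at most about 12 comparisons per character,
versus up to 4096 for a linear search, but it is still extra code and
tables that every implementation has to get right.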
>
> A more complex specification introduces more opportunities for
> implementation errors.
>
> The reordering draft argues that nameprep requires similarly
> complex code and tables; but see point 6) below.
>
> Add to the end of point 6):
>
> Also, note that the nameprep mappings are from less-commonly to
> more-commonly used characters (often, the string will already be
> normalised). Reordering maps from every character in the affected
> scripts. Therefore, errors in reordering that affect a small number
> of characters are likely to be more significant than similar errors
> in nameprep.
>
>
> Incidentally, I intend to propose a simplification of nameprep that
> uses NFC instead of NFKC, and reduces the number of foldings (i.e.
> step 1 of stringprep) to about 800. The remaining foldings have
> regularities that allow them to be implemented with less than 1 Kbyte
> of tables (compared to ~ 10 Kbytes for Han & Hangeul reordering).
> This does not count tables needed for NFC normalisation, but those
> may already be provided by libraries or the operating system.
>
> This makes a significant difference for memory-constrained devices
> that need to interpret HTML, because they can assume that the HTML is
> already NFC-normalised, and can be designed so that this also applies
> to user input (e.g. a user typing in a URL) [*]. With slight
> modifications to stringprep, it is possible to avoid performing
> the final NFC normalisation step when the input is known to already
> be NFC-normalised. The result is that these devices would not need to
> implement NFC at all.
>
>
> [*] This assumes that they don't support decomposed characters for
> scripts that have some precomposed characters, but that's not an
> unreasonable assumption for constrained devices.
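
The shape of that optimisation, as I understand it, is roughly the
following. This is a sketch only: prepare and assume_nfc are names I made
up, and str.lower() is a stand-in for the folding step, not the actual
nameprep mapping. It also assumes the folding never turns NFC input into
non-NFC output, which is presumably what the "slight modifications to
stringprep" would have to guarantee.

```python
import unicodedata


def prepare(label, assume_nfc=False):
    """Fold a label, normalising to NFC only when the input is untrusted.

    assume_nfc=True models a constrained device whose inputs are known to
    be NFC already, so the normalisation step (and its tables) is skipped.
    """
    folded = label.lower()  # stand-in for the real folding table
    if assume_nfc:
        return folded
    return unicodedata.normalize("NFC", folded)
```

A device built around the assume_nfc=True path would never call the
normaliser, so it would not need to carry the NFC tables at all.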
>
> - --
> David Hopwood <david.hopwood@zetnet.co.uk>
>
> Home page & PGP public key: http://www.users.zetnet.co.uk/hopwood/
> RSA 2048-bit; fingerprint 71 8E A6 23 0E D3 4C E5 0F 69 8C D4 FA 66 15 01
> Nothing in this message is intended to be legally binding. If I revoke a
> public key but refuse to specify why, it is because the private key has been
> seized under the Regulation of Investigatory Powers Act; see www.fipr.org/rip
>
>
> -----BEGIN PGP SIGNATURE-----
> Version: 2.6.3i
> Charset: noconv
>
> iQEVAwUBO++78zkCAxeYt5gVAQEbZwf/WBmPGo+6TQQ4WZ76bkCYGYSQtjAbbEuk
> D3LrFALySW6VqpgdZY+pL8i1meMGAeYBLLbUJP/cydzF68faQQTzjIe/MKnZ7LyF
> bV9JkKfdGnRBZXnKaCzc5AjZv79OC1wnoi/0K0MAbEOpNOuQO8YqV8+063WkjDP2
> SiGp/HJDA8Yzo6o8Xze2Zg0Q6Qv0tnbmfRPpnVOMXp/bio8RM948vegjKJ0k+rkX
> mptZrPK/UWKROlr49Xdhqj4WetL9RnTnXA6uC/lScDPPjAAzMcG4rcxkSn3ChmaM
> 1D/0pB9IY9+OiaMrd+eAYS+v649Lr8EepdEI+61OhODhtlyMWsXV2w==
> =ufos
> -----END PGP SIGNATURE-----
>
>