Re: [idn] summary of reordering discussion
-----BEGIN PGP SIGNED MESSAGE-----
Bruce Thomson wrote (in reordering.txt):
> 3) Reduced encoding lengths will make it less likely that an
> encoded name will be too long.
>
> True, but perhaps not of practical significance.
>
> The maximum length of a Hangeul domain name is limited by the
> restriction that a DNS label name be 63 characters or less.
> Therefore, without re-ordering and assuming a label of the
> form:
>
> www.bq--<ACE-encoded Hangeul>.com.
>
> we can calculate the maximum number of Hangeul that can be
> used in a domain name as:
>
> maxHangeul = (maxLabel - prefLength - suffixLength) / encodeRatio
> = (63 - 8 - 5) / 3.04
> = 16
The 63-octet limit applies to each label individually, so the "www"
and "com" labels do not count against it; only the 4-octet ACE prefix
"bq--" does. (The 255-octet limit on a full name is not usually the
limiting factor in practice.) Therefore the calculation should be:
maxHangeul = (maxLabel - acePrefixLength) / encodeRatio
= (63 - 4) / 3.04
=~ 19 (rounded down)
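As a quick check of the arithmetic, here is a small Python sketch that
evaluates both calculations above. The constant and function names are
mine, and the 3.04 ratio is simply the figure quoted from reordering.txt,
not something I have measured:

```python
import math

MAX_LABEL = 63        # octets allowed in a single DNS label
ENCODE_RATIO = 3.04   # octets per Hangeul syllable, as quoted above

def max_hangeul(overhead_octets):
    """Whole syllables that fit in one label after fixed overhead."""
    return math.floor((MAX_LABEL - overhead_octets) / ENCODE_RATIO)

# Original calculation: charges "www.bq--" (8) and ".com." (5) to the label.
per_name_estimate = max_hangeul(8 + 5)

# Corrected calculation: only the 4-octet "bq--" prefix is in the label.
per_label_estimate = max_hangeul(4)

print(per_name_estimate, per_label_estimate)
```

This prints 16 and 19, matching the two results in the text.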
> - Disadvantages
>
> 1) Might cause problems if new code points were added later.
>
> Not a problem.
>
> The algorithm only re-orders existing code points, and will not cause
> conflicts with new definitions.
>
> 2) Would not be able to be improved if new scripts were added or
> sub-optimalities discovered.
>
> Not a problem.
>
> The algorithm would probably be frozen for all time.
I agree that these are not problems if the reordering really is
*definitely* frozen for all time, but Soobok Lee was arguing that
it could be updated for new scripts.
> 5) Too much complexity for not enough benefits.
>
> Controversial.
>
> We have to decide if the benefits are worth the effort.
This section is supposed to be listing the disadvantages. The benefits
have already been listed. There's a lot more to be said about complexity
than just a single line without any supporting argument:
5) Reordering increases complexity and reduces efficiency.
True (what is debatable is the degree of complexity and inefficiency).
The draft includes 4096 mappings for Han, 1024 for Hangeul, and
528 for Group 2 and 3 scripts.
The mappings for Han and Hangeul are sparse: since it is undesirable
to use tables that cover the whole of the Han and Hangeul Syllables
blocks, implementing them will require something like a binary search
or a hash table. (The reference code in the draft uses linear search,
but that has unacceptable performance for production code.)
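To illustrate the kind of lookup I mean, here is a minimal Python sketch
of a binary search over a sparse mapping table. The table entries are
made-up placeholders, not values from the draft, and the names
REORDER_PAIRS and reorder are hypothetical; it shows only the lookup, not
a complete reordering:

```python
from bisect import bisect_left

# Hypothetical (source, target) code point pairs, sorted by source.
# These are placeholders for illustration, not the draft's mappings.
REORDER_PAIRS = [
    (0x4E00, 0x3400),
    (0x4E8C, 0x3401),
    (0xAC00, 0x3402),
    (0xD55C, 0x3403),
]
SOURCES = [src for src, _ in REORDER_PAIRS]
TARGETS = [dst for _, dst in REORDER_PAIRS]

def reorder(code_point):
    """Return the mapped code point, or the input if it has no entry."""
    i = bisect_left(SOURCES, code_point)      # O(log n) probe
    if i < len(SOURCES) and SOURCES[i] == code_point:
        return TARGETS[i]
    return code_point                         # unmapped: pass through

print(hex(reorder(0xAC00)), hex(reorder(0x0041)))
```

With a table of 4096 entries this needs about 12 comparisons per
character, against an average of about 2048 for a linear search.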
A more complex specification introduces more opportunities for
implementation errors.
The reordering draft argues that nameprep requires similarly
complex code and tables; but see point 6) below.
Add to the end of point 6):
Also, note that the nameprep mappings are from less-commonly to
more-commonly used characters (often, the string will already be
normalised). Reordering maps from every character in the affected
scripts. Therefore, errors in reordering that affect a small number
of characters are likely to be more significant than similar errors
in nameprep.
Incidentally, I intend to propose a simplification of nameprep that
uses NFC instead of NFKC, and reduces the number of foldings (i.e.
step 1 of stringprep) to about 800. The remaining foldings have
regularities that allow them to be implemented with less than 1 Kbyte
of tables (compared to ~ 10 Kbytes for Han & Hangeul reordering).
This does not count tables needed for NFC normalisation, but those
may already be provided by libraries or the operating system.
This makes a significant difference for memory-constrained devices
that need to interpret HTML, because they can assume that the HTML is
already NFC-normalised, and can be designed so that this also applies
to user input (e.g. a user typing in a URL) [*]. With slight
modifications to stringprep, it is possible to avoid performing
the final NFC normalisation step when the input is known to already
be NFC-normalised. The result is that these devices would not need to
implement NFC at all.
[*] This assumes that they don't support decomposed characters for
scripts that have some precomposed characters, but that's not an
unreasonable assumption for constrained devices.
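A rough Python sketch of the idea, under loud assumptions: str.lower()
stands in for the folding step (it is not the nameprep table), the
function name prepare and the already_nfc flag are hypothetical, and the
sketch assumes the folding cannot take an NFC string out of NFC, which is
exactly what the modified stringprep would have to guarantee:

```python
import unicodedata

def prepare(label, already_nfc=False):
    """Fold a label, normalising to NFC only when it might be needed."""
    # Stand-in for the folding step; NOT the real nameprep mapping.
    folded = label.lower()
    if already_nfc:
        # Caller asserts the input is NFC, and we assume folding keeps
        # it that way, so the normalisation step is skipped entirely.
        return folded
    return unicodedata.normalize("NFC", folded)

# Precomposed input, known to be NFC: no normalisation performed.
a = prepare("Caf\u00e9", already_nfc=True)
# Decomposed input (e + combining acute): normalisation is required.
b = prepare("Cafe\u0301")
print(a == b)
```

A constrained device would implement only the already_nfc branch, and so
would not need the normalisation tables at all.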
- --
David Hopwood <david.hopwood@zetnet.co.uk>
Home page & PGP public key: http://www.users.zetnet.co.uk/hopwood/
RSA 2048-bit; fingerprint 71 8E A6 23 0E D3 4C E5 0F 69 8C D4 FA 66 15 01
Nothing in this message is intended to be legally binding. If I revoke a
public key but refuse to specify why, it is because the private key has been
seized under the Regulation of Investigatory Powers Act; see www.fipr.org/rip
-----BEGIN PGP SIGNATURE-----
Version: 2.6.3i
Charset: noconv
iQEVAwUBO++78zkCAxeYt5gVAQEbZwf/WBmPGo+6TQQ4WZ76bkCYGYSQtjAbbEuk
D3LrFALySW6VqpgdZY+pL8i1meMGAeYBLLbUJP/cydzF68faQQTzjIe/MKnZ7LyF
bV9JkKfdGnRBZXnKaCzc5AjZv79OC1wnoi/0K0MAbEOpNOuQO8YqV8+063WkjDP2
SiGp/HJDA8Yzo6o8Xze2Zg0Q6Qv0tnbmfRPpnVOMXp/bio8RM948vegjKJ0k+rkX
mptZrPK/UWKROlr49Xdhqj4WetL9RnTnXA6uC/lScDPPjAAzMcG4rcxkSn3ChmaM
1D/0pB9IY9+OiaMrd+eAYS+v649Lr8EepdEI+61OhODhtlyMWsXV2w==
=ufos
-----END PGP SIGNATURE-----