[idn] Re: Common-case optimization of nameprep
-----BEGIN PGP SIGNED MESSAGE-----
I wrote:
> Let getCanonicalClassX(ch) be the Canonical Combining Class of ch, except that
> it returns a negative value if ch \in X. The set X is defined as the union of:
>
> - characters that are mapped to something else (or mapped out) by the nameprep
> mapping step,
[ - characters for which the NFKC_QuickCheck flag in
DerivedNormalizationProperties-3.1.0.txt is either NFKC_NO or NFKC_MAYBE.]
> - characters that are prohibited in the output of nameprep (i.e. category D).
To answer my question about what X \ (MN union D) is:
Let M be the set of characters that are mapped to something else or mapped out.
Then we have the following:
X = M union NFKC_NO union NFKC_MAYBE union D.
MN = (M union NFKC_NO) \ D.
Therefore X \ (MN union D) = X \ (M union NFKC_NO union D)
                           = NFKC_MAYBE \ (M union NFKC_NO union D)
                           = NFKC_MAYBE.
(The last step holds because no NFKC_MAYBE character is also in M, NFKC_NO,
or D.)
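
As a sanity check, the set identity above can be tried out on toy sets. This
is only a sketch: the members below are arbitrary labels rather than real code
points, and NFKC_MAYBE is assumed to be disjoint from M, NFKC_NO and D, which
is the property the final step of the derivation relies on.

```python
# Toy stand-ins for the real character sets. The members are arbitrary
# labels chosen for illustration, not actual Unicode code points.
M = {"m1", "m2", "both_m_d"}      # mapped to something else or mapped out
NFKC_NO = {"n1", "m2"}            # may overlap with M
NFKC_MAYBE = {"y1", "y2"}         # assumed disjoint from M, NFKC_NO and D
D = {"d1", "both_m_d"}            # prohibited in nameprep output

X = M | NFKC_NO | NFKC_MAYBE | D
MN = (M | NFKC_NO) - D

lhs = X - (MN | D)
rhs = NFKC_MAYBE - (M | NFKC_NO | D)

# MN union D equals M union NFKC_NO union D, so both sides agree.
assert lhs == rhs
# With the disjointness assumption, what remains is exactly NFKC_MAYBE.
assert lhs == NFKC_MAYBE
```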
NFKC_MAYBE is:
- a subset of the Combining Diacritical Marks block (0300..0345).
- some Arabic combining marks (0653..0655).
- vowel signs and length marks for Devanagari, Bengali, Oriya, Tamil,
Telugu, Kannada, Malayalam, Sinhala, and Myanmar.
- the combining Katakana-Hiragana voiced and semi-voiced sound marks.
- a subset of the Hangul Jamo medial vowels (1161..1175).
- a subset of the Hangul Jamo final consonants (11A8..11C2).
These are the characters that can occur in a nameprepped name, but will
inhibit the optimization if nameprep is applied again. All of them are
combining characters of one type or another, which makes sense: whether
such a character has a corresponding precomposed form depends on the
character before it, so a per-character check cannot rule one out.
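
To make the effect of X concrete, here is a rough sketch of what a fast-path
test built on getCanonicalClassX might look like. It is not the algorithm from
the earlier message: the ordering test is borrowed from the usual
normalization quick-check loop, the names get_canonical_class_x and
fast_path_ok are made up for illustration, and the set X is passed in by the
caller (a real one would be built from the nameprep tables and
DerivedNormalizationProperties, which is not done here).

```python
import unicodedata

def get_canonical_class_x(ch, x_set):
    """Canonical combining class of ch, or a negative value if ch is in X."""
    if ch in x_set:
        return -1
    return unicodedata.combining(ch)

def fast_path_ok(name, x_set):
    """Return True if name can skip full nameprep under this sketch.

    Any character in X, or any combining mark whose class is lower than the
    class of the character before it, forces a fall back to the full
    procedure.
    """
    last = 0
    for ch in name:
        ccc = get_canonical_class_x(ch, x_set)
        if ccc < 0:
            return False          # character is in X
        if ccc != 0 and ccc < last:
            return False          # combining marks out of canonical order
        last = ccc
    return True

# Hypothetical two-member X: "A" stands in for M (it is case-mapped), and
# U+0301 COMBINING ACUTE ACCENT stands in for NFKC_MAYBE.
TOY_X = {"A", "\u0301"}
```

With this toy X, fast_path_ok("example", TOY_X) is true, while "e\u0301" is
rejected even though it might be the output of an earlier nameprep pass. That
is the sense in which NFKC_MAYBE characters inhibit the optimization.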
Should the common case optimization be described in the next version of
the nameprep draft?
- --
David Hopwood <david.hopwood@zetnet.co.uk>
Home page & PGP public key: http://www.users.zetnet.co.uk/hopwood/
RSA 2048-bit; fingerprint 71 8E A6 23 0E D3 4C E5 0F 69 8C D4 FA 66 15 01
Nothing in this message is intended to be legally binding. If I revoke a
public key but refuse to specify why, it is because the private key has been
seized under the Regulation of Investigatory Powers Act; see www.fipr.org/rip
-----BEGIN PGP SIGNATURE-----
Version: 2.6.3i
Charset: noconv
iQEVAwUBO2D/QTkCAxeYt5gVAQFnuwgAtzYRQqnOqZZ5SPiuhONCn3RQsv2K7H2G
YoOYvd4CsVrO8ggtpSbCCfd7o1gtCjVoJnpteTG0OTASSFQFM/dTZMYJlN9CJvmo
U20GoIWXpK6PomzB3CKzrn83O0f/HRh7tNou6Ks/80V8UmDHM08GcRtGoWqenyeY
iuOny+3GqSlFJtJ3QVR/IzFMCd2l35MIibyMZcbY32bOI2+OIgg4WLN2tR1jwbW5
g/tz1Rf+/SH+b5p/jNk5syoCM6b54e52g4Ha/stHb92BsAFz+mAtdk1vYNlfSbJc
SSw1+v4o1uoBQv5Ajqz948ThB7nQFx6bdYkNLVgXlSpFk2wJ0w4msQ==
=HrQd
-----END PGP SIGNATURE-----