[idn] Normalisation and case folding (was: IDNA comment)
-----BEGIN PGP SIGNED MESSAGE-----
[Cross-posted from the IDN list; reply-to set to unicode@unicode.org.
Change it back for replies that relate specifically to IDN.]
Mark Davis wrote:
> >stringprep(NFC(x)) == stringprep(x) [does not always hold]
>
> This was brought up early in the Unicode 3.2 development. We have
> programmatically checked, and I with dot is the only case that causes
> a problem.
No, it isn't. "\u1FB2" and "\u1FB3\u0300" are canonically equivalent, yet:
  NFD(toCasefold("\u1FB2"))
    = NFD("\u1F70\u03B9") = "\u03B1\u0300\u03B9"  (alpha-varia, iota)
  NFD(toCasefold("\u1FB3\u0300"))
    = NFD("\u03B1\u03B9\u0300") = "\u03B1\u03B9\u0300"  (alpha, iota-varia)
(Which NF is used after toCasefold isn't important.)
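The counterexample above can be reproduced with a minimal sketch. It assumes Python's unicodedata.normalize and str.casefold() as stand-ins for NFD and toCasefold. Both follow whatever Unicode version ships with the interpreter, not the 3.2 draft under discussion, and the names nfd, a, and b are illustrative only.

```python
import unicodedata


def nfd(s: str) -> str:
    """Canonical decomposition (NFD) of s."""
    return unicodedata.normalize("NFD", s)


# Two spellings of alpha with varia and ypogegrammeni.
a = "\u1FB2"         # precomposed: alpha with varia and ypogegrammeni
b = "\u1FB3\u0300"   # alpha with ypogegrammeni, then combining grave (varia)

# Before folding, the two strings have the same NFD form.
canonically_equivalent = nfd(a) == nfd(b)

# Assumption: str.casefold() approximates toCasefold (full case folding).
folded_a = nfd(a.casefold())
folded_b = nfd(b.casefold())

print("equivalent before folding:", canonically_equivalent)
print("equal after folding:      ", folded_a == folded_b)
print([f"U+{ord(c):04X}" for c in folded_a])
print([f"U+{ord(c):04X}" for c in folded_b])
```

If the folding tables behave as described, the first comparison holds and the second does not: the varia ends up on the alpha in one result and on the iota in the other.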
What algorithm did you use to check this? In any case, I'll discuss case
folding in part 3 of my Unicode 3.2 comments, later today.
> I have no doubt that it will be resolved for U3.2, and even if StringPrep
> doesn't pick up U3.2, it could add a mapping to that one case.
How would you add a mapping from a sequence of two characters (e.g.
"I\u0307"), without changing the stringprep/nameprep algorithm, rather
than the tables?
(Actually there is an indirect way to do it: map out U+0307, and map all
composed characters with dot-above to the forms without dot-above. However,
that won't work for the standard case folding algorithm, because it would
break consistency with earlier versions, and it doesn't fix the problem
with Greek ypogegrammeni/prosgegrammeni, anyway.)
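To illustrate the indirect approach, here is a rough sketch of a mapping step that looks at one code point at a time, so it has no way to key on the sequence "I\u0307". The function map_per_code_point and the table TOY_TABLE are hypothetical, and the table holds only a few illustrative entries; it is not the stringprep or nameprep table.

```python
def map_per_code_point(s: str, table: dict) -> str:
    """Map each code point independently; unmapped code points pass through."""
    return "".join(table.get(ch, ch) for ch in s)


# Hypothetical toy table: every key is a single code point, so a
# two-character key such as "I\u0307" could never be matched.
TOY_TABLE = {
    "I": "i",
    "\u0130": "i",   # capital I with dot above -> dotless-form fold
    "\u0307": "",    # combining dot above is mapped out entirely
    "\u1E02": "b",   # capital B with dot above -> plain b
    "\u1E03": "b",   # small b with dot above -> plain b
}

# Both spellings of dotted capital I now agree ...
print(map_per_code_point("\u0130", TOY_TABLE) == map_per_code_point("I\u0307", TOY_TABLE))

# ... but only because every dot-above distinction is discarded.
print(map_per_code_point("\u1E03", TOY_TABLE) == map_per_code_point("b", TOY_TABLE))
```

Both comparisons print True. The second shows the cost: strings that older versions kept distinct now collapse together, and nothing in this table touches the Greek case.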
- --
David Hopwood <david.hopwood@zetnet.co.uk>
Home page & PGP public key: http://www.users.zetnet.co.uk/hopwood/
RSA 2048-bit; fingerprint 71 8E A6 23 0E D3 4C E5 0F 69 8C D4 FA 66 15 01
Nothing in this message is intended to be legally binding. If I revoke a
public key but refuse to specify why, it is because the private key has been
seized under the Regulation of Investigatory Powers Act; see www.fipr.org/rip
-----BEGIN PGP SIGNATURE-----
Version: 2.6.3i
Charset: noconv
iQEVAwUBPGjXmzkCAxeYt5gVAQFj1wgAiC4pPi3sJqK6uhOdXrigiUi85QMds7+o
LRgvpGldJ1l+LmuTh7PHqlq/rW5A+mvq/Usm0Gj9rZK0ALyc1i6nKvCN1hUPTGEA
cnCGR24aCqXa1aBNVEDT2FfY4QlJqiRBNjPxncMm3Od6SA3EN0cI76jUTXgk3YxV
S/Ffd2eszm3jy4qeBIkgkXhul7mKxonwdzmGggGLxAj25RNbzzoBAiGmtH2NQn/C
IZExFrQjFXGNwLQ7wjhbRSs1nWwRYP0OcJIHEACSWcf/tYu+opLB6Dcq3ZXAk20y
P4a/c8KvryTd6ZF7d+8sV3x4yCzEh5PzPjgeYRqv0lbAMCsO3bTasw==
=pR5H
-----END PGP SIGNATURE-----