Re: Combining characters (was: Re: [idn] hostname historyhell)
-----BEGIN PGP SIGNED MESSAGE-----
David Hopwood wrote:
> Soobok Lee wrote:
> > Now that <I><dot-above> is downcased to <i> as an exceptional case,
> > Then, we have an interesting question:
> > which direction should we lowercase <I><dot-above><acute> into ?
>
> To <i acute>. That is, the equivalence class is:
>
> <I><dot-above><acute> U+0049 U+0307 U+0301
> <I dot-above><acute> U+0130 U+0301
> <I><acute> U+0049 U+0301
> <I acute> U+00CD
> <i><acute> U+0069 U+0301
> <i acute> U+00ED
> <dotless i><acute> U+0131 U+0301
> <fullwidth I><acute> U+FF29 U+0301
> <fullwidth I><dot-above><acute> U+FF29 U+0307 U+0301
> <fullwidth i><acute> U+FF49 U+0301
>
> and if NFKC is used, also: [snip]
>
> <i acute> U+00ED is the normalised representative for all of these.
>
> <i><dot-above><acute> is in a different equivalence class (AFAIK, no
> language uses it, so this doesn't matter).
My mistake; it is used in Lithuanian. The Lithuanian usage argues for
putting <i><dot-above><acute> in the same equivalence class, since its
Lithuanian uppercase form is <I acute>. So another solution worth
considering is to use NFC o fold (case folding followed by NFC), as in
the current version of stringprep, but to map out U+0307 whenever it is
attached to a character based on 'i' or 'I'. That wouldn't cause any
problems for Turkish or Azeri. I'll list all the options in another post.
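
As a rough illustration of the rule above, here is a minimal Python
sketch. It is not stringprep itself: the function name fold_i is
hypothetical, Python's str.casefold() stands in for the "fold" step, and
"attached to" is read as "anywhere in the run of combining marks that
follows a base 'i' or 'I'". Dotless i (U+0131) and the fullwidth forms
are left alone here, since they are not covered by the rule as stated.

```python
import unicodedata

DOT_ABOVE = "\u0307"  # COMBINING DOT ABOVE


def fold_i(s: str) -> str:
    """Case-fold, drop U+0307 after a base 'i'/'I', then apply NFC.

    Illustrative sketch only; the name and the exact reading of
    "attached to" are assumptions, not part of any specification.
    """
    # Fold case first, then decompose so that base letters and
    # combining marks become separate code points.
    decomposed = unicodedata.normalize("NFD", s.casefold())

    out = []
    base_is_i = False
    for ch in decomposed:
        if unicodedata.combining(ch) == 0:
            # A character with combining class 0 starts a new sequence.
            base_is_i = ch in ("i", "I")
            out.append(ch)
        elif ch == DOT_ABOVE and base_is_i:
            # Map out the dot above when its base letter is i or I.
            continue
        else:
            out.append(ch)

    # Recompose into the normalised representative.
    return unicodedata.normalize("NFC", "".join(out))
```

With this sketch, <I><dot-above><acute>, <I dot-above><acute>,
<I acute> and the Lithuanian <i><dot-above><acute> all come out as
<i acute> U+00ED, while a dot above on other letters (for example
U+0116, E with dot above) is kept.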
- --
David Hopwood <david.hopwood@zetnet.co.uk>
Home page & PGP public key: http://www.users.zetnet.co.uk/hopwood/
RSA 2048-bit; fingerprint 71 8E A6 23 0E D3 4C E5 0F 69 8C D4 FA 66 15 01
Nothing in this message is intended to be legally binding. If I revoke a
public key but refuse to specify why, it is because the private key has been
seized under the Regulation of Investigatory Powers Act; see www.fipr.org/rip
-----BEGIN PGP SIGNATURE-----
Version: 2.6.3i
Charset: noconv
iQEVAwUBPAKL8jkCAxeYt5gVAQEggQgAjTxDBazTBPYuOW9VDRgfnCHqZr1QgJjB
d6znw/dDdrWmVJYkYlxUi31H8xRA++hs77vP1QazGCnOCyq3QP5EvF6X/gV2i42r
ccn6Ktpa+KiF8wUwXB3CLpnd+ZIraSXBFerhJgqQiEtyawJVaa8yIFcCf94lYV8k
YCCHO6xKJum5b1VUxpRbMQisd5/mTkuY5OV8ODkaBsh2e+ShvIH9l89eIsERItyP
GahwdOF4JArFSPUU2AeOVM2jvk2Fwr6htq3wWLtqi6aSjT1bCK/M6KsYW+hI/6qC
Km8xG3zk3NIHfl2gGvnUU4+GW+XDO/CynSaOBnv654bfVWAvG0PORQ==
=SHir
-----END PGP SIGNATURE-----