
Re: Combining characters (was: Re: [idn] hostname historyhell)



-----BEGIN PGP SIGNED MESSAGE-----

David Hopwood wrote:
> Soobok Lee wrote:
> > Now that <I><dot-above> is downcased to <i> as an exceptional case,
> > Then, we have an interesting question:
> > which direction should we  lowercase   <I><dot-above><acute>   into ?
> 
> To <i acute>. That is, the equivalence class is:
> 
>   <I><dot-above><acute>                         U+0049 U+0307 U+0301
>   <I dot-above><acute>                          U+0130 U+0301
>   <I><acute>                                    U+0049 U+0301
>   <I acute>                                     U+00CD
>   <i><acute>                                    U+0069 U+0301
>   <i acute>                                     U+00ED
>   <dotless i><acute>                            U+0131 U+0301
>   <fullwidth I><acute>                          U+FF29 U+0301
>   <fullwidth I><dot-above><acute>               U+FF29 U+0307 U+0301
>   <fullwidth i><acute>                          U+FF49 U+0301
> 
> and if NFKC is used, also: [snip]
> 
> <i acute> U+00ED is the normalised representative for all of these.
> 
> <i><dot-above><acute> is in a different equivalence class (AFAIK, no
> language uses it, so this doesn't matter).

My mistake; it is used in Lithuanian. The Lithuanian usage would argue
for <i><dot-above><acute> being in the same equivalence class (since its
Lithuanian uppercase form is <I acute>). So, another solution that
should be considered is to use NFC o fold as in the current version of
stringprep, but map out U+0307 whenever it is attached to a character
based on 'i' or 'I'. That wouldn't cause any problems for Turkish or
Azeri. I'll list all the options in another post.
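
A rough Python sketch of that alternative, to make the proposal concrete.
It reads "NFC o fold" as NFC applied to the case-folded string, and it uses
Python's built-in str.casefold() as a stand-in for the stringprep folding
table, which is an assumption and not the actual stringprep mapping. The
names drop_dot_above_on_i and fold_name are hypothetical. Dotless i (U+0131)
and the fullwidth forms are deliberately left out, since this sketch does
not treat them specially.

```python
import unicodedata

DOT_ABOVE = "\u0307"  # COMBINING DOT ABOVE


def drop_dot_above_on_i(text: str) -> str:
    """Remove U+0307 from any combining sequence whose base letter is i or I."""
    out = []
    base_is_i = False
    # NFD splits precomposed letters such as U+0130 or U+00CD into base + marks.
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch) == 0:
            # Combining class 0: treat this as the start of a new sequence.
            base_is_i = ch in ("i", "I")
        elif ch == DOT_ABOVE and base_is_i:
            continue  # map the dot out
        out.append(ch)
    return "".join(out)


def fold_name(text: str) -> str:
    """Hypothetical helper: drop the dot, case-fold, then recompose with NFC."""
    # casefold() is an assumed substitute for the stringprep fold step.
    return unicodedata.normalize("NFC", drop_dot_above_on_i(text).casefold())


# Forms from the list above, plus the Lithuanian <i><dot-above><acute>.
forms = [
    "\u0049\u0307\u0301",  # <I><dot-above><acute>
    "\u0130\u0301",        # <I dot-above><acute>
    "\u0049\u0301",        # <I><acute>
    "\u00cd",              # <I acute>
    "\u0069\u0301",        # <i><acute>
    "\u00ed",              # <i acute>
    "\u0069\u0307\u0301",  # <i><dot-above><acute> (Lithuanian)
]
folded = {fold_name(form) for form in forms}
```

Under these assumptions every form in the list collapses to the single
string <i acute>, including the Lithuanian sequence, while a dot above on
any other base letter is left alone.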

- -- 
David Hopwood <david.hopwood@zetnet.co.uk>

Home page & PGP public key: http://www.users.zetnet.co.uk/hopwood/
RSA 2048-bit; fingerprint 71 8E A6 23 0E D3 4C E5  0F 69 8C D4 FA 66 15 01
Nothing in this message is intended to be legally binding. If I revoke a
public key but refuse to specify why, it is because the private key has been
seized under the Regulation of Investigatory Powers Act; see www.fipr.org/rip


-----BEGIN PGP SIGNATURE-----
Version: 2.6.3i
Charset: noconv

iQEVAwUBPAKL8jkCAxeYt5gVAQEggQgAjTxDBazTBPYuOW9VDRgfnCHqZr1QgJjB
d6znw/dDdrWmVJYkYlxUi31H8xRA++hs77vP1QazGCnOCyq3QP5EvF6X/gV2i42r
ccn6Ktpa+KiF8wUwXB3CLpnd+ZIraSXBFerhJgqQiEtyawJVaa8yIFcCf94lYV8k
YCCHO6xKJum5b1VUxpRbMQisd5/mTkuY5OV8ODkaBsh2e+ShvIH9l89eIsERItyP
GahwdOF4JArFSPUU2AeOVM2jvk2Fwr6htq3wWLtqi6aSjT1bCK/M6KsYW+hI/6qC
Km8xG3zk3NIHfl2gGvnUU4+GW+XDO/CynSaOBnv654bfVWAvG0PORQ==
=SHir
-----END PGP SIGNATURE-----