[idn] turkish i

Hi, All

While studying case-preservation issues, I came across the following entries, which are well known but, AFAIK, have not been discussed thoroughly recently.

http://www.ietf.org/internet-drafts/draft-ietf-idn-nameprep-06.txt

0130; 0069; Case map
0131; 0069; Case map

Dotless i (0131) and dot-above I (0130) are both mapped to small i (0069) by the language-independent case folding.

This comes at the cost of a shrunken Turkish/Azerbaijani namespace: dotless i is lost.
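
To make the collapse concrete, here is a minimal Python sketch. The table DRAFT_CASE_MAP and the function draft_case_map are hypothetical names; the table holds only the two entries quoted above plus the ordinary ASCII mapping I -> i, not the draft's full mapping table.

```python
# Partial, illustrative table: only the two nameprep-06 entries quoted above,
# plus the ordinary ASCII mapping I -> i. This is NOT the draft's full table.
DRAFT_CASE_MAP = {
    "\u0049": "\u0069",  # I           -> i
    "\u0130": "\u0069",  # dot-above I -> i
    "\u0131": "\u0069",  # dotless i   -> i
}


def draft_case_map(label: str) -> str:
    """Apply the partial mapping character by character."""
    return "".join(DRAFT_CASE_MAP.get(ch, ch) for ch in label)


# Four single-letter labels. Turkish orthography sees two distinct letters
# here (I/dotless i and dot-above I/i), but all four collapse to one label.
labels = ["\u0049", "\u0131", "\u0130", "\u0069"]
folded = {draft_case_map(label) for label in labels}
print(folded)  # {'i'}
```

Under this mapping a Turkish registrant cannot hold a name with dotless i that is distinct from the same name spelled with dotted i.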

Can 0130 and 0131 be regarded as the same from the Turkish viewpoint?

Do people who use the Latin script accept small i === dotless i?

Do Turkish people accept small i === dotless i, too?

1:n mappings (one source, language-dependent targets):

I -> i (Latin)

I -> dotless i (Turkish)

n:1 mappings (several sources, one target):

Dot-above I -> i (Turkish)

I -> i (Latin)

Here, the current case mapping loses dotless i: Turkish I -> dotless i -> i.
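
The chain above can be sketched in Python as follows. All three tables are hypothetical and deliberately tiny: LATIN_LOWER and TURKISH_LOWER contain only the letters discussed here, and DRAFT_FOLD contains only the nameprep-06 entries quoted earlier plus I -> i.

```python
# Language-specific lowercasing, restricted to the letters under discussion.
LATIN_LOWER = {"I": "i"}
TURKISH_LOWER = {"I": "\u0131", "\u0130": "i"}  # I -> dotless i, dot-above I -> i

# Language-independent folding (partial; see the assumptions stated above).
DRAFT_FOLD = {"I": "i", "\u0130": "i", "\u0131": "i"}


def apply(table: dict, text: str) -> str:
    """Map each character through the table, leaving others unchanged."""
    return "".join(table.get(ch, ch) for ch in text)


# 1:n: the same capital I lowercases differently per language.
latin_i = apply(LATIN_LOWER, "I")      # 'i'
turkish_i = apply(TURKISH_LOWER, "I")  # dotless i, U+0131

# n:1: after folding, the Turkish result is no longer distinguishable.
print(apply(DRAFT_FOLD, turkish_i) == apply(DRAFT_FOLD, latin_i))  # True
```

The last line shows the loss: once the Turkish lowercase of I is folded again, it coincides with the Latin one.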

A similar impasse, caused by cross-language conflicts in 1:n and n:1 mappings, is also found in TC/SC and JC/KC equivalence. Can TC/SC equivalence likewise be handled in a language-independent way?

http://www.unicode.org/charts/PDF/U0100.pdf

0130 İ LATIN CAPITAL LETTER I WITH DOT ABOVE
     = LATIN CAPITAL LETTER I DOT
     • Turkish, Azerbaijani
     • lowercase is 0069 i
     → 0049 I latin capital letter i
     ≡ 0049 I 0307

0131 ı LATIN SMALL LETTER DOTLESS I
     • Turkish, Azerbaijani
     • uppercase is 0049 I
     → 0069 i latin small letter i
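
The same properties can be inspected with Python's standard unicodedata module. This is only a quick check, not part of nameprep. Note one difference: Python's str.lower() applies the full (SpecialCasing) mapping, so it lowercases 0130 to 0069 0307 rather than to the simple mapping 0069 listed in the chart.

```python
import unicodedata

DOT_ABOVE_I = "\u0130"
DOTLESS_I = "\u0131"

for ch in (DOT_ABOVE_I, DOTLESS_I):
    # decomposition() returns an empty string when there is none.
    decomp = unicodedata.decomposition(ch) or "(none)"
    print(f"{ord(ch):04X}  {unicodedata.name(ch)}  decomposition: {decomp}")

# 0049 0307 composes to 0130 under NFC (the "0049 I 0307" line in the chart).
composed = unicodedata.normalize("NFC", "I\u0307")
print(composed == DOT_ABOVE_I)  # True

# Built-in, language-independent case conversion.
print(DOTLESS_I.upper() == "I")  # True
print([f"{ord(c):04X}" for c in DOT_ABOVE_I.lower()])  # ['0069', '0307']
```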

The following describes another problem, in the order of mapping and normalization in nameprep.

RACE conversion of the following four code point sequences with mDNkit v2.0 gives these results:

Input                Output label

0069 0307            bq--ap7wsby

0049 0307  (İ)      bq--ap7wsby

0130       (İ)       i

0131       (ı)       i
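
For readers who want to reproduce the bq--ap7wsby label, here is a simplified Python sketch of RACE's row compression and base32 step, written from the RACE draft's rules as I understand them. It is an assumption-laden illustration, not mDNkit's code: race_label is a hypothetical name, the input must already be nameprepped, only the compressed case (all characters in row 00 plus at most one other row) is handled, and the draft's other validity checks are omitted.

```python
import base64


def race_label(prepped: str) -> str:
    """Sketch of RACE encoding for an already-nameprepped BMP string."""
    if any(ord(ch) > 0xFFFF for ch in prepped):
        raise ValueError("sketch handles BMP characters only")
    rows = {ord(ch) >> 8 for ch in prepped} - {0}
    if len(rows) > 1:
        raise ValueError("sketch handles only the compressed case")
    u1 = rows.pop() if rows else 0  # the single non-zero row, or 00

    out = bytearray([u1])
    for ch in prepped:
        hi, lo = ord(ch) >> 8, ord(ch) & 0xFF
        if hi == u1 and lo != 0xFF:
            out.append(lo)             # same row: low octet only
        elif hi == u1:
            out += b"\xff\x99"         # escape a literal FF low octet
        else:
            out += bytes([0xFF, lo])   # row 00 character inside row u1
    # Base32 with a lowercase a-z, 2-7 alphabet and no padding.
    encoded = base64.b32encode(bytes(out)).decode("ascii").lower().rstrip("=")
    return "bq--" + encoded


print(race_label("i\u0307"))  # bq--ap7wsby
```

The string 0069 0307 compresses to the octets 03 FF 69 07, which base32-encode to ap7wsby, matching the label reported above.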

Even though 0049 0307 === 0130 (modulo NFC), the two produce different output labels.

That could have been avoided if we had chosen CaseMap(NFKC(?)) instead of NFKC(CaseMap(?)).

CaseMap(NFKC(?)) != NFKC(CaseMap(?)) ???
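
A minimal Python sketch of the two orderings, for illustration only. It assumes a partial case map (hypothetical name DRAFT_CASE_MAP) containing just I -> i and the two nameprep-06 entries quoted earlier, and uses unicodedata.normalize for NFKC.

```python
import unicodedata

# Partial, assumed case map: not the full nameprep-06 table.
DRAFT_CASE_MAP = {"I": "i", "\u0130": "i", "\u0131": "i"}


def case_map(s: str) -> str:
    return "".join(DRAFT_CASE_MAP.get(ch, ch) for ch in s)


def nfkc(s: str) -> str:
    return unicodedata.normalize("NFKC", s)


def map_then_normalize(s: str) -> str:
    """NFKC(CaseMap(s)): the order that produced the labels above."""
    return nfkc(case_map(s))


def normalize_then_map(s: str) -> str:
    """CaseMap(NFKC(s)): the alternative order."""
    return case_map(nfkc(s))


decomposed = "I\u0307"  # 0049 0307
precomposed = "\u0130"  # 0130, canonically equivalent to the above

# Mapping first turns I into i before NFKC can compose I + 0307 into 0130,
# and there is no precomposed "i with dot above", so 0307 survives.
print(map_then_normalize(decomposed) == map_then_normalize(precomposed))  # False
# Normalizing first composes to 0130, which then maps to plain i.
print(normalize_then_map(decomposed) == normalize_then_map(precomposed))  # True
```

So for this input the two orders do differ. The sketch says nothing about whether mapping 0130 to plain i is desirable in the first place.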

Soobok Lee