
[idn] turkish i




 
 
Hi, All
 
While studying case preservation issues, I found the following entries. They are well known
but, AFAIK, have not been discussed thoroughly recently.
 
 
 
 
0130; 0069; Case map
0131; 0069; Case map
 
Dot-less i (0131) and dot-above I (0130) are both mapped to small i (0069)
by language-independent case folding.
This comes at the cost of shrunken Turkish/Azerbaijani namespaces: dot-less i is lost.
Can 0130 and 0131 be regarded as the same from the Turkish viewpoint?
Do Latin-script users accept small i === dot-less i?
Do Turkish people accept small i === dot-less i, too?
 
1:n mappings:
 
 I -> i             ( latin)
 I -> dot-less i    ( turkish )
 
n:1 mappings:
 
 Dot-above I ->  i  ( turkish )
 I  -> i            ( latin )
 
 
Here, the current case mapping loses dot-less i:  I -> dot-less i -> i.
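To make the collision concrete, here is a minimal Python sketch. The two tables are hypothetical and hold only the mappings listed above; TURKISH_LOWER and LANGUAGE_INDEPENDENT_FOLD are names I made up for illustration, not part of nameprep or any library.

```python
# Hypothetical tables for illustration only; they contain just the
# mappings listed in this message, nothing else.
TURKISH_LOWER = {
    ord("I"): "\u0131",   # I -> dot-less i (Turkish)
    0x0130: "i",          # dot-above I -> i (Turkish)
}
LANGUAGE_INDEPENDENT_FOLD = {
    ord("I"): "i",        # I -> i (Latin)
    0x0130: "i",          # 0130; 0069; Case map
    0x0131: "i",          # 0131; 0069; Case map
}


def turkish_lower(s):
    """Lowercase using the Turkish pairs I/dot-less i and dot-above I/i."""
    return s.translate(TURKISH_LOWER)


def fold(s):
    """Apply the language-independent mapping quoted above."""
    return s.translate(LANGUAGE_INDEPENDENT_FOLD)


def code_points(s):
    """Render a string as space-separated hex code points."""
    return " ".join(f"{ord(c):04X}" for c in s)


if __name__ == "__main__":
    for ch in ["I", "\u0130", "i", "\u0131"]:
        print(code_points(ch), "| turkish:", code_points(turkish_lower(ch)),
              "| folded:", code_points(fold(ch)))
```

With the Turkish table the four letters keep two distinct lowercase forms (0069 and 0131); with the language-independent table all four end up as 0069.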
 
 
A similar impasse, caused by cross-language conflicts in 1:n and n:1 mappings, is also found in TC/SC and JC/KC equivalence. Can TC/SC equivalence likewise be handled in a language-independent way?
 
 
 
0130 İ LATIN CAPITAL LETTER I WITH DOT ABOVE
= LATIN CAPITAL LETTER I DOT
• Turkish, Azerbaijani
• lowercase is 0069 i
→0049 I latin capital letter i
≡0049 I 0307
 
0131 ı LATIN SMALL LETTER DOTLESS I
• Turkish, Azerbaijani
• uppercase is 0049 I
→0069 i latin small letter i
 
 
 
The following describes another problem, in the order of mapping and normalization in nameprep.
 
Doing RACE conversion on the following four code point sequences with mDNkit v2.0 gives:
 
0069 0307         
bq--ap7wsby

 
0049 0307        
bq--ap7wsby

0130                
i

0131
i
 
 
Even though 0049 0307 === 0130 (modulo NFC), the two have different output labels.
That could have been avoided
 if we had chosen CaseMap(NFKC(?)) instead of NFKC(CaseMap(?)).
 
CaseMap(NFKC(?))  !=  NFKC(CaseMap(?))  ???
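 
The two orders can be compared with a small Python sketch. It assumes a simplified, hypothetical case map holding only the two table entries quoted above plus ASCII A-Z (the real nameprep table is much larger), and it uses unicodedata.normalize from the Python standard library for NFKC. It does not do RACE encoding; it only shows the code points that would be encoded.

```python
import unicodedata

# Hypothetical, simplified case map: the two entries quoted in this
# message plus ASCII A-Z. Not the full nameprep mapping table.
CASE_MAP = {0x0130: "\u0069", 0x0131: "\u0069"}
CASE_MAP.update({cp: chr(cp + 0x20) for cp in range(ord("A"), ord("Z") + 1)})


def case_map(s):
    """Apply the simplified case map code point by code point."""
    return s.translate(CASE_MAP)


def nfkc(s):
    """Normalize to NFKC with the standard library."""
    return unicodedata.normalize("NFKC", s)


def map_then_normalize(s):
    """NFKC(CaseMap(s)): map first, then normalize."""
    return nfkc(case_map(s))


def normalize_then_map(s):
    """CaseMap(NFKC(s)): normalize first, then map."""
    return case_map(nfkc(s))


def code_points(s):
    """Render a string as space-separated hex code points."""
    return " ".join(f"{ord(c):04X}" for c in s)


INPUTS = ["\u0069\u0307", "\u0049\u0307", "\u0130", "\u0131"]

if __name__ == "__main__":
    for s in INPUTS:
        print(code_points(s),
              "| NFKC(CaseMap):", code_points(map_then_normalize(s)),
              "| CaseMap(NFKC):", code_points(normalize_then_map(s)))
```

Under these assumptions, 0049 0307 comes out as 0069 0307 with NFKC(CaseMap(?)) but as 0069 with CaseMap(NFKC(?)), while 0130 comes out as 0069 in both orders. So only the second order gives 0049 0307 and 0130 the same result.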
 
 
Soobok Lee