Hi all,

While studying case preservation issues, I found the following items.
They are well known but, AFAIK, have not been discussed thoroughly
recently.

  0131; 0069; Case map

Dotless i (0131) and dot-above I (0130) are both mapped to small i (0069)
by language-independent case folding. This comes at the cost of shrunken
Turkish/Azerbaijani namespaces: the dotless i is lost.

Can 0130 and 0131 be regarded as the same from the Turkish viewpoint?
Do people who use the Latin script accept small i === dotless i?
Do Turkish people accept small i === dotless i, too?

1:n mappings:
  I -> i            (Latin)
  I -> dotless i    (Turkish)

n:1 mappings:
  dot-above I -> i  (Turkish)
  I -> i            (Latin)

Here, the current case mapping loses the dotless i: I -> dotless i -> i.
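
To make the loss concrete, here is a small Python sketch. The FOLD table is
only my transcription of the three mappings discussed above (0049, 0130 and
0131 all going to 0069); the names FOLD and fold are hypothetical and are not
taken from any nameprep implementation.

```python
# Hypothetical fold table: only the mappings discussed in this message.
FOLD = {
    "\u0049": "\u0069",  # I -> i
    "\u0130": "\u0069",  # dot-above I -> i
    "\u0131": "\u0069",  # dotless i -> i
}

def fold(label: str) -> str:
    """Apply the language-independent fold, code point by code point."""
    return "".join(FOLD.get(ch, ch) for ch in label)

# Four letters that Turkish keeps apart as two upper/lower pairs.
letters = ["\u0049", "\u0131", "\u0130", "\u0069"]  # I, dotless i, dot-above I, i
folded = {fold(ch) for ch in letters}
print(folded)  # all four collapse to the single letter i
```

After folding, nothing in the label records whether the original letter was
the dotted or the dotless one.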

A similar impasse, due to cross-language conflicts in 1:n and n:1 mappings,
is also found in TC/SC and JC/KC equivalence. Can TC/SC equivalence be
handled likewise in a language-independent way?

0130  İ  LATIN CAPITAL LETTER I WITH DOT ABOVE
         = LATIN CAPITAL LETTER I DOT
         • Turkish, Azerbaijani
         • lowercase is 0069 i
         → 0049 I latin capital letter i
         ≡ 0049 I 0307
0131  ı  LATIN SMALL LETTER DOTLESS I
         • Turkish, Azerbaijani
         • uppercase is 0049 I
         → 0069 i latin small letter i

The following describes another problem, in the order of mappings and
normalization in nameprep.

When doing RACE conversion on the next four code point sequences with
mDNkit v2.0:

  0069 0307        bq--ap7wsby
  0049 0307  İ     bq--ap7wsby
  0130       İ     i
  0131             i

Even though 0049 0307 === 0130 (modulo NFC), the two have different output
labels.
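
As a sanity check on the bq-- label, the part after the prefix can be
base32-decoded with the Python standard library. This sketch assumes, per my
reading of the RACE draft, that the label body is plain base32 with the
padding dropped; race_body_octets is a hypothetical helper name, and the
sketch does not implement RACE decompression itself.

```python
import base64

def race_body_octets(label: str) -> bytes:
    """Base32-decode the part of a RACE label after the 'bq--' prefix."""
    body = label[len("bq--"):].upper()
    # Assumption: RACE omits base32 padding; restore it for the decoder.
    body += "=" * (-len(body) % 8)
    return base64.b32decode(body)

octets = race_body_octets("bq--ap7wsby")
print(octets.hex(" "))  # 03 ff 69 07
```

If I read the compression rule correctly (first octet is the shared upper
octet, and ff escapes a code point whose upper octet is 00), these octets
stand for 0069 0307, i.e. the dot above survives in that label but not in
the plain "i" produced for 0130.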

That could have been avoided if we had chosen CaseMap(NFKC(?)) instead of
NFKC(CaseMap(?)).

CaseMap(NFKC(?)) != NFKC(CaseMap(?)) ???

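
The difference between the two orders can be reproduced with Python's
unicodedata module. This is only a sketch: case_map uses a hypothetical
three-entry table (0049, 0130, 0131 -> 0069) matching the mappings discussed
in this message, not the full nameprep table, and all function names are
mine.

```python
import unicodedata

# Hypothetical case map: only the three mappings discussed in this message.
CASE_MAP = {"\u0049": "\u0069", "\u0130": "\u0069", "\u0131": "\u0069"}

def case_map(s: str) -> str:
    return "".join(CASE_MAP.get(ch, ch) for ch in s)

def nfkc(s: str) -> str:
    return unicodedata.normalize("NFKC", s)

def map_then_nfkc(s: str) -> str:
    """NFKC(CaseMap(?)): map first, then normalize."""
    return nfkc(case_map(s))

def nfkc_then_map(s: str) -> str:
    """CaseMap(NFKC(?)): normalize first, then map."""
    return case_map(nfkc(s))

decomposed = "\u0049\u0307"  # I + combining dot above
composed = "\u0130"          # dot-above I

# The two inputs are equivalent under normalization.
assert nfkc(decomposed) == nfkc(composed)

# Mapping first: the two equivalent inputs end up different.
print(map_then_nfkc(decomposed) == map_then_nfkc(composed))  # False
# Normalizing first: both end up as plain "i".
print(nfkc_then_map(decomposed) == nfkc_then_map(composed))  # True
```

With mapping first, 0049 0307 becomes 0069 0307, which NFKC cannot compose
any further, while 0130 becomes a bare 0069.
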
Soobok Lee