[idn] How to match letters
We have now and then discussed which letters should be matched as
equal. The nameprep document does some of this work, and no other
draft is available. I could write one for the Latin-based
alphabets (this includes Greek and Russian), if that is wanted.
(I cannot describe how to do matching for other scripts, as I have
no knowledge of the languages that use them.)
There are two important things I have come across related to
matching names (I have worked a lot with LDAP/X.500 directories):
1) matching based on equivalent glyphs
2) matching based on equivalent sounds
An example of 1) is that in UCS, Latin upper case A, Greek upper case Alpha
and Cyrillic upper case A have the same glyph.
Using my Swedish keyboard I could enter names in Greek or Cyrillic, but
I do not have three equivalent looking "A"s on my keyboard. Instead I
would use the same A for all names.
From this I think name matching must treat all equivalent-looking
letters as the same. As a result, the lower and upper case versions
of those letters are also treated as the same, even where their glyphs
do not match.
Does anybody have any problems with this? Or is there some other way
to do it?
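
To make 1) concrete, here is a minimal sketch in Python, assuming a small
hand-written look-alike table. The names LOOKALIKES, glyph_key and
glyph_match are hypothetical, and the table covers only the three "A"s
mentioned above; it is not taken from nameprep or from any published table.

```python
# Hypothetical, hand-picked table: look-alike upper case letters mapped
# to the Latin letter they resemble. Only the example from the text.
LOOKALIKES = {
    "\u0391": "A",  # GREEK CAPITAL LETTER ALPHA
    "\u0410": "A",  # CYRILLIC CAPITAL LETTER A
}


def glyph_key(name: str) -> str:
    """Return a comparison key that ignores look-alike letters and case."""
    # Upper-case first, so that lower case alpha and Cyrillic a also
    # reach the table even though their glyphs differ from Latin "a".
    upper = name.upper()
    folded = "".join(LOOKALIKES.get(ch, ch) for ch in upper)
    # Fold case last so "A" and "a" compare equal.
    return folded.casefold()


def glyph_match(a: str, b: str) -> bool:
    """True if the two names are equal under the look-alike folding."""
    return glyph_key(a) == glyph_key(b)
```

With this sketch, a name typed with Latin "A" matches the same name stored
with Greek Alpha or Cyrillic A, in either case. A real table would have to
be much larger and agreed on by the group.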
An example of 2) is Swedish "ö" (o with diaeresis) and Danish
"ø" (o with stroke). They both represent the same vowel, and somebody
in Denmark would often enter a Swedish name using the Danish version of
the letter, and the other way round from Sweden.
So the letters "ö" and "ø" should match as the same.
How many more cases of this kind are there? Are there any problems
with doing this kind of matching?
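
As a rough illustration of 2), the following Python sketch folds "ø" to "ö"
before comparing. The names SOUND_EQUIVALENTS, sound_key and sound_match are
hypothetical, and the table holds only the one pair discussed above; which
other pairs belong in it is exactly the open question.

```python
import unicodedata

# Hypothetical table of letters treated as the same sound.
# Only the Swedish/Danish pair from the text is listed.
SOUND_EQUIVALENTS = {
    "\u00f8": "\u00f6",  # ø (o with stroke) -> ö (o with diaeresis)
}


def sound_key(name: str) -> str:
    """Return a comparison key that ignores case and the ö/ø difference."""
    # NFC so that "o" + combining diaeresis becomes the single letter ö.
    composed = unicodedata.normalize("NFC", name)
    lowered = composed.casefold()
    return "".join(SOUND_EQUIVALENTS.get(ch, ch) for ch in lowered)


def sound_match(a: str, b: str) -> bool:
    """True if the two names are equal under the sound folding."""
    return sound_key(a) == sound_key(b)
```

Under these assumptions "Jörgen" and "Jørgen" compare equal, while plain
"Jorgen" still differs from both.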
Dan