Re: [idn] homograph attacks
I'm by no means a linguist, but I would assume that there is a plethora
of good and useful mixtures of scripts in daily use. Passing
this problem (of which all of us have been aware for years now) back
to the policy arena won't help anyone, since I doubt that any
working group (now or in the future) can come up with a good
rationale for all scripts and languages without restricting "good" and useful
mixtures.
By design, IDNA processing happens inside the application, and therefore in
my thinking the applications are the right place for any security measures
as well. Talking about security measures, we have to think about what
exactly we want to prevent from happening. Do we want to battle fraud in
general? Then we would have to look at typo domain names like pajpal.com
as well. Or is the goal to enable the user to better understand exactly what URL
he is using? In that case we are not talking about security anymore but about
awareness. If a user is aware that a URL he wants to use is a
mixture of scripts, he can decide for himself whether he wants to trust it or not.
I guess that's fair enough; after all, all users are responsible for their own
behavior and the risks that come along with it.
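
To make the awareness idea a bit more concrete, here is a minimal sketch in
Python of how an application might flag a label that mixes scripts. It is only
an illustration under some loud assumptions: the function names (char_script,
label_scripts, is_mixed_script) are made up for this example, and since
Python's standard library does not expose the Unicode Script property, the
first word of the character name is used as a rough stand-in. It treats ASCII
digits and the hyphen as script-neutral, and it does not handle legitimate
multi-script cases such as Japanese.

```python
import unicodedata

# Characters treated as script-neutral in this sketch (an assumption).
NEUTRAL = set("0123456789-")

def char_script(ch):
    """Guess a script from the Unicode character name.

    This is an approximation: it returns the first word of the name
    (e.g. "LATIN", "CYRILLIC"), not the real Unicode Script property.
    """
    if ch in NEUTRAL:
        return "Common"
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else "Unknown"

def label_scripts(label):
    """Return the set of guessed scripts in a label, ignoring neutral characters."""
    return {char_script(ch) for ch in label} - {"Common"}

def is_mixed_script(label):
    """True if the label appears to contain more than one script."""
    return len(label_scripts(label)) > 1

if __name__ == "__main__":
    # The second label uses U+0430 CYRILLIC SMALL LETTER A in place of Latin "a".
    for label in ("paypal", "p\u0430ypal"):
        scripts = sorted(label_scripts(label))
        note = "mixed scripts, warn the user" if is_mixed_script(label) else "single script"
        print(ascii(label), scripts, note)
```

An application could use a check like this to show a hint next to the URL
rather than to block it, which leaves the decision with the user.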
Best,
tom
On 15.02.2005 Michel Suignard wrote:
> No languages used in the former Soviet Union should require a mix of Latin and Cyrillic in a single DNS label.
> Unicode contains many latin homographs in the Cyrillic block exactly for that reason, to avoid mixing the two scripts in a single word. It is unfortunate that the exact visual match is now haunting us. However it should not be used as a rationale to accept registration of mixed Cyrillic/Latin labels by tld registries.
>
> To answer another message in this thread, there is no definitive answer about which Unicode characters are allowed for a given language. But in all languages that have a reasonable concept of 'words', you should never need to allow mixed scripts in a word, at least in the context of an IDN label. There are exceptions to these rules, like in South and East Asia (Japanese comes to mind), but these languages can be detected reasonably using the Unicode script property.
>
> Michel
>
> -----Original Message-----
> From: owner-idn@ops.ietf.org [mailto:owner-idn@ops.ietf.org] On Behalf Of Kane, Pat
>
> VeriSign does prevent domains with the Russian language tag from commingling A-Z with the Cyrillic characters. It does permit 0-9 and the dash to be used. This filter also applies to other Cyrillic based languages such as Belarusian, Ukrainian, Serbian, Macedonian and Bulgarian.
>
> There are other languages that are listed within ISO 639-2 that today use a combination of Latin and Cyrillic as they were originally Latin based (Tajik was Arabic prior to being Latin based), migrated to Cyrillic during the Soviet era and today are migrating back to Latin. It is common to use Latin and Cyrillic characters in Tajik, from what I understand not being a native speaker. Granted there are not a lot of registrations in com net that are Tajik, but this is just the point of an IDN.
>
> Pat Kane
>
>
>
>
Regards,
tom
(__)
(OO)_____
(oo) /|\ A cow is not entirely full of
| |--/ | * milk some of it is hamburger!
w w w w