Re: [idn] Re: Back to work (Nameprep) (was: Re: Just send UTF-8 with nameprep (was: RE: [idn] Reality Check))
I think the whole notion of trying to prevent cross-script confusions in
domain names is a morass. It would inevitably result in complicated rules
that produce both false positives and false negatives; it also depends
greatly upon the font in use on the particular user's machine. We even have
that problem now with the digit one and lowercase ell -- depending on the
user's font, those can look identical.
Better would be to have useful GUIs that detect and signal possibly
confusing names. For example, in the URL field of a browser, the
spelling-check-style wavy underline could be used under terms like
"intеl.com" vs "intel.com" (where in the first the "e" is the Cyrillic
letter U+0435), to alert the user that the URL might be odd. Such tools
would not get in the way of legitimate domain names that mix scripts or
symbols.
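A minimal sketch of the kind of check such a GUI might run, in Python. It
assumes a letter's script can be approximated by the first word of its
Unicode character name, since the standard unicodedata module exposes no
script property; a real tool would want proper script data. The names
letter_scripts and is_suspicious are made up for illustration.

```python
import unicodedata

def letter_scripts(label):
    """Return the set of approximate script names for the letters in label."""
    scripts = set()
    for ch in label:
        # Only letters are considered; digits, hyphens and dots are skipped.
        if unicodedata.category(ch).startswith("L"):
            # Heuristic (an assumption, not real script data): first word of
            # the character name, e.g. "LATIN SMALL LETTER E" -> "LATIN".
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts

def is_suspicious(domain):
    """True if any dot-separated label mixes letters from more than one script."""
    return any(len(letter_scripts(label)) > 1 for label in domain.split("."))

print(is_suspicious("int\u0435l.com"))  # True: Cyrillic U+0435 among Latin letters
print(is_suspicious("intel.com"))       # False: Latin letters only
```

The result is only a hint for the UI to underline the name, not a reason to
reject it. A check like this says nothing about two names that each stay
within a single script yet still look alike.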
Mark
—————
πάντων μέτρον ἄνθρωπος — Πρωταγόρας
[http://www.macchiato.com]
----- Original Message -----
From: "Soobok Lee" <lsb@postel.co.kr>
To: "Keith Moore" <moore@cs.utk.edu>; "Martin Duerst" <duerst@w3.org>
Cc: <idn@ops.ietf.org>
Sent: Wednesday, July 18, 2001 01:28
Subject: Re: [idn] Re: Back to work (Nameprep) (was: Re: Just send UTF-8
with nameprep (was: RE: [idn] Reality Check))
> Hi,
> ----- Original Message -----
> From: "Martin Duerst" <duerst@w3.org>
> > >one idea (which I don't particularly like) is to assume that all characters
> > >within a single label are from a single language, and if the same glyph
> > >maps to different code points (indicating characters from different languages)
> > >then you resolve the ambiguity by using the code point that creates the
> > >fewest number of language changes. I won't even begin to list the problems
> > >with this; I mention it only because I think that this approximates the
> > >behavior that is most natural for human beings.
> >
> > I think this is worth trying, in order to get rid of the famous 'A' for
> > Latin, Greek, and Cyrillic. It's of course to be done on a per script
> > base, not per language. I wouldn't actually resolve by tweaking
> > codepoints (sometimes it will be very difficult to decide which
> > codepoint to tweak), but just by rejecting strange combinations.
> > You have to do a keyboard switch to get from one script to the other,
> > so the chance of getting a mixture accidentally isn't great.
> > Doing the check only on the registration side may also be a very
> > good idea; that may allow us to start with very tight rules and
> > expand them later (e.g. allow scripts separated by a hyphen,...).
> > It would also help a lot to address some bidirectionality problems.
>
> Good idea.
>
> Now, let's think about another case: all-Greek "oo.com" and all-Latin
> "oo.com".
> Each of the two consists of characters from a single script,
> but the two still look very similar. Do you have any good ideas about this?
>
> Regards, Soobok Lee