Re: [idn] Re: Back to work (Nameprep) (was: Re: Just send UTF-8 with nameprep (was: RE: [idn] Reality Check))
Hi,
----- Original Message -----
From: "Martin Duerst" <duerst@w3.org>
> >one idea (which I don't particularly like) is to assume that all characters
> >within a single label are from a single language, and if the same glyph
> >maps to different code points (indicating characters from different
> >languages)
> >then you resolve the ambiguity by using the code point that creates the
> >fewest number of language changes. I won't even begin to list the problems
> >with this; I mention it only because I think that this approximates the
> >behavior that is most natural for human beings.
>
> I think this is worth trying, in order to get rid of the famous 'A' for
> Latin, Greek, and Cyrillic. It's of course to be done on a per-script
> basis, not per language. I wouldn't actually resolve by tweaking
> codepoints (sometimes it will be very difficult to decide which
> codepoint to tweak), but just by rejecting strange combinations.
> You have to do a keyboard switch to get from one script to the other,
> so the chance of getting a mixture accidentally isn't great.
> Doing the check only on the registration side may also be a very
> good idea; that may allow us to start with very tight rules and
> expand them later (e.g. allow scripts separated by a hyphen,...).
> It would also help a lot to address some bidirectionality problems.
Good idea.
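
A rough sketch of such a registration-side check follows, just to make the
idea concrete. It assumes Python and approximates a character's script by the
first word of its Unicode character name, which is only a stand-in for real
script data. The names COMMON, scripts_in_label and is_registrable are
hypothetical, and treating digits and the hyphen as script-neutral is also an
assumption.

```python
import unicodedata

# Assumption: digits and the hyphen are script-neutral and allowed anywhere.
COMMON = set("0123456789-")


def scripts_in_label(label):
    """Return the set of approximate script names used in one label.

    Approximation: the first word of the Unicode character name
    (e.g. "LATIN", "GREEK", "CYRILLIC") stands in for the script.
    """
    scripts = set()
    for ch in label:
        if ch in COMMON:
            continue
        name = unicodedata.name(ch, "")
        scripts.add(name.split(" ")[0] if name else "UNKNOWN")
    return scripts


def is_registrable(label):
    """Accept a label only if all of its characters share one script."""
    return len(scripts_in_label(label)) <= 1


if __name__ == "__main__":
    # Latin letters with one Cyrillic "a" (U+0430) in the middle.
    print(is_registrable("p\u0430ypal"))
    # Plain Latin label.
    print(is_registrable("example"))
```

This simply rejects strange combinations and never tries to tweak code points.
A rule this tight also rejects legitimate mixtures, which fits the suggestion
to start strict and relax later.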
Now, let's think about another case: an all-Greek "oo.com" and an all-Latin
"oo.com".
Each of the two labels uses characters from only a single script,
but the two still look very similar. Do you have any good ideas about this?
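
To illustrate the case, here is a small Python sketch that prints the code
points behind the two labels. The variable names are made up for the example;
the only facts relied on are that Latin "o" is U+006F and Greek omicron is
U+03BF.

```python
import unicodedata

latin_label = "oo"            # two LATIN SMALL LETTER O (U+006F)
greek_label = "\u03bf\u03bf"  # two GREEK SMALL LETTER OMICRON (U+03BF)

for label in (latin_label, greek_label):
    # Show each character's code point and Unicode name.
    print([f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in label])

# The strings are different sequences of code points even though
# they render almost identically in most fonts.
print(latin_label == greek_label)
```

Each label stays within one script, so a mixed-script check alone would accept
both.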
Regards, Soobok Lee