Re: [idn] case preservation
----- Original Message -----
From: "Martin Duerst" <duerst@w3.org>
To: "Soobok Lee" <lsb@postel.co.kr>; "Soobok Lee" <lsb@postel.co.kr>; "Dan Ebert" <dan@enic.cc>
Cc: <idn@ops.ietf.org>
Sent: Thursday, October 18, 2001 6:32 PM
Subject: Re: [idn] case preservation
> At 16:13 01/10/18 +0900, Soobok Lee wrote:
>
> > > > There is indeed a non-zero (but very, very small) probability
> > > > for such cases. But if domain names are written in lower case
> > > > the way they mostly have been up to now, a word in a language
> > > > written in Cyrillic looking the same as a word in a language
> > > > written in Latin would be about as rare as a four-leaf clover.
> > > >
> >
> >No. Much more frequent than you guess.
> >
> >Cyrillic small 'a' 'e' 'o' 'c' 'p' 'x' 'y' 'i' 'j' 's' look exactly the
> >same as the Latin small ones.
>
> Yes. But 'i' 'j' 's' are not actually used in most languages
> that are written in Cyrillic. And in all languages, most of
> the possible letter combinations are not actually used. And
> the longer a word is, the more quickly the probabilities
> approach zero.
>
The _SUM_ of the probabilities over every word length 1..N, for large N, may converge to
a non-zero value that should not be neglected. With 'HMTB', it will be much bigger than that.
It is well known that all 3-letter labels are registered in LDH .com, .net, and .org.
Every 3-letter Cyrillic label built from 'aeocpxy' collides with a 3-letter LDH .com label.
"copy.com" "coca.com" "ec.com" "ace.com" "eco.com" "ocx.com" "oxy.com" "cap.com" .... :-))
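To make the collision count concrete, here is a minimal Python sketch. The lookalike table and the helper names (LATIN_TO_CYRILLIC, cyrillic_lookalike, lookalike_labels) are assumptions made for illustration, not part of any IDN draft; the table only covers the seven letters 'aeocpxy' listed above.

```python
from itertools import product

# Assumed lookalike table: Latin small letter -> Cyrillic small letter
# that is usually drawn with the same shape.
LATIN_TO_CYRILLIC = {
    "a": "\u0430",  # CYRILLIC SMALL LETTER A
    "e": "\u0435",  # CYRILLIC SMALL LETTER IE
    "o": "\u043e",  # CYRILLIC SMALL LETTER O
    "c": "\u0441",  # CYRILLIC SMALL LETTER ES
    "p": "\u0440",  # CYRILLIC SMALL LETTER ER
    "x": "\u0445",  # CYRILLIC SMALL LETTER HA
    "y": "\u0443",  # CYRILLIC SMALL LETTER U
}


def cyrillic_lookalike(label):
    """Return the Cyrillic string that renders like `label`, or None
    if the label uses a letter outside the seven-letter table."""
    if any(ch not in LATIN_TO_CYRILLIC for ch in label):
        return None
    return "".join(LATIN_TO_CYRILLIC[ch] for ch in label)


def lookalike_labels(length):
    """All Latin labels of the given length built only from the table."""
    letters = sorted(LATIN_TO_CYRILLIC)
    return ["".join(combo) for combo in product(letters, repeat=length)]


if __name__ == "__main__":
    # Seven letters, three positions: 7**3 candidate labels.
    print(len(lookalike_labels(3)))
    spoof = cyrillic_lookalike("ace")
    # Same appearance, different code points.
    print(spoof == "ace", [hex(ord(ch)) for ch in spoof])
```

With seven letters there are 7**3 = 343 such 3-letter labels, and each one has a Latin twin that is already registered if all 3-letter labels are taken.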
Cyrillic 'i' 'j' 's' are not used in Russian. But, as you know,
'j' is used in Serbian and Azerbaijani written in Cyrillic script,
and 'i' is the Byelorussian-Ukrainian 'I'.
Are these characters extinct or alive?
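For reference, a small Python sketch that looks up the three characters in the Unicode character database via the standard unicodedata module. The code points chosen here (U+0456, U+0458, U+0455) are my reading of which Cyrillic letters 'i' 'j' 's' refer to, and the names CYRILLIC_IJS and describe are hypothetical.

```python
import unicodedata

# Assumed mapping: Latin letter -> the Cyrillic letter of the same shape.
CYRILLIC_IJS = {
    "i": "\u0456",
    "j": "\u0458",
    "s": "\u0455",
}


def describe(ch):
    """Format a character as 'U+XXXX NAME' using its Unicode name."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


if __name__ == "__main__":
    for latin, cyrillic in CYRILLIC_IJS.items():
        print(latin, "->", describe(cyrillic))
```

The names reported by the database are what the 'I' remark above refers to; whether each letter is in active use in a given language is a separate question that the database does not answer.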
Soobok Lee
>
> Regards, Martin.
>