Re: Matching and comparison
- To: idn@ops.ietf.org
- Subject: Re: Matching and comparison
- From: Paul Hoffman / IMC <phoffman@imc.org>
- Date: Thu, 20 Jan 2000 09:47:35 -0800
- Delivery-date: Thu, 20 Jan 2000 09:48:02 -0800
- Envelope-to: idn-data@psg.com
At 05:47 PM 1/20/00 +0900, Martin J. Duerst wrote:
> > Unless we can show a need for case-insensitivity *in the
> > internationalized characters*, we shouldn't force it.
>
>The largest need, already discussed, is clearly that a lot of people
>don't want to have to register ibm/ibM/iBm/iBM/Ibm/IbM/IBm/IBM to
>make sure nobody else registers. And three-letter companies still
>have an easy job.
That will always be a problem, regardless of what we do with case
sensitivity. Using the same logic, the Dürst company would not only have to
register Dürst.com, it would have to register Dûrst.com, Dúrst.com,
Dùrst.com, Dûrst.com, and Dùrst.com, not to mention about a dozen more that
my Eudora MUA didn't want to type for me. And this is just the European
scripts; I think that Indic and Arabic scripts would have very similar
problems.
We shouldn't pretend to fix the "too many similar names" problem by only
talking about capitalization.
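To make the point concrete, here is a rough Python sketch. It assumes Python's built-in str.casefold() as a stand-in for whatever case-folding rule might be chosen; the helper names (case_variants, strip_marks) are made up for this illustration. Stripping diacritics is shown only to make the contrast visible. Nobody in this thread has proposed it, and it would be wrong for many languages.

```python
import itertools
import unicodedata


def case_variants(name):
    """Return every upper/lower spelling of name (2**n for n cased letters)."""
    choices = [
        (ch.lower(), ch.upper()) if ch.lower() != ch.upper() else (ch,)
        for ch in name
    ]
    return {"".join(combo) for combo in itertools.product(*choices)}


def strip_marks(name):
    """Hypothetical helper: decompose (NFD) and drop combining marks."""
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# The eight spellings ibm/ibM/iBm/... collapse to one name under case folding.
ibm = case_variants("ibm")
folded_ibm = {v.casefold() for v in ibm}

# The diacritic look-alikes stay distinct under case folding alone.
durst = ["Dürst", "Dûrst", "Dúrst", "Dùrst"]
folded_durst = {v.casefold() for v in durst}

# Only an additional (assumed, lossy) mark-stripping step merges them.
stripped_durst = {strip_marks(v.casefold()) for v in durst}

print(len(ibm), len(folded_ibm), len(folded_durst), len(stripped_durst))
```

Case folding takes care of the ibm/IBM registrations but leaves every accented variant of Dürst as a separate name, which is the gap described above.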
>Telling people that in an URI, domain names are case-insensitive,
>but file names are/may be case-sensitive is already hard. Telling
>them that a name is case-insensitive if it is ASCII only, and case-
>sensitive otherwise would be a really hard job.
Indeed. Telling them about anything having to do with internationalization
will be.
>I think we can postpone the casing issue if we agree that there
>are no requirements in that area, i.e. if we think that we
>can live with any solution (case-folding or not). But that's
>not what you are saying, and that's not what I'm saying,
>so I suggest that we put the points we came up with
>(would like to be able to have the names in the appropriate
>casing, would prefer not to have a strange break between
>names containing only ASCII and others, would like to avoid
>exponentially growing registrations to cover equivalents).
I think it would be good for us to list some of the known trickiness of
similar-looking script issues. So far, we have casing and Latin vowels with
diacritics. Looking through my I believe we also have to list Latin
consonants with diacritics, bidirectional names, similar-looking
punctuation marks, Arabic joiners, Devanagari dependent and independent
vowels (and conjunct formations, and half-forms...), and Tamil vowel
splitting. I probably missed about a dozen other tricky issues; Martin is
much more versed in these things than I am.
--Paul Hoffman, Director
--Internet Mail Consortium