Re: [idn] case folding
- To: "Brian W. Spolarich" <briansp@acm.org>
- Subject: Re: [idn] case folding
- From: RJ Atkinson <rja@inet.org>
- Date: Wed, 31 May 2000 09:40:55 +0100
- Cc: idn@ops.ietf.org
- Delivery-date: Wed, 31 May 2000 06:41:52 -0700
- Envelope-to: idn-data@psg.com
At 13:39 31-05-00 , Brian W. Spolarich wrote:
> What problem does case folding solve?
It prevents functionally identical strings from pointing at different
web content, to give an obvious example.
>Is it reasonable for protocol
>users to expect that MYDOMAIN.COM and MyDoMaIn.CoM are semantically the
>same, and therefore the protocol should understand that?
They already have demonstrated that they do, so the principle of least
astonishment means this behaviour should not change.
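
As a minimal sketch of the behaviour users already rely on for US-ASCII
names, the comparison below folds only the letters A-Z before comparing.
The helper names `ascii_lower` and `same_name` are hypothetical and are
used purely for illustration; this is not taken from any DNS
implementation.

```python
def ascii_lower(name: str) -> str:
    """Lowercase only the ASCII letters A-Z; leave all other characters alone."""
    return "".join(
        chr(ord(ch) + 32) if "A" <= ch <= "Z" else ch
        for ch in name
    )


def same_name(a: str, b: str) -> bool:
    """Compare two names the way users expect for US-ASCII: ignoring case."""
    return ascii_lower(a) == ascii_lower(b)


print(same_name("MYDOMAIN.COM", "MyDoMaIn.CoM"))
```

Folding only A-Z keeps the sketch limited to the US-ASCII case discussed
here; it deliberately says nothing about non-ASCII characters.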
>While there is a
>backward compatibility requirement for US-ASCII, is it truly the case that
>users of the IDN will so strongly expect this behaviour that it becomes a
>requirement?
Absolutely yes.
>Is it possible to come up with a case-folding implementation
>that is going to satisfy the behavioural expectations of the large
>majority of the users? I am mostly ignorant of these issues as they apply
>to the vast majority of languages, but given the issues that have been
>raised here, I have to wonder if this is practically achievable.
The key is to distinguish between alphabetic languages (e.g. English,
Norwegian, Vietnamese) and non-alphabetic languages (e.g. Chinese).
For alphabetic languages, case folding needs to be handled appropriately,
while for non-alphabetic languages "case" is not generally meaningful.
For the previous example of German double-S, it isn't really a matter
of case-folding but would definitely be within scope for canonicalisation,
IMHO.
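
To illustrate why the double-S example goes beyond simple lowercasing,
here is a small sketch using Python's built-in `str.lower()` and
`str.casefold()`. It only shows how one existing library treats the
character; whether an IDN canonicalisation step should behave this way is
an open question in this thread.

```python
# Two spellings a German reader would treat as the same word.
with_eszett = "Stra\u00dfe"   # "Strasse" written with U+00DF (sharp s)
with_double_s = "STRASSE"

# Plain lowercasing keeps U+00DF as it is, so the two strings still differ.
lower_matches = with_eszett.lower() == with_double_s.lower()

# Full case folding maps U+00DF to "ss", so the two strings compare equal.
casefold_matches = with_eszett.casefold() == with_double_s.casefold()

print(lower_matches, casefold_matches)
```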
Unicode already has a specification for canonicalisation, which
reportedly includes case folding. We can simply use that specification,
since ISO has no equivalent specification today (and is not expected to
have one soon).
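
A rough sketch of what combining normalisation with case folding could
look like, using Python's standard `unicodedata` module. The function
name `canonicalise`, the choice of the NFKC form, and the order of the
steps are all assumptions made for illustration; in this library,
normalisation and case folding are separate calls, and nothing here is
meant as the form an IDN specification would actually require.

```python
import unicodedata


def canonicalise(name: str) -> str:
    """Hypothetical canonical form: normalise, case-fold, normalise again."""
    normalised = unicodedata.normalize("NFKC", name)
    folded = normalised.casefold()
    # Case folding can produce sequences that are no longer normalised,
    # so normalise once more after folding.
    return unicodedata.normalize("NFKC", folded)


# The same name written with a precomposed E-acute (U+00C9) and with a
# plain "e" followed by a combining acute accent (U+0301).
precomposed = "CAF\u00c9.example"
decomposed = "cafe\u0301.EXAMPLE"

print(canonicalise(precomposed) == canonicalise(decomposed))
```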
> One of the DNS' strengths is its relative simplicity for the complex
>distributed task that it accomplishes. Would the complexity and potential
>ambiguity involved in coming up with case mapping rules that meet
>everyone's expectations diminish the simplicity principle that makes the DNS
>work well?
No. Failure to define case-mapping as part of canonicalisation would
definitely frustrate and anger users, and would work against the
continued health of the Internet.
Ran
rja@inet.org