Re: [idn] automagic case-insensitive comparisons in the DNS server
- To: "Adam M. Costello" <amc@cs.berkeley.edu>
- Subject: Re: [idn] automagic case-insensitive comparisons in the DNS server
- From: James Seng <James@Seng.cc>
- Date: Tue, 19 Sep 2000 15:46:12 +0800
- Cc: idn working group <idn@ops.ietf.org>
- Delivery-date: Tue, 19 Sep 2000 00:49:07 -0700
- Envelope-to: idn-data@psg.com
This presumes that comparing I18N names involves only 1-1 case mapping.
But case mapping is not all we do to compare names: there is also normalization
form and canonicalization. And if you read my I-D on CJK, you will notice that
for Han it would be N-1 canonicalization, if we were to do anything there.
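To make the normalization point concrete, here is a small Python illustration using the standard `unicodedata` module. A comparison that only folds case (roughly all a DNS server would be doing on a case-carrying ACE) treats the precomposed and decomposed spellings of "café" as different names, while normalizing to NFC first makes them equal. The helper names are hypothetical, and the Han N-1 case is not shown because it would need variant mapping tables.

```python
import unicodedata


def case_only_equal(a: str, b: str) -> bool:
    """Case-insensitive comparison with no normalization step."""
    return a.casefold() == b.casefold()


def normalized_equal(a: str, b: str) -> bool:
    """Case-insensitive comparison after NFC-normalizing both sides."""
    return unicodedata.normalize("NFC", a.casefold()) == unicodedata.normalize("NFC", b.casefold())


precomposed = "caf\u00e9"   # "café" with U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"   # "café" as "e" + U+0301 COMBINING ACUTE ACCENT
upper = "CAF\u00c9"         # "CAFÉ"

print(case_only_equal(upper, precomposed))       # True: case alone is handled
print(case_only_equal(precomposed, decomposed))  # False: same name, different normal form
print(normalized_equal(upper, decomposed))       # True once normalization is applied
```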
Nevertheless, let me throw out another wacky idea based on yours. Use base-26 (yep,
yep, using A-Z), then choose upper- or lower-case A-Z depending on whether our
pointer is on an upper- or lower-case character. Hence, if the scheme
encoded to FoO.baR.com and foO.BaR.com, you would know they are actually the same name, just in different 'case' :-)
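The base-26 variant can be sketched in a few lines of Python. The idea as stated does not say how code points become base-26 digits or exactly what "our pointer" is, so this sketch assumes each character is written independently as four base-26 letters (26**4 = 456976, which covers every BMP code point) and that all four letters take that character's case. `encode26`, `decode26`, `ALPHABET` and `DIGITS` are hypothetical names.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz"
DIGITS = 4  # assumption: 26**4 = 456976 > 0xFFFF, so 4 letters cover any BMP code point


def encode26(label: str) -> str:
    out = []
    for ch in label:
        low = ch.lower()
        # Assumption: simple 1-1 case mapping and a code point that fits in 4 digits.
        if len(low) != 1 or ord(low) >= 26 ** DIGITS:
            raise ValueError("sketch handles only 1-1 lower-casing within 4 base-26 digits")
        n = ord(low)
        digits = []
        for _ in range(DIGITS):
            n, d = divmod(n, 26)
            digits.append(ALPHABET[d])
        group = "".join(reversed(digits))  # most significant digit first
        # While the "pointer" is on an upper-case character, emit upper-case letters.
        out.append(group.upper() if ch != low else group)
    return "".join(out)


def decode26(ace: str) -> str:
    if len(ace) % DIGITS:
        raise ValueError("length must be a multiple of DIGITS")
    chars = []
    for i in range(0, len(ace), DIGITS):
        group = ace[i:i + DIGITS]
        n = 0
        for letter in group.lower():
            n = n * 26 + ALPHABET.index(letter)
        ch = chr(n)
        chars.append(ch.upper() if group.isupper() else ch)
    return "".join(chars)


print(encode26("a"), encode26("A"))                # aadt AADT
print(encode26("FoO").lower() == encode26("foo"))  # True: DNS-style comparison ignores the case
```

Fixed four-letter groups keep the sketch simple but spend 4 letters per character; converting the whole label to a single base-26 number would be more compact, at the cost of a less direct link between a letter's case and one Unicode character.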
But write an I-D anyhow. We all love I-Ds. :-)
-James "jetlag at 4am. warning. brain maybe non-functional" Seng
"Adam M. Costello" wrote:
> I just had a wacky idea relevant to the schemes that store ACE at
> the DNS servers. It's possible to encode the case of the Unicode
> characters as the case of the ACE characters. Since DNS servers
> already do case-insensitive comparisons of ASCII host names, they would
> automagically be doing case-insensitive comparisons of international
> host names.
>
> Here's how it could work: Instead of base-32, the ACE uses base-24: 8
> letters for 0000 through 0111 and 16 letters for 10000 through 11111,
> for an average of 4.5 bits per character instead of 5. (Notice that
> there is still exactly one way to encode an arbitrary bit string.) The
> Unicode characters are all converted to lower case (or all upper case),
> but the intended case of each Unicode character is indicated by the case
> of whichever ACE character contains its first bit (or last bit).
>
> This doesn't help with other kinds of insensitivities you might want in
> the comparisons, so it might be better to go with a general solution,
> but I thought this was cute and worth throwing out there.
>
> AMC
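The base-24 scheme quoted above can be sketched in Python under assumptions that are not part of the proposal: each Unicode character is a BMP code point written as 16 bits, the letters a-h stand for 0000-0111 and i-x for 10000-11111 (y and z left unused), the final group is zero-padded, and case mapping is simple 1-1 (`ch.lower()` is one character). The case of each Unicode character rides on the ACE letter holding its first bit, so an ASCII case-insensitive comparison of two encodings behaves like a case-insensitive comparison of the Unicode names. `encode`, `decode`, `SHORT`, `LONG` and `CHAR_BITS` are hypothetical names.

```python
SHORT = "abcdefgh"         # 0xxx: 8 letters, 3 payload bits each
LONG = "ijklmnopqrstuvwx"  # 1xxxx: 16 letters, 4 payload bits each
CHAR_BITS = 16             # assumption: every character is a BMP code point


def _next_boundary(pos: int) -> int:
    """Smallest multiple of CHAR_BITS >= pos, i.e. the next character's first bit."""
    return -(-pos // CHAR_BITS) * CHAR_BITS


def encode(label: str) -> str:
    lowered = [ch.lower() for ch in label]
    if any(len(c) != 1 or ord(c) > 0xFFFF for c in lowered):
        raise ValueError("sketch handles only BMP characters with 1-1 lower-casing")
    upper = [ch != low for ch, low in zip(label, lowered)]
    bits = "".join(format(ord(c), "016b") for c in lowered)
    out = []
    pos = 0
    while pos < len(bits):
        width = 4 if bits[pos] == "0" else 5
        group = bits[pos:pos + width].ljust(width, "0")  # zero-pad the last group
        letter = (SHORT if width == 4 else LONG)[int(group[1:], 2)]
        # Upper-case the ACE letter containing an upper-case character's first bit.
        first = _next_boundary(pos)
        if first < min(pos + width, len(bits)) and upper[first // CHAR_BITS]:
            letter = letter.upper()
        out.append(letter)
        pos += width
    return "".join(out)


def decode(ace: str) -> str:
    bits = ""
    upper_starts = set()  # first-bit offsets of characters marked upper case
    for letter in ace:
        low = letter.lower()
        if low in SHORT:
            group = "0" + format(SHORT.index(low), "03b")
        else:
            group = "1" + format(LONG.index(low), "04b")  # ValueError if not an ACE letter
        first = _next_boundary(len(bits))
        if letter.isupper() and first < len(bits) + len(group):
            upper_starts.add(first)
        bits += group
    chars = []
    for i in range(len(bits) // CHAR_BITS):  # padding is shorter than one character
        ch = chr(int(bits[i * CHAR_BITS:(i + 1) * CHAR_BITS], 2))
        chars.append(ch.upper() if i * CHAR_BITS in upper_starts else ch)
    return "".join(chars)


print(encode("a"), encode("A"))                  # aagb Aagb
print(encode("Ärger").lower() == encode("ärger"))  # True, as a DNS server would compare
```

Since 8 * 2^-4 + 16 * 2^-5 = 1, the 4-bit and 5-bit groups form a complete prefix code: apart from padding the final group, every bit string splits into groups in exactly one way, which is the property the message points out. With random bits each letter carries 4 or 5 bits with equal probability, hence the 4.5-bit average.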