
Re: [idn] Editorial comments on stringprep

To: Doug Ewell <dewell@adelphia.net>
Subject: Re: [idn] Editorial comments on stringprep
From: Patrik Fältström <paf@cisco.com>
Date: Sun, 05 May 2002 10:02:27 +0200
Cc: Erik Nordmark <Erik.Nordmark@sun.com>, idn@ops.ietf.org, Mark Davis <mark@macchiato.com>
In-reply-to: <007f01c1f390$d2ca5480$fa424244@anhmca.adelphia.net>
References: <Roam.SIMC.2.0.6.1020499328.17007.nordmark@bebop.france><11503195.1020511069@localhost><007f01c1f390$d2ca5480$fa424244@anhmca.adelphia.net>

--On 2002-05-04 10.26 -0700 Doug Ewell <dewell@adelphia.net> wrote:

> I couldn't remember UTC ever saying such a thing, so when Mark Davis
> <mark@macchiato.com> wrote:

I said _at_least_before_version_2_.

In 1995, when I worked at Bunyip Information Systems, I had a long
discussion with the Unicode Consortium about how to do normalization and
case folding together. Their response was to convert to lower case. Our
Whois++ implementation, named Digger, also ended up in a paper that was
presented at a Unicode Conference around 1995-1996.
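Doing normalization and case mapping together can be sketched with Python's standard library. This is a minimal illustration, not Digger's implementation or the stringprep tables. It uses `unicodedata.normalize` and `str.casefold` (current Unicode full case folding) as stand-ins, and the helper names `key_lower` and `key_casefold` are hypothetical. It shows where "convert to lower case" and "case fold" diverge.

```python
import unicodedata

def key_lower(s: str) -> str:
    # Hypothetical helper: map to lower case, then apply NFKC normalization.
    return unicodedata.normalize("NFKC", s.lower())

def key_casefold(s: str) -> str:
    # Hypothetical helper: apply Unicode full case folding (maps "ß" to "ss"),
    # then apply NFKC normalization.
    return unicodedata.normalize("NFKC", s.casefold())

if __name__ == "__main__":
    # "\ufb01" is U+FB01 LATIN SMALL LIGATURE FI, which NFKC expands to "fi".
    for word in ["Straße", "STRASSE", "\ufb01le", "FILE"]:
        print(f"{word!r}: lower={key_lower(word)!r} casefold={key_casefold(word)!r}")
```

Here "Straße" and "STRASSE" get the same key only under case folding, while the ligature form of "file" matches "FILE" under both, because NFKC expands the ligature. Some compatibility characters only become ordinary cased letters during NFKC, so a single map-then-normalize pass like this one may need extra mappings or a second pass in real use.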

I also noticed that Mark referred only to case folding, not to lower case
in particular, and that's why I talk about handwaving, historical
artifacts, etc.

>> There are also a number of codepoints which are lowercase which
>> don't have uppercase versions.
> 
> Which ones?  I can think of a character that looks uppercase but has no
> lowercase form (U+04C0 CYRILLIC LETTER PALOCHKA).  But such letters,
> despite their appearance, are neither uppercase nor lowercase; they are
> caseless, and immune to the effects of any casing operation.

Quote from page 142 of The Unicode Standard, Version 3.0:

"Also, because many characters are really caseless (most of the IPA block,
for example), uppercasing a string does not mean that it will no longer
contain any lowercase letters."

I only quote the text.

Yes, when reading it, one might think it should have said "...that it will
only contain uppercase letters." But it doesn't.
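The point of the quoted passage can be checked with Python's `unicodedata` module. This sketch assumes the Unicode data shipped with a current Python, which is newer than the Unicode 3.0 book quoted above, and the helper `lowercase_after_upper` is hypothetical. U+0138 LATIN SMALL LETTER KRA is a lowercase letter with no uppercase mapping, and U+0294 LATIN LETTER GLOTTAL STOP (from the IPA Extensions block) is a caseless letter. Both pass through `upper()` unchanged.

```python
import unicodedata
from typing import List

def lowercase_after_upper(s: str) -> List[str]:
    """Hypothetical helper: characters of s.upper() still in category Ll."""
    return [c for c in s.upper() if unicodedata.category(c) == "Ll"]

if __name__ == "__main__":
    # U+0138 (kra): lowercase (Ll) with no uppercase mapping.
    # U+0294 (glottal stop): a caseless letter (Lo), untouched by casing.
    sample = "a\u0138\u0294"
    for c in sample.upper():
        print(f"U+{ord(c):04X} {unicodedata.category(c)} {unicodedata.name(c)}")
    print("lowercase letters left:", lowercase_after_upper(sample))
```

After uppercasing, only the "a" has changed; the kra is still a lowercase letter, which is exactly the situation the book describes.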

>> Last, some codepoints (like the German sharp-s, ß) turn into "SS" in
>> uppercase, and my guess is (with my limited knowledge of German,
>> only 2 years of studies) that, when comparing, one doesn't want that
>> similarity.
> 
> German speakers are forced to deal with that mapping every day.  It is a
> natural part of the language.

I know. I just gave it as one example where I _thought_ people from Germany
would rather have lower case than upper case. I will wait until I hear from
someone from Germany saying they prefer mapping to upper case before I say
something else.

We are talking about what is preferred, not about what people are used to.
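The practical difference for ß can be sketched in Python, assuming comparison keys are built with the built-in `str.lower()` and `str.upper()`. The function names below are hypothetical. Mapping to upper case turns "ß" into "SS", so "straße" and "strasse" compare equal; mapping to lower case keeps them distinct.

```python
def equal_via_lower(a: str, b: str) -> bool:
    # Hypothetical helper: compare after mapping to lower case; "ß" stays "ß".
    return a.lower() == b.lower()

def equal_via_upper(a: str, b: str) -> bool:
    # Hypothetical helper: compare after mapping to upper case; "ß" becomes "SS".
    return a.upper() == b.upper()

if __name__ == "__main__":
    a, b = "straße", "strasse"
    print(equal_via_lower(a, b), equal_via_upper(a, b))  # False True
    # Uppercasing loses information: lowercasing again does not restore "ß".
    print(a.upper().lower())  # strasse
```

The last line shows why the choice matters: once the uppercase mapping has been applied, the original spelling cannot be recovered.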

>> And, personally, I'd rather see bq-asdqwe123 than BQ-ASDQWE the few
>> times I hope I see a domain name used in protocols natively in its
>> ACE encoding.
> 
> No argument there.  All-lowercase is widely recognized as being easier
> to read than all-uppercase, primarily because of the greater variation
> in letterforms.  But again, there doesn't seem to be any evidence that
> the Unicode Consortium has made any of the claimed statements about
> "preferring" lowercase or about the mapping to lowercase being more
> "consistent."  Please dig up the relevant references, if possible.

After some digging, I found that the paper Philippe and I wrote was
presented as A5 at the 9th Unicode Conference, on September 5, 1996:

   Text Searching Across Multiple Character Sets in Unicode
   Philippe Boucher, Senior Programmer,
   Bunyip Information Systems, Montreal, Quebec, Canada

I cannot find the document, though.

   paf
