Re: [idn] Editorial comments on stringprep
--On 2002-05-04 10.26 -0700 Doug Ewell <dewell@adelphia.net> wrote:
> I couldn't remember UTC ever saying such a thing, so when Mark Davis
> <mark@macchiato.com> wrote:
I said _at_least_before_version_2_.
In 1995, when I worked at Bunyip Information Systems, I had a long
discussion with the Unicode Consortium about how to do normalization and
case folding together. Their response was to convert to lower case, and our
implementation of Whois++, named Digger, also ended up in a paper which was
presented at a Unicode Conference around 1995-1996.
I also saw Mark only referring to case folding, not lower case in particular,
and that's why I talk about handwaving, historical artifacts, etc.
>> There are also a number of codepoints which are lowercase which
>> doesn't have uppercase versions.
>
> Which ones? I can think of a character that looks uppercase but has no
> lowercase form (U+04C0 CYRILLIC LETTER PALOCHKA). But such letters,
> despite their appearance, are neither uppercase nor lowercase; they are
> caseless, and immune to the effects of any casing operation.
Quote from page 142 in Unicode version 3 book:
"Also, because many characters are really caseless (most of the IPA block,
for example), uppercasing a string does not mean that it will no longer
contain any lowercase letters."
I only quote the text.
Yes, when reading it, one might think it should have been written "...that
it will only contain uppercase letters", but it doesn't.
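A small Python sketch of the point made in that quote. It is only an illustration: it relies on whatever Unicode version ships with the Python interpreter (not Unicode 3.0), and the helper names `lowercase_without_uppercase` and `still_has_lowercase` are made up for this example. U+0138 (LATIN SMALL LETTER KRA) is used as a known lowercase letter without an uppercase mapping; the IPA block range 0250-02AF is scanned the same way.

```python
import unicodedata


def lowercase_without_uppercase(start, end):
    """Return characters in [start, end] that are category Ll
    but are left unchanged by uppercasing."""
    found = []
    for cp in range(start, end + 1):
        ch = chr(cp)
        if unicodedata.category(ch) == "Ll" and ch.upper() == ch:
            found.append(ch)
    return found


def still_has_lowercase(text):
    """True if the uppercased text still contains a lowercase letter (Ll)."""
    return any(unicodedata.category(ch) == "Ll" for ch in text.upper())


# U+0138 has general category Ll and no uppercase mapping.
print(still_has_lowercase("a\u0138"))

# Scan the IPA Extensions block; the count depends on the Unicode version.
ipa_block = lowercase_without_uppercase(0x0250, 0x02AF)
print(len(ipa_block), "lowercase letters in 0250-02AF have no uppercase mapping")
```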
>> Last, some codepoints (like the german sharp-s, ß) turns to "SS" in
>> uppercase, and my guess is (with my limited knowledge of German,
>> only 2 years of studies) that one when comparing don't want that
>> similarities.
>
> German speakers are forced to deal with that mapping every day. It is a
> natural part of the language.
I know. I just gave it as one example where I _thought_ people from Germany
would rather have lower case than upper case. I will wait until I hear from
someone from Germany saying they prefer mapping to upper case before I say
something else.
We are talking about what is preferred, not what people are used to.
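To make the sharp-s case concrete, here is a minimal Python sketch. It uses Python's built-in `str.lower`, `str.upper` and `str.casefold` as stand-ins for the mappings under discussion; it is not the stringprep mapping itself, and `equal_after` is a hypothetical helper for this example only.

```python
def equal_after(mapping, a, b):
    """Compare two strings after applying the same case mapping to both."""
    return mapping(a) == mapping(b)


words = ("straße", "strasse")

# Lowercasing leaves "ß" alone, so the two spellings stay distinct.
# Uppercasing maps "ß" to "SS", and full case folding maps it to "ss",
# so both of those make the two spellings compare equal.
for name, mapping in (
    ("lower", str.lower),
    ("upper", str.upper),
    ("casefold", str.casefold),
):
    print(name, equal_after(mapping, *words))
```

Which of these outcomes is wanted is exactly the preference question above; the sketch only shows that the choice of mapping changes the comparison result.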
>> And, personally, I rather see bq-asdqwe123 than BQ-ASDQWE the few
>> times I hope I see a domain name used in protocols natively in its
>> ACE encoding.
>
> No argument there. All-lowercase is widely recognized as being easier
> to read than all-uppercase, primarily because of the greater variation
> in letterforms. But again, there doesn't seem to be any evidence that
> the Unicode Consortium has made any of the claimed statements about
> "preferring" lowercase or about the mapping to lowercase being more
> "consistent." Please dig up the relevant references, if possible.
After some digging, I see the paper Philippe and I wrote was presented as
A5 at Unicode Conference number 9, on September 5, 1996:
Text Searching Across Multiple Character Sets in Unicode
Philippe Boucher, Senior Programmer,
Bunyip Information Systems, Montreal, Quebec, Canada
I cannot find the document, though.
paf