[idn] half-width history (was: NFC vs NFKC)
- To: idn@ops.ietf.org
- Subject: [idn] half-width history (was: NFC vs NFKC)
- From: Martin Duerst <duerst@w3.org>
- Date: Tue, 23 Oct 2001 17:22:04 +0900
Dear IDN WG members,
This is a part of a message that I sent a while ago
to the nameprep design team about half-width Katakana
and full-width Latin. To give you the context: in that thread,
somebody was making an analogy to upper/lower case, and to
how COBOL didn't deal with the distinction.
Regards, Martin.
>>>>>>>>
This is a completely inappropriate analogy. There are many very serious
differences between case and half-/full-width.
Upper/lower case is an important component of most writing systems
for languages using the relevant script. It is part of orthography,
and typography. Most people use the distinction also in handwriting.
It has been that way for about 500 years or so. Early computers had
very tight limitations, and ignored orthography and typography.
Fortunately, we are beyond this.
For half-width (kata)kana, the situation is completely reversed. They
were created by the equivalents of the 60's COBOL programmers to
deal with the severe hardware limitations of that time. They are not
at all part of orthography or typography. They are not distinguished
in handwriting. They don't appear in any dictionary, or any serious
book or anything. Their use is very limited, in particular:
1) device limitations, e.g. the receipts printed by cash registers,
displays on fax machines and copiers,... But not on mobile phones,
which can use as many characters as Japanese PCs.
2) Sometimes in user interfaces on PCs where people try to be smart
and save space by using half-width kana. Because a kanji can often
replace several kana, and because in half-width kana, the diacritics
take extra width, the saving of space is often minimal, but the
decrease in legibility (using katakana instead of kanji and hiragana,
and the half-width fonts having very bad legibility on their own)
is often very drastic.
In addition, no Japanese input method produces half-width kana off
the shelf. The one I have has an option to prohibit half-width kana
altogether. And Japanese email (using iso-2022-jp) isn't able to
transmit half-width kana (my Japanese version of Eudora converts
to full-width kana when sending). This significantly reduces any
cut/paste problems.
And out of 100 Japanese that you ask, about 100 will prefer decent
typography to half-width kana, just as about 100 westerners will prefer
a decently set and printed page with case distinctions to a 60's
all-uppercase printout.
The situation is a bit (but not much) different for full-width Latin
letters. Their creation was an accident. For the predecessor of what
is now JIS X 0208, they were included because the designers of that
two-byte standard thought that it would be used (at least to some
extent) as a pure two-byte standard, and that of course Latin
letters were needed once in a while. What is now usually called
full-width variants was intended to be just a different encoding
of one and the same letter (think EBCDIC).
As it turned out, the two-byte encoding was virtually always
combined with a 7-bit standard (ASCII or the Japanese 646 variant).
In addition, screencell-oriented software was adapted to Japanese
by preserving the invariant of one byte == one (half-width) screen
cell. This was then later adapted into so-called word processors,
and from there into current-day word processing software.
Again, neither decent typography nor orthography distinguish two
width variants of Latin letters. The general practice for
typesetting Latin letters is to use proportional fonts, and was
always like this where the tools allowed it. There is also no
distinction made in hand-writing.
There are some differences from half-width kana. Full-width Latin
can be sent in emails. Some input methods have initial settings
that use full-width for Latin (but all of them allow the
preference to be set to half-width Latin). Readability of full-width Latin
is better than for half-width kana, because it's not used as
a substitute for completely different-looking letters (kanji,
hiragana), and because the characters are larger, rather than
smaller, than their counterparts.
So both half-width kana and full-width Latin letters are products
(or accidents) of technologies of the 60's and 70's. They are not
at all established in typographic or orthographic practice, and
I guess they never will be. Comparing their proposed exclusion from
nameprep with COBOL programmers from the 60's defending upper-case
only is completely backwards.
For nameprep, half-width kana are very much irrelevant, because
the user would have to go to extra lengths to input half-width kana.
We don't want to forbid anybody to fold them to full-width katakana,
but there is absolutely no need to include them in nameprep.
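To make the folding concrete, here is a minimal sketch in Python. It
assumes that "folding" means Unicode NFKC normalization as provided by
the standard unicodedata module; the helper name fold_width is
hypothetical and is not part of nameprep. It shows that NFC leaves
half-width katakana untouched, while NFKC maps them to full-width
katakana and also combines the separate half-width voiced sound mark
with its base letter.

```python
import unicodedata


def fold_width(label: str) -> str:
    """Fold width variants via NFKC (hypothetical helper, not nameprep itself)."""
    return unicodedata.normalize("NFKC", label)


# Half-width KA + voiced mark, I, TO + voiced mark: five code points.
halfwidth = "\uff76\uff9e\uff72\uff84\uff9e"
# Full-width GA, I, DO: three code points.
fullwidth = "\u30ac\u30a4\u30c9"

nfc = unicodedata.normalize("NFC", halfwidth)
nfkc = fold_width(halfwidth)

print("NFC changes the string: ", nfc != halfwidth)
print("NFKC yields full-width: ", nfkc == fullwidth)
print("code points before/after:", len(halfwidth), len(nfkc))
```

The length difference also illustrates the point about diacritics made
earlier: in half-width form each voiced sound mark is a separate
character occupying its own cell.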
For full-width Latin letters, there is indeed a certain chance
that users input them e.g. in a location field in a browser.
In particular, this chance will be increasing when mixed
domain names (i.e. JJJJ.LLLL.JJJJ) are used. But this is the
same problem that we have for the dot, and can be dealt with
in the same way.
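As a rough illustration of the full-width Latin case, the following
Python sketch builds a full-width version of a made-up name and folds
it with NFKC from the standard unicodedata module. The name
"example.com" and the variable names are assumptions for illustration
only. The last lines reflect my reading of "the problem that we have
for the dot": the ideographic full stop (U+3002) has no compatibility
mapping, so NFKC alone does not turn it into an ASCII dot and it needs
separate handling.

```python
import unicodedata

ascii_name = "example.com"  # made-up name, for illustration only

# Full-width forms U+FF01..U+FF5E sit at a fixed offset from ASCII 0x21..0x7E.
typed = "".join(chr(ord(c) + 0xFEE0) for c in ascii_name)

# NFC keeps the full-width letters and the full-width dot (U+FF0E) as typed.
unchanged_by_nfc = unicodedata.normalize("NFC", typed) == typed

# NFKC folds both the letters and the full-width dot to their ASCII forms.
folded = unicodedata.normalize("NFKC", typed)

# The ideographic full stop is not touched by NFKC.
ideographic_dot = "\u3002"
dot_after_nfkc = unicodedata.normalize("NFKC", ideographic_dot)

print("NFC leaves input as typed:", unchanged_by_nfc)
print("NFKC result:", folded)
print("U+3002 survives NFKC:", dot_after_nfkc == ideographic_dot)
```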
Regards, Martin.