Re: [idn] Re: I don't want to be facing 8-bit bugs in 2013
On Thu, 21 Mar 2002, Masataka Ohta wrote:
>Unicode is not usable in international context.
Come, now. My understanding has been that that's precisely the context
where Unicode is supposed to be used. Why have Unicode in the first place
if not for multilingual text?
>There is no unicode implementation work in international context.
How's that? I would tend to see current browser and office suite work as
such, at the very least.
>Unicode is usable in some local context.
If I understand correctly, you're opposed to Unification and the external
protocols for language tagging it necessitates if we insist on absolute
typographical correctness of text with unified ideographs. HTML and
XML/XHTML provide precisely such tagging, as do Unicode language tag
characters in the context of protocols with no external language
indication facilities. Also I seem to remember that East Asian Unicode
text is legible even when printed in a font not designed for the
particular "local context", as you put it. Where's the problem?
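A minimal sketch of the point above (the code point value is from the Unicode chart; the markup is just illustrative): a unified ideograph is one well-defined character, and the preferred glyph tradition is selected out of band, e.g. by an HTML lang attribute, not by the character code itself.

```python
# U+76F4 is a single CJK unified ideograph, regardless of whether it is
# later rendered with Chinese- or Japanese-style typography.
han = "\u76f4"
print(hex(ord(han)))  # one unambiguous code point

# The rendering preference rides on out-of-band language tagging,
# e.g. HTML markup, while the character itself stays the same:
for lang in ("zh", "ja"):
    print(f'<span lang="{lang}">{han}</span>')
```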
>There is some unicode implementation work in local contexts.
Well, considering that UTF-8 is the encoding of choice for some past, much
present and all future IETF and W3C work, and that Microsoft's products
seem to be heading for UTF-16, I'd say that is a colossal understatement.
>However, the context information must be supplied out of band.
Not an unconditional must; it is a must only provided that the text is
meant for human consumption *and* exact typography of variant characters
is a requirement.
Unicode was never meant to solve the latter part -- it does not encode
font information, for instance.
>And, the out of band information is equivalent to "charset" information,
>regardless of whether you call it "charset" or not.
Absolutely not. Unicode's characters are perfectly well defined even if
they are not correctly printed. What we're indicating here are differences
between languages and preferred renderings of a given piece of Unicode
text.
You should think about this in the context of rendering to speech, perhaps
that will help you see the fine distinctions involved. After all, one
cannot even *attempt* such renderings from text written in pure Latin-1
without external language indication. Why should graphic rendering be any
different?
>Fix is to supply context information out of band to specify which
>Unicode-based local character set to use.
No. The fix is to indicate the language the document is in, or perhaps
encode font information explicitly. Provided such meticulous attention to
the appearance of the text is warranted in a given application, anyway.
>See, for example, RFC1815.
Yes, you seem to have objected to Unicode before. The trouble is, not a
whole lot of people agree with RFC 1815. For a reason, I daresay.
>As for IDN, it can't just say "use charset of utf-7" or "use charset of
>utf-8".
Of course it can. UTF-8 and UTF-7 are bona fide character sets -- the
characters they define are unique and well-defined. You're confusing
renderings of characters with characters themselves, a common mistake with
Unicode. (But of course you already know that.)
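To illustrate (a minimal sketch using Python's standard codecs): both encodings map the same well-defined Unicode characters to bytes, reversibly, which is exactly what a MIME "charset" is. How a glyph looks on screen is a separate concern.

```python
# A string mixing Latin letters, punctuation, and an ideograph.
s = "Ohta\u2013\u76f4"

# UTF-8 and UTF-7 are both charsets in the MIME sense: each is a
# reversible mapping from Unicode characters to bytes.
for charset in ("utf-8", "utf-7"):
    encoded = s.encode(charset)
    assert encoded.decode(charset) == s  # lossless round trip
    print(charset, encoded)
```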
>Anyway, with the fix, there is no reason to prefer Unicode-based local
>character sets, which is not widely used today, than existing local
>character sets already used world wide.
Of course there is -- my local character set cannot represent Arabic,
Japanese, English and Finnish, with correct punctuation and other
typographic pedantries, in the same document. Hell, it cannot do that even
in separate documents, unless I use Unicode. The latter on the other hand
works perfectly, with no loss of information.
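A minimal sketch of that claim (the sample phrases are illustrative): a single UTF-8 document carries Arabic, Japanese, English, and Finnish losslessly, while a local charset such as Latin-1 cannot represent the same text at all.

```python
# Finnish, English, Japanese, and Arabic in one string.
mixed = "suomeksi, in English, \u65e5\u672c\u8a9e, \u0639\u0631\u0628\u064a"

# UTF-8 round-trips the whole thing with no loss of information.
assert mixed.encode("utf-8").decode("utf-8") == mixed

# A local charset simply cannot hold this text.
try:
    mixed.encode("latin-1")
except UnicodeEncodeError as err:
    print("latin-1 fails:", err.reason)
```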
Your line of thinking is what has led to e.g. i-mode employing JIS or
Latin-1, making the current incarnation of that technology useless to
Central European, Chinese, African, Indian, and probably a whole lot of
other user communities. Just think about what something like this would
do in an IDNA context, and you'll understand why Unicode is a Good Idea.
Besides, if you look at a Chinese user typing in the name of a Japanese
site, I would say unification makes the procedure considerably more
forgiving. Don't you?
I'd say local variants with full support are Bad; a unified coding with
local profiles is Good.
Sampo Syreeni, aka decoy - mailto:decoy@iki.fi, tel:+358-50-5756111
student/math+cs/helsinki university, http://www.iki.fi/~decoy/front
openpgp: 050985C2/025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2