Re: Unicode/10646 History (was Re: [idn] An ignorant question about TC<-> SC)
Some comments below.
—————
"Give me a place to stand, and I will move the earth." -- Archimedes (Δός μοι ποῦ στῶ, καὶ κινῶ τὴν γῆν)
[http://www.macchiato.com]
----- Original Message -----
From: "John C Klensin" <klensin@jck.com>
To: "Mark Davis" <mark@macchiato.com>
Cc: <idn@ops.ietf.org>
Sent: Thursday, November 01, 2001 00:35
Subject: Re: Unicode/10646 History (was Re: [idn] An ignorant question about
TC<-> SC)
> Mark,
>
> It probably isn't worth your time, my time, or especially that
> of the WG to go through this in detail. The bottom line is that
> we would be stuck with Unicode if it were an act of complete
> beauty that had resolved all tradeoffs to everyone's
> satisfaction for all purposes, and we would be stuck with it if
> it got many of those tradeoffs wrong from the standpoint of
> IDN/DNS use (regardless of whether they were made optimally from
> other perspectives). And, while both Eric and I might be
> faulted for not adopting a more positive and constructive tone
> in discussions on this subject, his experience and observations
> during the relevant period are consistent with mine and I won't
> spend time and bandwidth repeating his observations.
>
> That said, three comments that seem to need making:
>
> (i) The impression that the original Unicode effort was driven
> by printer vendors (and, as Eric suggests, printer-focused
> groups within companies with broader interests) came from two
> sources. One was a series of statements from them to various
> other ISO and ANSI committees and liaison groups that made that
> quite clear. The other was comments from standards
> representatives from several of those companies who, when asked
> about particular properties of Unicode, produced responses that
> might be summarized by "the printer guys made us do it".
Curious. I don't know which standards representatives made those statements.
It might well have been true for their companies, but I had no such
impression at the time.
>
> There are other seeming anachronisms in your version of the
> story (e.g., the original design base for 10646 was purely as a
> 32-bit character set, so a criticism on the basis of what fit
> into "the BMP" is a little strange -- while there were some
> early attempts to push 16-bit subsets (mostly from printer
> vendors, if I recall), unless my memory is failing severely, the
> concept of a "BMP" originated with the Unicode merger).
I went to occasional X3L2 meetings fairly early on, and I remember when the
US started to support a uniform 4 byte form. By the time of the DIS-1 there
were multiple byte formats, since there was some apprehension that industry
would not immediately quadruple memory requirements! So in DIS-1 -- which
was the version of 10646 that we were talking about -- there was indeed a
2-byte form, and it did have announcer sequences.
The term "BMP" was definitely an ISOism. The Unicode standard never had the
notion of planes at all. Only relatively recently has the term even been
introduced into the Unicode glossary (http://www.unicode.org/glossary/).
>
> (ii) At no point did I mean to imply that DIS-1 was a perfect
> solution (to anything). As you point out, the effort had
> abandoned the "one code point, one glyph-width" principle. As
> Jon points out, they had botched Hebrew and several other
> things. And those are two examples among many. Could it have
> been "fixed"? Perfectly, almost certainly not, although we
> would have ended up with a different set of tradeoffs and
> (intentional or not) optimizations (see below). Was it closer
> to its original design criteria than the Unicode version that
> substituted for it? Almost certainly (and not surprisingly)
> yes. I did go to some lengths to suggest that UTC has done good
> work to fix some of the difficulties Unicode inherited from its
> early history -- it is significantly closer today to the balance
> of design goals originally set in TC97 for the UCS than it was
> when the substitution was made.
>
> (iii) The real point of my raising those historical issues was
> the one you seem to have missed, so let me assume I wasn't clear
> and say it explicitly.
> IDN have long ago figured out, this business of trying to create
> a single "UCS" is one that involves many complex tradeoffs and
> for which there are often no easy answers. If one of those
> tradeoffs is resolved in one way, some applications get easier
> and others get harder. There is even a case to be made that the
> stated current design criteria for Unicode are not completely
> orthogonal, leading to even more tradeoffs.
>
> To give just a few examples,...
>
> * Keeping scripts together and preserving widely-used
> earlier CCSs as blocks is A Good Thing. But having
> exactly one code point associated with a given glyph/
> letter shape is also A Good Thing. One can't have both.
>
> * Han unification provides many benefits. Han
> unification also causes some problems (one of which, in
> the present instance, is that one appears to need
> metadata to map between TC and SC without doing nasty
> things to Kanji). One cannot both do unification and
> not do unification.
>
> * There are several different ways to handle character
> presentation ordering (e.g., right-to-left/ left-to-right),
> especially in the edge cases in which right-to-left and
> left-to-right (or one of them and top-to-bottom) scripts are
> intermixed. Similarly, there are multiple possible ways to
> handle optional vowels, word-breaking indicators, tone and
> stress markers, and so on. In each case, different techniques
> are better for different circumstances and conditions; none is
> optimal for all cases. No matter what one chooses, it will be
> suboptimal for some problems. And, if one does not choose, but
> incorporates several of the possibilities, one will be subject
> to accusations of excessive complexity and too many options.
>
> In each case, and for many others, there are engineering
> tradeoffs involved. And, for each case, second-guessing the
> decisions in the light of a particular application is a popular
> sport, but one that accomplishes very little -- I don't see
> alternatives to Unicode/ 10646 and the IETF is certainly not
> going to invent one. But I believe it is equally important that
> all of us, even the members of the UTC, understand and remember
> that Unicode _is_, ultimately, a product of choices among
> engineering tradeoffs (which include its history). And those
> choices impose other engineering constraints that we need to
> learn to live with and/or work around. Unicode is not
> divinely-inspired Received Wisdom and we need to avoid thinking
> that assumes that it is.
You make some good points. Unicode is the product of many tradeoffs over the
years, and there are certainly cases that I personally would have done
differently. I'm sorry if I somehow gave you the impression that I or anyone
else involved in Unicode ever thought that Unicode was "divinely-inspired
Received Wisdom" -- or that it had no flaws. What we do think, along with
you, is that despite what flaws it has, there are no alternatives to
Unicode/10646.
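An aside on the TC<->SC point raised above: the reason a plain code-point
table is not enough is that simplification merged several distinct
Traditional characters into one Simplified character, so the reverse mapping
is one-to-many and needs context (the "metadata" John mentions). A minimal
sketch in Python -- the character correspondences shown are real, but the
tiny table and the function name are purely my own illustration, not any
standard API:

```python
# A one-to-many Simplified -> Traditional table. Real conversion systems
# need dictionary/context information to pick among the candidates; a pure
# code-point substitution cannot.
SC_TO_TC = {
    "\u53d1": ["\u767c", "\u9aee"],  # 发 -> 發 (emit) or 髮 (hair)
    "\u5e72": ["\u4e7e", "\u5e79", "\u5e72"],  # 干 -> 乾 (dry), 幹 (trunk), 干 (shield)
    "\u56fd": ["\u570b"],            # 国 -> 國, an unambiguous case
}

def tc_candidates(sc_text):
    """For each character, list the possible Traditional forms.

    Characters absent from the table are passed through unchanged.
    """
    return [(ch, SC_TO_TC.get(ch, [ch])) for ch in sc_text]

for ch, candidates in tc_candidates("\u53d1\u56fd"):
    print(ch, "->", candidates)
```

Only the ambiguous entries force a choice; a character like 国 converts
mechanically, which is why naive converters work "most of the time" and fail
unpredictably otherwise.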
>
> regards,
> john
>
> --On Wednesday, 31 October, 2001 12:00 -0800 Mark Davis
> <mark@macchiato.com> wrote:
>
> > The history of the Unicode/10646 merger is really irrelevant
> > to TC<->SC, but there are some inaccuracies that should not be
> > left hanging.
> >...