Re: [idn] Re: language scripts classified

To: "Soobok Lee" <lsb@postel.co.kr>
Subject: Re: [idn] Re: language scripts classified
From: Eric Brunner-Williams in Portland Maine <brunner@nic-naa.net>
Date: Thu, 19 Jul 2001 00:37:23 -0400
cc: idn@ops.ietf.org, brunner@nic-naa.net, quaynor@ghana.com

Soobok,

I still haven't any idea as to the utility or purpose of your scheme, but
I don't have to understand your purpose to attempt an answer to your set
of questions.

UCAS (Unified Canadian Aboriginal Syllabics) entered ISO10646 via the work
of the Canadian Aboriginal Syllabics Encoding Committee, chaired by Dirk
Vermeulen.

It incorporates the syllabic characters of Eastern, Western and Bible Cree,
Naskapi, Siksika (Algonquin), Athabascan, Carrier, Sayisi, Slavey (Dene),
and Inuktitut and Nunavut (Inuit), three language families, so character
frequency data "for UCAS" would be an average, comparable to one for Finnish
and Turkish, Icelandic, English and French, and Romaji-form Japanese.

As with some other scripts now in ISO10646 via the Unicode Technical Committee,
UCAS was "unified" by people whose primary interest was glyph collection, who
view language, collation, distribution, etc., as properties of glyphs, and who
managed to fail to impress upon the beneficiaries of their technical largesse
that perhaps not all the world is a printer.

The current ISO10646 ordering approximates row-centric (E, I, O, A) vowel
ordering, marred by "Han (like) Unification", which manages to compress a
set of under 1K possible code-points into 600 or so (range U+1400 to U+1676),
a rather modest win. Here is the first "row"; keep in mind that Y-CREE is as
like CARRIER as Japanese is like English.

		U+1401 E
		U+1402 AAI
		U+1403 I
		U+1404 II
		U+1405 O
		U+1406 OO
		U+1407 Y-CREE OO
		U+1408 CARRIER EE
		U+1409 CARRIER I
		U+140A A
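
A minimal sketch for checking a listing like this against the Unicode
Character Database, assuming Python's standard unicodedata module. The
helper name first_row and the hexadecimal range U+1401 to U+140A (ten code
points) are assumptions of this sketch, not something taken from the
UCAS proposal itself.

```python
import unicodedata

PREFIX = "CANADIAN SYLLABICS "

def first_row(start=0x1401, end=0x140A):
    """Return (code point label, short name) pairs for start..end inclusive."""
    rows = []
    for cp in range(start, end + 1):
        name = unicodedata.name(chr(cp))
        # Drop the common block prefix so only the syllable name remains.
        short = name[len(PREFIX):] if name.startswith(PREFIX) else name
        rows.append((f"U+{cp:04X}", short))
    return rows

for label, short in first_row():
    print(label, short)
```

Code points are hexadecimal, so the tenth entry after U+1401 is U+140A,
not U+1410.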

All the Algonquin languages I'm familiar with (W-Cree, Y-Cree, Siksika) teach
column ordering in schools ("Wa Pa Ta Ka ..."), i.e. columnar consonant
ordering, though both orderings exist in Nunavut (Inuktitut) usage. I can't
shed any light on Dene use, though I suppose I'll have to ask around eventually.

That should show why writing collation for languages that use UCAS
code-points is difficult, and even a barrier to the expansion of text-based,
computer-mediated language applications for about half of the Indians in
Canada.
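
To make the mismatch concrete, here is a rough sketch in Python. It is
illustrative only: the four vowels and four consonant series are just the
ones quoted above, and COLUMN_RANK is a hypothetical table, not a real
collation for W-Cree, Y-Cree, Siksika, or any other language. Characters
are looked up by their Unicode names rather than by hard-coded code points.

```python
import unicodedata

# Illustrative subset: the vowels and consonant series quoted in the text.
VOWELS = ["E", "I", "O", "A"]        # code-chart vowel order
CONSONANTS = ["W", "P", "T", "K"]    # "Wa Pa Ta Ka ..." recitation order

def syllabic(consonant, vowel):
    """Look up a syllabic by its Unicode character name."""
    return unicodedata.lookup(f"CANADIAN SYLLABICS {consonant}{vowel}")

# Hypothetical rank: vowel column first, then consonant within the column.
COLUMN_RANK = {
    syllabic(c, v): (vi, ci)
    for vi, v in enumerate(VOWELS)
    for ci, c in enumerate(CONSONANTS)
}

def short_names(chars):
    """Last word of each character name, e.g. 'PA'."""
    return [unicodedata.name(ch).split()[-1] for ch in chars]

chars = list(COLUMN_RANK)
by_code_point = sorted(chars)                          # naive code-point sort
by_column = sorted(chars, key=COLUMN_RANK.__getitem__)  # table-driven sort

print(short_names(by_code_point))
print(short_names(by_column))
```

The code-point sort walks each consonant series through its vowels, while
the table-driven sort walks each vowel column through the consonants. Any
application that wants the second order has to carry its own table.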

The population question is hard to answer, as lots of people, Canadian and
Indian, arrive at different answers, for policy reasons. Canadians want to
diminish the number of "Indians", Treaty Indians want to control the number
of C-31 "Indians", and Constitutional "Indians" may differ from everyone.
The "counting" situation is no better in the US, where differing "blood
quanta" and lineal descent rules exist, in a hodgepodge of Federal, State,
and formerly Terminated Tribes rules. It isn't a useful question to attempt
either; utilitarianism in this area is not polite, neh?

I hope this has been helpful. There are several dozen characters in the
U+1400 to U+1676 range that are nameprep suspects for the usual reasons, e.g.,
U+1429 FINAL PLUS looks like "+" (U+002B), etc.
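
A small sketch of why such look-alikes are a concern, assuming the
normalization step in question is NFKC and using Python's standard
unicodedata module. The helper name survives_nfkc is made up for this
example.

```python
import unicodedata

def survives_nfkc(ch):
    """True if NFKC normalization leaves the character unchanged."""
    return unicodedata.normalize("NFKC", ch) == ch

final_plus = "\u1429"   # the syllabics look-alike
plus_sign = "\u002b"    # ASCII "+"

print(unicodedata.name(final_plus))
print(survives_nfkc(final_plus))
print(unicodedata.normalize("NFKC", final_plus) == plus_sign)
```

The character has no compatibility mapping, so normalization alone does not
fold it onto "+". Catching it would take an explicit prohibition or mapping
entry.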

Do you really need character frequency data for UCAS for reordering? Just
how much of a win would halving the effective repertoire present to an ACE
designer? I'm pretty ignorant about ACE design issues. I'm afraid that my
text archives are a) in print, and b) W-Cree and Siksika centric, and not
representative of all the syllabics language communities.

I hope you don't mind my responding via the IDN list, and cc'ing Dr. Nii
Quaynor, who is working on a syllabary for several Central African languages,
and may appreciate some exposure to non-glyph issues he may not already be
aware of.

Eric
