Re: [idn] Re: [JET-member 464] Re: Fw: Re: new members invitation
John,
"Beats me" is a fine answer, as is looking critically at the stated, and
unstated, problem constraints.
If we were to consider the character sets of Abenaki/Mikmaq/Maliseet/...
(scripts derived from 17th-century French orthography) and of other Indian
languages (with contemporary, even modern, scripts derived from Spanish and
English) alongside "Roman", we'd be looking at the following equivalence
problem:
U+0038 "8" DIGIT EIGHT, when inside an alphabetic string
U+0222 "Ȣ" LATIN CAPITAL LETTER OU
U+0223 "ȣ" LATIN SMALL LETTER OU
U+004F,U+0055 "OU"
U+004F,U+0075 "Ou"
U+006F,U+0075 "ou"
U+0057 "W"
U+0077 "w"
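To make the equivalence concrete, here is a minimal sketch of one way to fold all of these forms into a single comparison key. It assumes, purely for illustration, that the canonical form is LATIN SMALL LETTER OU (U+0223), and it reads "inside an alphabetic string" as "inside a run of characters that contains at least one letter". The names `fold_ou` and `_fold_eights` are hypothetical and do not come from any IDN specification.

```python
import re

SMALL_OU = "\u0223"    # LATIN SMALL LETTER OU, the assumed canonical form
CAPITAL_OU = "\u0222"  # LATIN CAPITAL LETTER OU

# A run of letters and/or "8" digits; [^\W\d_] roughly means "any Unicode letter".
_ALPHA_RUN = re.compile(r"(?:[^\W\d_]|8)+")


def _fold_eights(match):
    run = match.group(0)
    # Treat "8" as the OU letter only when the run also contains a real letter,
    # so purely numeric strings such as "1988" are left alone.
    if any(ch.isalpha() for ch in run):
        return run.replace("8", SMALL_OU)
    return run


def fold_ou(label: str) -> str:
    """Hypothetical sketch: fold every OU variant into one comparison key."""
    s = label.replace(CAPITAL_OU, SMALL_OU).lower()  # "OU"/"Ou" -> "ou", "W" -> "w"
    s = s.replace("ou", SMALL_OU).replace("w", SMALL_OU)
    return _ALPHA_RUN.sub(_fold_eights, s)


print(fold_ou("8bana"), fold_ou("Oubana"), fold_ou("Wbana"), fold_ou("1988"))
```

Folding every "w" and "ou" this way would wrongly merge ordinary English or French labels, so a sketch like this only makes sense when the label is already known to be Abenaki (or similar). That is part of why the problem is harder than a simple character table.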
And that is just for the "8". "TC" in this context would be Abenaki (etc.)
written with niche-market IBM Selectric type sets, similar to Inuktitut, and
"SC" would be the same text typed on an unforgiving 101-key IBM PC keyboard,
i.e., ASCII.
We call these "diacritically simplified" characters: basically everything is
promoted to its nearest ASCII look-alike, and all or almost all diacritics
are stripped.
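A rough sketch of "diacritically simplified" text follows. It decomposes each character with Unicode NFD, drops nonspacing combining marks, and maps a few characters that have no decomposition to an ASCII look-alike. The `LOOKALIKES` table and the `simplify` name are hypothetical illustrations, not an established mapping, and unlike the real practice (e.g. writing "Dine'") this sketch simply drops accents and discards anything it has no ASCII look-alike for.

```python
import unicodedata

# Hypothetical look-alike table for characters with no ASCII decomposition.
LOOKALIKES = {
    "\u0222": "8",  # LATIN CAPITAL LETTER OU
    "\u0223": "8",  # LATIN SMALL LETTER OU
}


def simplify(text: str) -> str:
    """Sketch: strip diacritics, then promote characters to ASCII look-alikes."""
    out = []
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.category(ch) == "Mn":
            continue  # drop nonspacing combining marks (acute, circumflex, ...)
        ch = LOOKALIKES.get(ch, ch)
        if ch.isascii():
            out.append(ch)
        # Characters with no known ASCII look-alike are dropped in this sketch.
    return "".join(out)


print(simplify("Din\u00e9"), simplify("\u0223bana"))
```

Because the mapping is many-to-one, simplified text cannot be reliably turned back into the original, which mirrors the one-way nature of the SC/TC conversion.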
I don't claim that "solving for Abenaki <or even Cree or Dine'>" is nearly
as important as the SC/TC problem.
Eric