While gathering usage frequency data for many language scripts,
I classified them into 6 categories (see enclosed):
Will all of these scripts be frequently used in IDN?
At least the #3 archaic scripts and #4 won't, and
I am still neutral on #1 and #2.
I am working on #5.
I can find many similarities and ambiguities across these language scripts.
Many Indian language scripts descend from the same mother script, 'Brahmi',
and have similar-looking letters and numerals.
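
As a rough illustration of the numeral overlap, the sketch below lists the
decimal digits of several Brahmi-derived scripts side by side. It is only a
sketch: the table INDIC_DIGIT_ZERO and the helpers digits_for and
same_value_digits are names made up for this example, and the code points are
taken from the Unicode block layout, not from the testbed data.

```python
import unicodedata

# Code point of DIGIT ZERO in several Brahmi-derived scripts.
# (Hypothetical table for illustration; values follow the Unicode charts.)
INDIC_DIGIT_ZERO = {
    "Devanagari": 0x0966,
    "Bengali": 0x09E6,
    "Gurmukhi": 0x0A66,
    "Gujarati": 0x0AE6,
    "Telugu": 0x0C66,
    "Kannada": 0x0CE6,
    "Malayalam": 0x0D66,
}


def digits_for(script):
    """Return the ten decimal digit characters of the given script."""
    zero = INDIC_DIGIT_ZERO[script]
    return [chr(zero + i) for i in range(10)]


def same_value_digits(value):
    """Map each script to its digit character with the given numeric value."""
    return {script: digits_for(script)[value] for script in INDIC_DIGIT_ZERO}


if __name__ == "__main__":
    # Print one row of digits per script so look-alikes can be compared by eye.
    for script in INDIC_DIGIT_ZERO:
        print(f"{script:11s}", " ".join(digits_for(script)))
```

Every row has the same numeric values, and several glyphs in the same column
look alike across scripts, which is the kind of ambiguity meant above. Whether
two glyphs are actually confusable still has to be judged by eye or from a
separate confusables table; this code does not decide that.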
Soobok Lee
------------------------------------------------------------------------
#0 already in ML.com testbed (I got huge usage frequency samples for these)
Hebrew:
Arabic:
Thai:
Hindi(Devanagari):
Cyrillic:
Greek:
Hiragana:
Katakana:
Hangul:
Han Ideograph:
#1 few native speakers
Georgian: < 3.5 million
Cherokee: 100,000 (North America)
Thaana: 100,000 (Maldives)
Armenian: 2 million
#2 two languages in one country:
bengali: Indian language ( > 180 million )
gujarati: Indian language
gurmukhi: Indian language
kannada: Indian language
malayalam: Indian language
oriya: Indian language
telugu: Indian language
tibetan: Chinese (mainland China), < 5 million
yi syllables: Chinese (mainland China), < 5 million
#3 Archaic
Ogham:
Runic:
Gothic:
Syriac: only used for liturgical purposes
Unified Canadian Aboriginal Syllabics:
#4 Written only vertically
mongolian: (was once abolished and later restored)
#5 others
sinhala: Sri Lanka
tamil: Sri Lanka
khmer: Cambodia
lao: Laos, Thailand
ethiopic: Ethiopia
myanmar: Burmese