
RE: [idn] Indian scripts and similarities



This is indeed an interesting and potentially complex problem.
BTW, I am from India. Here is my take on it.
The North Indian languages (Hindi, Gujarati, Punjabi (using the Gurmukhi
script), Bengali, etc.) are similar in structure, and their scripts are
closely related to Devanagari. Similarly, the South Indian languages share a
common structure derived from a different heritage, although most of them
also use Sanskrit words (written in their own scripts, not in Devanagari).

In my opinion, building a conversion table between the languages will not
be a good solution, since it will make a large number of URLs unusable. If
a person in Gujarat gets a URL for his business, that will effectively
prevent the equivalent URL from being used in any of the other languages. In
the Indian scenario, that could be tens of languages.
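
To make the collision concrete, here is a minimal Python sketch of one possible equivalence table. It assumes the table is built on the parallel layout of Unicode's Indic blocks (Devanagari at U+0900 through Malayalam ending at U+0D7F, 128 code points each, modeled on ISCII), where the same offset inside each block usually holds the analogous letter. The names `fold_to_devanagari`, `register` and `registered` are hypothetical and not part of any proposal on this list.

```python
# Hypothetical equivalence table: fold any code point in the Indic blocks
# (U+0900..U+0D7F) onto the Devanagari code point at the same offset.
# This relies on the parallel block layout and ignores its exceptions.
INDIC_START, INDIC_END, BLOCK_SIZE = 0x0900, 0x0D7F, 0x80

def fold_to_devanagari(label: str) -> str:
    out = []
    for ch in label:
        cp = ord(ch)
        if INDIC_START <= cp <= INDIC_END:
            # Keep only the offset inside the 128-code-point block.
            cp = INDIC_START + (cp - INDIC_START) % BLOCK_SIZE
        out.append(chr(cp))
    return "".join(out)

# Registry keyed by the folded form: the first registrant blocks the
# "equivalent" spelling in every other Indic script.
registered = {}

def register(label: str, owner: str) -> bool:
    key = fold_to_devanagari(label)
    if key in registered:
        return False
    registered[key] = owner
    return True

# "namaskar" written in three scripts.
hindi = "\u0928\u092e\u0938\u094d\u0915\u093e\u0930"
gujarati = "\u0aa8\u0aae\u0ab8\u0acd\u0a95\u0abe\u0ab0"
bengali = "\u09a8\u09ae\u09b8\u09cd\u0995\u09be\u09b0"

print(register(gujarati, "Gujarat business"))  # True
print(register(hindi, "Hindi business"))       # False: blocked
print(register(bengali, "Bengal business"))    # False: blocked
```

A real table would need hand-written exceptions, since the blocks are not perfectly parallel (Tamil, for example, lacks many consonants that Devanagari encodes), and this crude modulo also folds unassigned code points.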

I think that the straight Unicode-based scheme should be retained for the
Indian languages. Consider the following scenarios:

1) A guy types namaskar.com in Bengali. He expects either to go to a
Bengali site or to be told that no such (Bengali) site exists. Suppose there
was a Gujarati namaskar.com site, and we had implemented equivalence and
taken the Bengali guy to the Gujarati site; it would be completely
disorienting to him.

2) When the guy is registering, he will hopefully be using a WYSIWYG input
method (one of which is made by my company, Langoo.com), where he will see
the in-language characters. Therefore, domain registration should be
unambiguous as far as the Indian scripts are concerned.
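
By contrast, the straight Unicode-based scheme can be sketched as an exact-match registry. This is a minimal Python illustration, not any registry's actual implementation: the names are hypothetical, and normalizing labels to NFC before comparison is an assumption made here. Because the Bengali and Gujarati spellings of namaskar are different code point sequences, both can be registered, and a lookup in one script never lands on the other.

```python
import unicodedata
from typing import Optional

# Hypothetical exact-match registry: no cross-script folding.
registry = {}

def canonical(label: str) -> str:
    # Assumption: NFC normalization only; scripts are never merged.
    return unicodedata.normalize("NFC", label)

def register(label: str, owner: str) -> bool:
    key = canonical(label)
    if key in registry:
        return False
    registry[key] = owner
    return True

def resolve(label: str) -> Optional[str]:
    # Either the site registered in this exact script, or nothing.
    return registry.get(canonical(label))

hindi = "\u0928\u092e\u0938\u094d\u0915\u093e\u0930"
gujarati = "\u0aa8\u0aae\u0ab8\u0acd\u0a95\u0abe\u0ab0"
bengali = "\u09a8\u09ae\u09b8\u09cd\u0995\u09be\u09b0"

print(register(gujarati, "Gujarati site"))  # True
print(register(bengali, "Bengali site"))    # True
print(resolve(bengali))                     # Bengali site
print(resolve(hindi))                       # None: no Hindi site exists
```

This matches scenario 1: the Bengali user either reaches the Bengali site or learns that none exists.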

If a URL is conveyed in written form, as in an e-mail, there is no problem:
the script will tell what language the URL is in. If the URL is conveyed in
spoken form, ambiguity could exist. Even then, if the speaker and the
listener share a similar cultural and language background, they will guess
fairly accurately what was meant. One scenario where ambiguity will exist is
for people who know more than one Indian language and cannot tell from
context which language the URL is in. For example, suppose I called someone
on the phone and said, "Check out namaskar.com" (namaskar is the word for
"greetings" in most of India), and my recipient knew both Gujarati and
Hindi. Since he knows me to be a Hindi speaker, he might think I meant the
Hindi URL. Since he is Gujarati, he might also think that I meant the
Gujarati URL.
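
The point that a written URL carries its own script can be checked mechanically. Below is a minimal Python sketch that reports which Unicode Indic block each character of a label comes from, using the standard block ranges. The function names are hypothetical, and the check identifies the script rather than the language (Devanagari, for instance, is also used for Marathi and Nepali).

```python
# Start of each 128-code-point Indic block in Unicode.
INDIC_BLOCKS = [
    (0x0900, "Devanagari"),
    (0x0980, "Bengali"),
    (0x0A00, "Gurmukhi"),
    (0x0A80, "Gujarati"),
    (0x0B00, "Oriya"),
    (0x0B80, "Tamil"),
    (0x0C00, "Telugu"),
    (0x0C80, "Kannada"),
    (0x0D00, "Malayalam"),
]

def script_of(ch: str) -> str:
    cp = ord(ch)
    for start, name in INDIC_BLOCKS:
        if start <= cp < start + 0x80:
            return name
    return "Other"  # Latin, digits, punctuation, other scripts

def label_scripts(label: str) -> set:
    # A single-element set means the label is written in one script.
    return {script_of(ch) for ch in label}

gujarati = "\u0aa8\u0aae\u0ab8\u0acd\u0a95\u0abe\u0ab0"
bengali = "\u09a8\u09ae\u09b8\u09cd\u0995\u09be\u09b0"

print(label_scripts(gujarati))  # {'Gujarati'}
print(label_scripts(bengali))   # {'Bengali'}
```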

Though this is possible, it will not be very frequent. In some sense, it is
similar to the English homophone problem, where more than one word has the
same pronunciation and one has to disambiguate by context or by trial. In my
opinion, although the Indian languages are similar, they are spoken by
different and disjoint groups of people, and the cultural backgrounds of
these groups are also quite different. Chinese speakers probably share more
of a common cultural background and have more potential for
simplified/traditional confusion.

Deven


-----Original Message-----
From: owner-idn@ops.ietf.org [mailto:owner-idn@ops.ietf.org]On Behalf Of
Soobok Lee
Sent: Saturday, July 21, 2001 8:00 PM
To: idn@ops.ietf.org
Subject: [idn] Indian scripts and similarities


Hi,

Recently, I found many similarities among the Indian language scripts.

  Go to http://www.aczone.com/itrans/tblall/tblall.html
  and click 'Vowel', 'Consonant' and 'Digits' links and look into
  the script comparison tables.

Compare these pairs of scripts:
  1.  Devanagari vs Bengali
  2.  Gujarati vs Gurmukhi
  3.  Kannada vs Telugu

If Indian business names in an Indian state can be written both in the
state's official-language script and in a second-language script (perhaps
the official script of another state), there may be name collisions across
Indian states.

Do we need equivalence-conversion tables among Indian language scripts, like
the Traditional/Simplified Chinese conversion table proposed by CNNIC?

Regards,

Soobok Lee