Re: Combining characters (was: Re: [idn] hostname historyhell)
Hi, Doug:
With a character set as large as UCS, mixed use of scripts
and look-alike characters are so common that some sort
of classification is unavoidable.
> Is "Alta Vista" supposed to be
> Spanish or Italian? Should we care?
No, we don't care about these cases, since they all
involve Latin users. The Latin script already serves many
different spoken languages, as do Cyrillic, Arabic and
Chinese.
On Mon, 26 Nov 2001 11:40:52 EST DougEwell2@cs.com writes:
> In a message dated 2001-11-26 0:31:52 Pacific Standard Time,
> liana.ydisg@juno.com writes:
>
> > Have you thought about " Mixed language URLs "
> > with language tags, for example:
> >
> > www.zh-china/mo-mogolia/zh-county/mybusiness.com
> >
> > shall be able to work?
>
> I thought one of the fundamental characteristics of domain names,
> host names,
> URLs, etc. is that they were identifiers, not true names, and hence
> they were
> not intended to be language-tagged.
>
> Just as an example, two popular search engines are teoma.com and
> altavista.com. What language is "Teoma"? Is "Alta Vista" supposed
> to be
> Spanish or Italian? Should we care?
>
> -Doug Ewell
> Fullerton, California
However, mixed script is a lot more complex. For
example, Japanese uses Kanji, but it is not only phonetically
different from Chinese; its grammar is also completely
different from Chinese. The difference is so great that we have to
reflect them separately in structured data too. The Chinese
address label in my previous message is an example of
such a difference. My Mongolian and Chinese example is
another example of mixed use of structured labels, though
the Chinese group has not raised this in front of this group.
They have too much on their hands already :-(
The classification, whether language tags or script tags,
must sometimes be used in URLs to deal with these issues. I have
used "language tag" instead of "script tag", since
1) Different languages use the same script, such as
CJK.
2) Language tags have been defined in [iso639], and
some of the issues have been solved already. For
example: Does Cantonese have a language tag or not?
3) From an engineering point of view, the IETF has a list of
language requirements to consider. That is, can we come up
with a solution to cover these cases in DNS?
If in the future, down the line in IDN, someone challenges
us regarding diacritic marks in French versus Dutch,
for example, then we have to be able to say this case is
covered by a Latin tag. If someone wants more localized
features with a French tag, then we may question whether
such a feature can be accommodated with existing methods
before we launch into another tag.
I would suggest that the following language tags be
considered first:
CJK
Latin
Cyrillic
Arabic
Bengali
Greek
Although Greek has a smaller group of native users, it is
familiar to many Latin users, and can serve as a case
study in discussion.
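
To make the look-alike and mixed-script concern concrete, here is a
rough sketch of how a label could be sorted into the six tags suggested
above. It is only an illustration, not a proposal for the protocol. The
names TAG_PREFIXES, tag_of, tags_in_label and is_mixed are made up for
this example, and matching on Unicode character-name prefixes (and
counting kana and Hangul under CJK) is an assumption of mine, not
something defined by any IDN draft.

```python
import unicodedata

# Hypothetical mapping from the coarse tags suggested above to Unicode
# character-name prefixes. Folding kana and Hangul into "CJK" is an
# assumption made for this sketch only.
TAG_PREFIXES = {
    "CJK": ("CJK ", "HIRAGANA ", "KATAKANA ", "HANGUL "),
    "Latin": ("LATIN ",),
    "Cyrillic": ("CYRILLIC ",),
    "Arabic": ("ARABIC ",),
    "Bengali": ("BENGALI ",),
    "Greek": ("GREEK ",),
}

def tag_of(ch):
    """Return the coarse tag for one character, or None if it has none
    (digits, hyphen, combining marks, unnamed code points)."""
    name = unicodedata.name(ch, "")
    for tag, prefixes in TAG_PREFIXES.items():
        if name.startswith(prefixes):
            return tag
    return None

def tags_in_label(label):
    """Collect the set of tags used by the characters of a label."""
    return {t for t in map(tag_of, label) if t is not None}

def is_mixed(label):
    """True if the label draws on more than one tag."""
    return len(tags_in_label(label)) > 1

# "p\u0430ypal" contains CYRILLIC SMALL LETTER A, a look-alike of Latin "a".
print(tags_in_label("altavista"))
print(is_mixed("p\u0430ypal"))
```

A label such as "altavista" falls under the Latin tag alone, while the
second label looks identical to a Latin one but mixes Latin and
Cyrillic, which is the kind of case a classification would let us
detect.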
Liana