Re: Combining characters (was: Re: [idn] hostname historyhell)
Hi, Doug:
Because scripts are used in so many different ways, and because of
look-alike symbols, some sort of classification is unavoidable.
For Latin, you are correct:
> Spanish or Italian? Should we care?
No, we don't care, because the Latin script has already taken
care of some of the spoken-language differences for
us. So have the Arabic, Cyrillic and Chinese scripts.
However, the problem is a lot more complex for those
mixed scripts, such as Japanese and Vietnamese.
Japanese is not only phonetically different from Chinese,
it is grammatically completely different from Chinese too.
Even though we are dealing with structured data, the difference
is so great that we have to consider treating them differently,
hence the use of "language tags" or "script tags".
The mixed use of Mongolian and Han is just another
example of such a case, which the Chinese group
has not yet raised before this group - they have too much
on their hands already :-(
I have used "language tag" instead of "script tag" because
1) different languages use the same script, as in the CJK
cases; 2) "language tag" is already defined in [ISO639], so
we don't need to argue about whether Cantonese needs a tag.
That issue has already been solved by [ISO639]. 3) We
can use the tags already defined, but the IETF doesn't need
to implement every language tag defined in [ISO639];
it is an engineering decision how to cover all the cases
well enough to facilitate communication over DNS.
If this tag issue is raised in IDN down the line, for
example regarding the different uses of diacritical marks
in French and Dutch, then it will be a challenge to the
design of DNS tag coverage we are doing now,
since we should have taken care of this issue among
Latin users when DNS tag coverage was discussed.
So for the language tags we have to settle at this stage,
I would suggest:
CJK
Latin
Cyrillic
Arabic
Bengali
Greek
Although Greek does not necessarily cover a lot of native
users, it is familiar to many Latin users, and serves as a
good case study for discussion.
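
As a rough illustration only, here is a minimal Python sketch of how a
name label might be assigned to the coarse tags suggested above. It is
not a proposal for the protocol itself. The function name
tags_for_label, the PREFIX_TO_TAG table, the "untagged" fallback and the
choice to group kana and hangul under CJK are all my assumptions; it
relies only on the first word of each character's Unicode name.

```python
import unicodedata

# Hypothetical mapping from the first word of a Unicode character name
# to the coarse tags listed above. Grouping kana and hangul under "CJK"
# is an assumption made for this sketch.
PREFIX_TO_TAG = {
    "CJK": "CJK",
    "HIRAGANA": "CJK",
    "KATAKANA": "CJK",
    "HANGUL": "CJK",
    "LATIN": "Latin",
    "CYRILLIC": "Cyrillic",
    "ARABIC": "Arabic",
    "BENGALI": "Bengali",
    "GREEK": "Greek",
}


def tags_for_label(label):
    """Return the set of coarse tags needed to cover the letters in a label."""
    tags = set()
    for ch in label:
        if not ch.isalpha():
            continue  # digits and hyphens carry no script information
        name = unicodedata.name(ch, "")
        prefix = name.split(" ")[0] if name else ""
        # Anything outside the six suggested groups is reported as "untagged".
        tags.add(PREFIX_TO_TAG.get(prefix, "untagged"))
    return tags
```

With this sketch, a label such as "teoma" maps to the single tag Latin,
a label mixing Han and Latin letters maps to two tags, and Mongolian
letters come back as "untagged", which is exactly the kind of coverage
gap the list above would have to address.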
Liana
On Mon, 26 Nov 2001 11:40:52 EST DougEwell2@cs.com writes:
> In a message dated 2001-11-26 0:31:52 Pacific Standard Time,
> liana.ydisg@juno.com writes:
>
> > Have you thought about " Mixed language URLs "
> > with language tags, for example:
> >
> > www.zh-china/mo-mogolia/zh-county/mybusiness.com
> >
> > shall be able to work?
>
> I thought one of the fundamental characteristics of domain names,
> host names, URLs, etc. is that they were identifiers, not true names,
> and hence they were not intended to be language-tagged.
>
> Just as an example, two popular search engines are teoma.com and
> altavista.com. What language is "Teoma"? Is "Alta Vista" supposed
> to be Spanish or Italian? Should we care?
>
> -Doug Ewell
> Fullerton, California