
Re: Combining characters (was: Re: [idn] hostname historyhell)



Hi, Doug:

Because scripts are used in many different ways, and because
of look-alike symbols, some sort of classification is unavoidable.

For Latin, you are correct: 

> Spanish or Italian?  Should we care?

No, we don't care, because the Latin script has already 
taken care of some of the spoken-language differences for 
us.  So have the Arabic, Cyrillic and Chinese scripts.  

However, the problem is a lot more complex for 
mixed scripts, such as Japanese and Vietnamese.
Japanese is not only phonetically different from Chinese,
it is grammatically completely different from Chinese too.
Even though we are dealing with structured data, the difference 
is so great that we have to consider treating them differently, 
hence the use of "language tags" or "script tags".  
The mixed use of Mongolian and Han is only another 
example of such a case, which the Chinese group 
has not raised before this group - they have too much 
on their hands already :-(

I have used "language tag" instead of "script tag", because
1) different languages use the same script, as in the CJK 
cases.  2) Language tags are already defined in [ISO639], 
so we don't need to argue whether Cantonese needs a tag or not. 
That is an issue that has been solved by [ISO639]. 3) We 
can use the tags already defined, but the IETF doesn't need
to implement every language tag defined in [ISO639];
it is an engineering consideration to cover the cases
well enough to facilitate communication on DNS.

If this tag issue is raised in IDN down the line, for 
example regarding the different uses of diacritic marks 
in French and Dutch, then it is a challenge to the 
design of the DNS tag coverage we are doing now, 
since we should have taken care of this issue among 
Latin users when DNS tag coverage was discussed.

So for the language tags we have to settle at this stage,
I would suggest:

CJK
Latin
Cyrillic
Arabic
Bengali
Greek

Although Greek does not necessarily cover a lot of native 
users, it is familiar to many Latin users, and serves as a 
good case study for discussion. 
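
As a rough illustration of how a label might be sorted into the
six classes listed above, here is a minimal Python sketch. It is
not a proposal for the actual mechanism. The names CLASS_PREFIXES,
classify_char and classify_label are hypothetical, matching on
Unicode character-name prefixes is only a heuristic, and grouping
kana and Hangul under "CJK" is an assumption made for the example.

```python
import unicodedata

# Hypothetical mapping from Unicode character-name prefixes to the
# six classes suggested above. Both the prefixes and the grouping of
# kana and Hangul under "CJK" are assumptions for illustration.
CLASS_PREFIXES = {
    "CJK": ("CJK ", "HIRAGANA ", "KATAKANA ", "HANGUL "),
    "Latin": ("LATIN ",),
    "Cyrillic": ("CYRILLIC ",),
    "Arabic": ("ARABIC ",),
    "Bengali": ("BENGALI ",),
    "Greek": ("GREEK ",),
}


def classify_char(ch):
    """Return the class of one character, or None if it fits no class."""
    name = unicodedata.name(ch, "")
    for cls, prefixes in CLASS_PREFIXES.items():
        if name.startswith(prefixes):
            return cls
    return None


def classify_label(label):
    """Return the set of classes used by the letters of a label.

    Digits, hyphens and combining marks are skipped; letters outside
    the six classes are reported as "Other".
    """
    classes = set()
    for ch in label:
        if ch.isalpha():
            classes.add(classify_char(ch) or "Other")
    return classes


if __name__ == "__main__":
    for label in ("teoma", "altavista", "abc\u0434"):
        print(label, sorted(classify_label(label)))
```

A label that yields more than one class, such as a mix of Latin
and Cyrillic letters, is the kind of case where a tag would have
to say which class is meant.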

Liana


On Mon, 26 Nov 2001 11:40:52 EST DougEwell2@cs.com writes:
> In a message dated 2001-11-26 0:31:52 Pacific Standard Time, 
> liana.ydisg@juno.com writes:
> 
> > Have you thought about " Mixed language URLs "
> > with language tags, for example:
> >
> > www.zh-china/mo-mogolia/zh-county/mybusiness.com
> >
> > shall be able to work?
> 
> I thought one of the fundamental characteristics of domain names, 
> host names, URLs, etc. is that they were identifiers, not true 
> names, and hence they were not intended to be language-tagged.
> 
> Just as an example, two popular search engines are teoma.com and 
> altavista.com.  What language is "Teoma"?  Is "Alta Vista" supposed 
> to be Spanish or Italian?  Should we care?
> 
> -Doug Ewell
>  Fullerton, California