
Re: Layer 2 and "idn identities" (was: Re: [idn] what are the IDN identifiers?)




On Fri, 30 Nov 2001 17:54:37 -0500 John C Klensin <klensin@jck.com>
writes:
> --On Friday, 30 November, 2001 13:21 -0600 liana Ye
> <liana.ydisg@juno.com> wrote:
> 
> > Thanks John for you are addressing what I am looking
> > for.   And I do think bottom up discussion hitting the 
> > wall after the introducing TC/SC, because it is a part of  the
> > CJK  problem that JET has been facing.   It is a good time to
> > look  at top-down wise.  And we will hit CJK problem
> > too as codepoint usage conflicts. 
> 
> But there is very little top-down where the DNS itself is
> concerned.  Despite upward-facing uses, it is fundamentally
> downward-facing, and that is, in different language, what the
> "identifier" discussion is all about. 
> 
> > So I think CJK code points usage conflicts should be 
> > resolved before this group can discuss solutions effectively.
> 
> I think that statement is equivalent to "the IDN WG should never
> finish its work, or at least should take years to do so". With
> the understanding that I don't read Han characters, I think
> there have been ample illustrations over the last several months
> that completely resolving "CJK code point(s) usage conflicts"
> will require that the Japanese and Korean languages be reformed,
> possibly to use a different character set base entirely.
>

That is why I am asking how different the three
languages are in their use of these codepoints. Can we find
another way to look at them? Since the Unicode Consortium has
already classified them visually, should we pick another
measure for them?

Characters have four distinct aspects: visual, phonetic,
compositional and semantic. The visual aspect of a character
has already been scrutinized by the Unicode Consortium. The
compositional aspect is a subject for a long discussion,
enough to fill a book.

The common measure is semantic similarity. But how precisely
can we classify them? As synonyms, as in a dictionary, used
as keywords? This is the layer 3 approach. It may be effective
in free text, but in structured domain names, different codepoints
from a synonym set mean different entities. They cannot be mixed,
nor used to reduce the search space needed for IDN identifiers.

The next option along the line of semantic classification is TC/SC
based codepoints, as demanded by the Chinese group. To say
it is a font difference is not the whole picture, but I have been
using that description to STRESS that they are "semantically
equivalent" among Chinese, serve a good purpose for the
internationalization of domain names, and help to reduce trademark
conflicts on the net. I am trying to divide the problem into
separate parts for examination.

1) Are these symbols semantically equivalent among C, J, K?
2) If they are, we can do what I suggested earlier:

> > For example: 
> > One Latin case from [nameprep]:
> > 0048; 0068; Case map
> >...
> > And Chinese TC/SC example:
> > <wind> has four code points in Chinese:
> > TC, SC, TC radical, SC radical.
> >...

3) If they are not equivalent, we should record their
language context and handle them in a similar way
to Bengali, Tibetan ...
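
To make point 2) concrete, here is a minimal sketch of what folding
equivalent codepoints before comparison could look like. It is only an
illustration, not anything taken from [nameprep] or from a JET document:
the table name FOLD, the helper names, the choice of the TC form as the
representative, and the four specific <wind> codepoints are all my own
assumptions. Only the 0048 -> 0068 entry comes from the quoted example.

```python
# Hypothetical equivalence table: each variant maps to one representative.
# Only the first entry is from the quoted [nameprep] example; the <wind>
# codepoints and the choice of representative are assumptions.
FOLD = {
    "\u0048": "\u0068",  # 'H' -> 'h' (case map)
    "\u98ce": "\u98a8",  # SC <wind> -> TC <wind>
    "\u2fb5": "\u98a8",  # TC radical <wind> -> TC <wind>
    "\u2edb": "\u98a8",  # SC radical <wind> -> TC <wind>
}


def fold_label(label: str) -> str:
    """Replace every codepoint that has a table entry; keep the rest."""
    return "".join(FOLD.get(ch, ch) for ch in label)


def same_identifier(a: str, b: str) -> bool:
    """Two labels name the same entity if they fold to the same string."""
    return fold_label(a) == fold_label(b)
```

Under point 3), where the symbols are not equivalent across C, J, K, a
single table like this would not be enough, and the lookup would also
need the recorded language context.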

I do not see what is so wrong with making such a request to
the Unicode group. After all, the JET has gone over
this many times. Can I have a look at what these
codepoints are with an Acrobat Reader?

I don't want to get into more aspects of a character
at this time and raise more discussion points. But since
I have started on the four aspects of a character, I will mention
that the last aspect of a character is its phonetic attribute.
If we can use it for character encoding, then we will have
a phonetic ACE in the end.

Liana