Re: Layer 2 and "idn identities" (was: Re: [idn] what are the IDN identifiers?)
--On Friday, 30 November, 2001 13:21 -0600 liana Ye
<liana.ydisg@juno.com> wrote:
> Thanks John for you are addressing what I am looking
> for. And I do think bottom up discussion hitting the
> wall after the introducing TC/SC, because it is a part of the
> CJK problem that JET has been facing. It is a good time to
> look at top-down wise. And we will hit CJK problem
> too as codepoint usage conflicts.
But there is very little top-down where the DNS itself is
concerned. Despite upward-facing uses, it is fundamentally
downward-facing, and that is, in different language, what the
"identifier" discussion is all about.
> So I think CJK code points usage conflicts should be
> resolved before this group can discuss solutions effectively.
I think that statement is equivalent to "the IDN WG should never
finish its work, or at least should take years to do so". With
the understanding that I don't read Han characters, I think
there have been ample illustrations over the last several months
that completely resolving "CJK code point(s) usage conflicts"
will require that the Japanese and Korean languages be reformed,
possibly to use a different character set base entirely.
> Thus the introduction of the term "equivalent character set"
> is to capture the broader "case folding" or normalization.
> For example:
> One Latin case from [nameprep]:
> 0048; 0068; Case map
>...
> And Chinese TC/SC example:
> <wind> has four code points in Chinese:
> TC, SC, TC radical, SC radical.
>...
Liana, this is just not helpful. I wrote a fairly careful
explanation of why a particular approach would not work and what
at least one alternative was. You apparently ignored that
discussion (I'm not even sure that you read it) and instead
appear to just be repeating what you have been saying all along.
And trying to create terminology confusion doesn't help either:
suppose we see an animal with four legs that gives milk and
makes sounds transcribed in some languages as "moo", and I say
to you "that is a duck". That might conceivably result in some
confusion on your part, but it isn't going to cause the animal
to start quacking or to take wing and fly away. Case-folding is
a well-defined operation for scripts associated with languages
that have case distinctions and do not use diacritical marks. It
is somewhat less well-defined for scripts associated with
languages that have case distinctions but do use diacritical
marks. It is meaningless for languages without case distinctions.
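
A minimal sketch of those three cases, assuming Python's built-in
str.casefold() (Unicode full case folding from current Unicode
tables, not the nameprep draft under discussion). The diacritics
case shows only one way the ambiguity can appear, namely precomposed
versus decomposed encodings; the helper name fold and the variable
names are illustrative only.

```python
import unicodedata


def fold(label: str) -> str:
    """Apply Unicode full case folding (Python's built-in str.casefold)."""
    return label.casefold()


# Cased script, no diacritics: folding is unambiguous (U+0048 -> U+0068),
# which is the [nameprep] case-map entry quoted earlier in the thread.
latin_folds = fold("\u0048") == "\u0068"

# Cased script with diacritics: the folded result depends on how the
# accented letter was encoded, unless a normalization step is also applied.
precomposed = "\u00c9"   # E WITH ACUTE as a single code point
decomposed = "E\u0301"   # E followed by COMBINING ACUTE ACCENT
folded_differ = fold(precomposed) != fold(decomposed)
folded_match_after_nfc = (
    unicodedata.normalize("NFC", fold(precomposed))
    == unicodedata.normalize("NFC", fold(decomposed))
)

# Script without case: folding leaves the code points untouched, so it
# cannot relate a traditional form to a simplified one.
wind_tc = "\u98a8"  # traditional form of <wind>
wind_sc = "\u98ce"  # simplified form of <wind>
han_unchanged = fold(wind_tc) == wind_tc and fold(wind_sc) == wind_sc
han_still_distinct = fold(wind_tc) != fold(wind_sc)
```

Any TC<->SC relation would therefore have to come from a separate
mapping table; nothing in case folding itself supplies one.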
It is completely appropriate for you (or others) to say "if it
is reasonable to do case-folding for scripts with cases, then it
is reasonable to map TC<->SC". It is possible to debate the
reasonableness of that statement and the feasibility of doing
the mapping (that debate has been going on for some time now).
But the statement itself is appropriate to make.
By contrast, when you try to equate case-folding with TC<->SC
mapping, you are either making a statement that is false or you
are playing language games in the hope of convincing people of
something analogous to cows being ducks. Please stop it; it
doesn't help, and it lowers your credibility.
I deliberately haven't used "normalization" in the discussion
above, since I'm not sure I know what that word means or, more
specifically, since I think it can be stretched to cover enough
meanings to not provide a useful definition.
> I believe there are other code points of <wind> have
> been allowed in UCS. Isn't it a time to use the
> "equivalent character set" term? Or may we use
> "equivalent codepoints" to stay away from the lanugage
> connotation?
I don't think this is helpful either, at least unless you want
to associate distance functions with the concept of "equivalence".
That, of course, is another version of the problem: the DNS is
binary -- things either match on a codepoint by codepoint basis
or they do not -- while distance functions between (StringI,
LanguageI, CountryI) and (StringJ, LanguageJ, CountryJ) are what
is needed here. But, had you read my previous notes, you would
presumably understand that already.
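
The contrast between a binary match and a distance function can be
sketched as follows. Everything here is a hypothetical illustration:
the Name tuple, the language and country tags, and the weights in
toy_distance are arbitrary assumptions, not something proposed in
this thread or defined by any DNS or IDN specification.

```python
from typing import NamedTuple


class Name(NamedTuple):
    string: str
    language: str  # hypothetical tag, e.g. "zh"
    country: str   # hypothetical tag, e.g. "TW"


def dns_style_match(a: str, b: str) -> bool:
    """Binary comparison: equal only if every code point is equal."""
    return [ord(c) for c in a] == [ord(c) for c in b]


def toy_distance(a: Name, b: Name) -> float:
    """Purely illustrative distance; the weights are arbitrary assumptions."""
    # Count positions whose code points differ, plus any length difference.
    differing = sum(x != y for x, y in zip(a.string, b.string))
    differing += abs(len(a.string) - len(b.string))
    # Add a small penalty when the language or country context differs.
    penalty = 0.5 * (a.language != b.language) + 0.25 * (a.country != b.country)
    return differing + penalty


wind_tc = Name("\u98a8", "zh", "TW")  # traditional form of <wind>
wind_sc = Name("\u98ce", "zh", "CN")  # simplified form of <wind>

exact = dns_style_match(wind_tc.string, wind_sc.string)  # a yes/no answer
graded = toy_distance(wind_tc, wind_sc)                  # a graded answer
```

The first function can only answer yes or no, which is all a DNS
lookup can do; the second returns a graded value that something
above the DNS would still have to interpret.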
> If the "equivalent character set" is not a good term to group
> this codepoint normalization problem, I'd like to hear it.
> No feed-back means agreeable to the term.
Please do not, ever, send a note to an IETF mailing list that
buries an "if you don't respond, you agree with me" statement at
the end of a long note (or in the middle of one). We just don't
work that way around here and, again, trying it damages your
credibility.
john