Re: Layer 2 and "idn identities" (was: Re: [idn] what are the IDN identifiers?)
Thanks, John, for addressing what I am looking
for. I do think the bottom-up discussion hit a
wall after TC/SC was introduced, because TC/SC is part of the CJK
problem that JET has been facing. This is a good time to look
at the issue top-down, although we will hit the CJK problem
there too, in the form of code point usage conflicts.
So I think the CJK code point usage conflicts should be
resolved before this group can discuss solutions effectively.
That is why I introduced the term "equivalent character set":
to capture a broader notion of "case folding" or normalization.
For example:
One Latin case from [nameprep]:
0048; 0068; Case map
210B; 0068; Additional folding
210C; 0068; Additional folding
210D; 0068; Additional folding
1D407; 0068; Additional folding
1D43B; 0068; Additional folding
1D46F; 0068; Additional folding
1D4D7; 0068; Additional folding
1D573; 0068; Additional folding
And a Chinese TC/SC example:
<wind> has four code points in Chinese:
TC, SC, TC radical, SC radical.
I believe other code points for <wind> have also
been allowed in UCS. Isn't it time to use the term
"equivalent character set"? Or should we use
"equivalent codepoints" to stay away from the language
connotation?
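
As a rough illustration of what "equivalent codepoints" could mean in practice, the Latin rows from [nameprep] quoted above can be read as a many-to-one table: every source code point folds to 0068, so all of them land in one equivalence class. The sketch below is only that, a sketch. The function names (parse_rows, fold, equivalent, equivalence_class) are made up for this note and are not part of [nameprep], and the table holds only the nine rows shown here, not the full mapping.

```python
# Rows copied from the [nameprep] excerpt above:
# source code point; folded code point; reason
FOLD_ROWS = """\
0048; 0068; Case map
210B; 0068; Additional folding
210C; 0068; Additional folding
210D; 0068; Additional folding
1D407; 0068; Additional folding
1D43B; 0068; Additional folding
1D46F; 0068; Additional folding
1D4D7; 0068; Additional folding
1D573; 0068; Additional folding
"""


def parse_rows(text):
    """Turn 'SRC; DST; reason' rows into a {source_char: folded_char} dict."""
    table = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        src, dst, _reason = [field.strip() for field in line.split(";")]
        table[chr(int(src, 16))] = chr(int(dst, 16))
    return table


FOLD = parse_rows(FOLD_ROWS)


def fold(label):
    """Replace each character by its folded form; unknown characters pass through."""
    return "".join(FOLD.get(ch, ch) for ch in label)


def equivalent(a, b):
    """Two labels are treated as equivalent if they fold to the same string."""
    return fold(a) == fold(b)


def equivalence_class(target):
    """All code points in the table that fold to `target`, plus `target` itself."""
    members = {target}
    members.update(src for src, dst in FOLD.items() if dst == target)
    return sorted(members)
```

A TC/SC table would have the same shape (several code points folding to one chosen representative), which is the sense in which the term is meant to cover more than case folding. Which code point should be the representative for <wind> is exactly the open question, so no CJK rows are filled in here.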
If "equivalent character set" is not a good term for grouping
this codepoint normalization problem, I'd like to hear it.
I will take no feedback as agreement with the term.
Liana
On Fri, 30 Nov 2001 06:42:41 -0500 John C Klensin <klensin@jck.com>
writes:
> --On Wednesday, 28 November, 2001 21:53 -0600 liana Ye
> <liana.ydisg@juno.com> wrote:
>
> >...
> > In a word, I am arguing for John's layer
> > two stuff to be dealt with in his layer one. Is this
> > multilingual? I think this is the difference we have
> > had from the beginning, ever since allowing TC/SC in
> > [nameprep]. This has nothing to do with "multilingual".
>
> But, Liana, these sorts of things are precisely why there is a
> layer 2, so I'm not sure I understand what you are suggesting.
>
> To go through that once again:
>
> (i) In the general case, we cannot handle TC-SC, or any other
> language-sensitive, or context-sensitive, translations or
> matchings in the DNS. A number of special cases and/or subsets
> can be made to work. But special cases and subsets are very
> confusing to users who expect the generalizations if anything
> appears to work. And, without a clear demonstration that there
> is a solution for the [more] general case on the near horizon,
> it would be irresponsible for the IETF to approve a "this
> handles the easy case, we will deal with hard cases later" style
> of solution -- that sort of approach is almost always bad
> systems design, and demonstrably so here.
>
> (ii) Language tagging in the DNS in some form would help a bit,
> but doing such tagging has gotten, as far as I can tell, not
> even a sign of consensus in the WG. It is reasonable that it
> has not, if only because nothing the WG has done or agreed to
> would change the fundamental character-by-character nature of
> the DNS or the assumption that the characters of a label can be
> chosen from any valid character from the selected subset of the
> relevant CCS, in any order (with the exception of the legacy
> rules about hyphen-minus).
>
> One way to look at the above is that the DNS just doesn't have
> enough information available during matching. The matching
> algorithms don't have access to language information, country
> information, or other things that could be used to sort out
> similarities and variants. And the DNS does exact matches -- no
> ambiguities permitted. If the needed information isn't there,
> no matching tricks or "preparation" is going to help -- there is
> no place in the DNS for either magic or "do what I mean"
> capabilities.
>
> One can, again, get partially around those problems with
> client-side code -- nameprep is precisely such a partial solution
> (or a potentially-complete solution to a narrower problem). But
> complex client-side solutions have a history of causing
> interoperability problems and user confusion. Client software
> suppliers find it just too tempting to provide code that is
> "just a little better" than required by the standard. In this
> case, that means that user-supplied names will be interpreted
> differently depending on which client software package is used,
> and that is very bad news. Worse, interoperability testing in
> the usual IETF sense is impossible in these cases, since any
> damage has been done before a single bit crosses the wire.
>
> Search Layer 2 (and 3, actually) are responses to these specific
> issues, not exercises in protocol design. The documents didn't
> appear two years ago because it was important to understand how
> far a DNS-based solution could be extended. I suggest that we
> now know, and that the answer is "not into these language
> problems, or context problems, or even script problems that
> involve considering characters more than one at a time".
>
> "DNS Search", or, now, "IRNSS", is also, so far, just a
> framework. We have proposals for sublayer 2 on the table, but
> neither is complete and more proposals are welcome. But that
> framework explicitly provides for:
>
> * Language information
>
> * Country information
>
> * Matching functions that can consider complete name
> strings, not just individual characters and one label at
> a time.
>
> * Matching functions that can do parallel table lookups,
> apply spelling differentiators, and do other "fuzzy"
> things and do them as predictable, server-side
> operations.
>
> Isn't that the set of tools you are looking for?
>
> See following note on IRNSS.
>
> john
>
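
To make the layering argument in the quoted message concrete, here is a minimal sketch of the contrast it draws: an exact-match lookup that sees only the label, next to a server-side search step that is also given a language. Everything in it is hypothetical. The labels "tc-wind" and "sc-wind" are ASCII stand-ins for the TC and SC forms of <wind>, the names Query, dns_lookup and search_lookup are invented for illustration, and nothing here is taken from the IRNSS or "DNS Search" documents. Country information is left out to keep the sketch short.

```python
from dataclasses import dataclass
from typing import Optional

# Exact-match layer: the lookup key is the name and nothing else.
# 192.0.2.10 is a documentation address, used here as a placeholder.
DNS_RECORDS = {"sc-wind.example": "192.0.2.10"}


def dns_lookup(name: str) -> Optional[str]:
    """Exact comparison only: no language, no variants, no ambiguity."""
    return DNS_RECORDS.get(name)


@dataclass(frozen=True)
class Query:
    """A search-layer query carries context that a DNS query cannot."""
    name: str
    language: Optional[str] = None


# Hypothetical per-language variant tables, held on the server side so that
# every client gets the same answer for the same query.
VARIANTS = {
    "zh": {"tc-wind": "sc-wind"},
}


def search_lookup(query: Query) -> Optional[str]:
    """Map each label through the table for the stated language, then do an
    ordinary exact-match lookup on the result."""
    table = VARIANTS.get(query.language, {})
    labels = query.name.split(".")
    candidate = ".".join(table.get(label, label) for label in labels)
    return dns_lookup(candidate)
```

With these tables, dns_lookup("tc-wind.example") finds nothing, and so does a search query with no language attached. Only the query tagged "zh" reaches the registered name. The point of the sketch is where the variant table lives: above the exact-match layer, with the extra information passed in explicitly, rather than inside it.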