Re: Matching and comparison
- To: idn@ops.ietf.org
- Subject: Re: Matching and comparison
- From: Paul Hoffman / IMC <phoffman@imc.org>
- Date: Sat, 22 Jan 2000 09:13:41 -0800
- Delivery-date: Sat, 22 Jan 2000 09:23:02 -0800
- Envelope-to: idn-data@psg.com
At 02:13 PM 1/22/00 +0900, Martin J. Duerst wrote:
>Well, Durst and the variants you listed are similar to you,
>because you are not familiar with them. They are not
>similar to me, because I'm familiar with them.
>
>I guess names differing by case are similar to most people.
I don't agree, but I see your point. Unfortunately, it looks like we have
to go through the list of what "most people" might think would be similar
or different and preclude the similar ones. I smell a rat hole....
> > >Definitely not. But we should not throw all 'similar names' problems
> > >in the same pot. Some of them are very productive (in particular
> > >casing), some are much less productive.
> >
> > I don't see what you mean by "productive" here. "Solvable"?
>
>Productive in the sense it is used in linguistics:
>
>How many variants can you produce with that phenomenon?
Ah! I like that.
>For casing, it's 2**n, and n can easily be 10 or more.
>
>For ignoring accents or not (probably important for French),
>it's 2**n, but n is usually very small.
>
>So the sizes of the various cases differ. Depending on the size,
>different solutions can be appropriate.
This makes good sense.
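To make the "productivity" measure concrete, here is a minimal Python sketch (an illustration only, not something proposed in the thread; the function names and sample labels are hypothetical). It enumerates every case variant of a label, giving 2**n spellings for n ASCII letters, and every keep-or-drop-the-accent variant, where n is only the number of accented letters.

```python
from itertools import product


def case_variants(label: str) -> set[str]:
    """Return every upper/lower-case spelling of a label."""
    # A letter offers two choices (lower, upper); anything else offers one.
    # Caseless letters (e.g. hiragana) give two identical choices,
    # which the set comprehension collapses back to one.
    choices = [(c.lower(), c.upper()) if c.isalpha() else (c,) for c in label]
    return {"".join(p) for p in product(*choices)}


def accent_variants(label: str, strip_map: dict[str, str]) -> set[str]:
    """Return every spelling made by keeping or dropping each listed accent.

    strip_map is a hypothetical, caller-supplied table mapping an
    accented letter to its unaccented base letter.
    """
    choices = [(c, strip_map[c]) if c in strip_map else (c,) for c in label]
    return {"".join(p) for p in product(*choices)}


print(len(case_variants("durst")))       # 32 == 2**5
print(len(case_variants("example-10")))  # 128 == 2**7 (digits and "-" have no case)
print(len(accent_variants("élève", {"é": "e", "è": "e"})))  # 4 == 2**2
```

An 11-letter ASCII label already has 2048 case variants, while a typical French word with one or two accents has only 2 or 4 accent variants, which is the difference in size Martin describes.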
>Whether the letters look the same or not is one factor affecting spoofing.
>Whether you can make people believe they are the same or not is another.
Exactly right.
> > But we will have to tell everyone (or at least developers) enough to help
> > enter internationalized characters that end users don't understand. That
> > is, if I see a URL with hiragana in it, I should at least have a chance of
> > entering it correctly even if I don't understand Japanese.
>
>I don't think an implementer of a DNS front-end should have to
>care about this. This is an operating system/window system issue.
You're missing my point. I don't want to tell systems how to enter the
characters, but I do think we need to specify what can and cannot be
entered. I see this as a two step process: canonicalize, then reject if
there are forbidden characters.
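The two-step process (canonicalize, then reject) can be sketched in Python as below. This is only an illustration under stated assumptions: "canonicalize" is taken here to mean case folding followed by Unicode NFC normalization, and the forbidden set (control, format, and separator characters plus "/" and "@") is hypothetical; nothing in this thread defines either.

```python
import unicodedata

# Hypothetical policy for illustration only: nothing in this thread
# defines which characters are forbidden.
FORBIDDEN_CATEGORIES = {"Cc", "Cf", "Zs", "Zl", "Zp"}  # controls, format chars, separators
FORBIDDEN_CHARS = {"/", "@"}


def canonicalize(label: str) -> str:
    """Step 1: fold case, then apply Unicode NFC (an assumed choice)."""
    return unicodedata.normalize("NFC", label.casefold())


def prepare_label(label: str) -> str:
    """Step 2: canonicalize, then reject if a forbidden character remains."""
    canon = canonicalize(label)
    for ch in canon:
        if ch in FORBIDDEN_CHARS or unicodedata.category(ch) in FORBIDDEN_CATEGORIES:
            raise ValueError(f"forbidden character U+{ord(ch):04X} in {label!r}")
    return canon


# "Cafe" + COMBINING ACUTE ACCENT becomes "café" with a precomposed U+00E9.
print(prepare_label("Cafe\u0301"))
```

Rejecting after canonicalization, not before, means the check only has to consider one spelling of each name; prepare_label("a b") raises ValueError because the space (category Zs) survives canonicalization.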
>Next time we meet, I'll give you a chance to try and enter
>some hiragana (or, if you want, kanji) on my system.
Been there, done that. :-) Started in 1984.
> > >Some special ways of affecting conjunct formation from the
> > >character codes have to be looked at. But general conjunct
> > >formation is just a display issue.
> >
> > Exactly right. And it needs to be dealt with.
>
>What do you want to specify in an IDNS protocol?
>Wouldn't saying 'behave as specified or implied by
>ISO 10646/Unicode' be enough?
Yes!
> > Only if the protocol only uses Unicode. :-)
>
>Which I don't think we should specify in the req doc,
>but which I'm sure we will come back to later.
Why not now? If there are no non-Unicode ways to meet the other
requirements in the requirements spec, I think we should deal with Unicode
in the requirements spec. If there are multiple methods, we shouldn't.
> > Yes, exactly! Almost all of these problems will be dealt with fully by
> > applying canonical normalization of each character set allowed in the
> > domain name part. But that is not yet considered a requirement by this
> > group.
>
>I wouldn't make 'apply canonical normalization' a requirement.
>If you want the relevant requirements, I suggest we take them
>from http://www.w3.org/TR/WD-charreq, sections 2 and 3
>(or just point to them). Those were, indirectly, the requirements
>for canonical normalization.
I don't think we're being truthful by doing "indirect" requirements. Your
requirements document was written in a different political environment than
this one. Maybe they're similar, but I certainly hope not.
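As a concrete illustration of the canonical normalization debated above, here is a minimal Python sketch using the standard unicodedata module (an assumption: it stands in for whatever normalization an IDN protocol might eventually specify; canonically_equal is a hypothetical helper name). Two strings made of different code points are canonically equivalent, and compare equal once both are normalized.

```python
import unicodedata


def canonically_equal(a: str, b: str) -> bool:
    """True if a and b are canonically equivalent (compared after NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


precomposed = "\u304c"       # HIRAGANA LETTER GA
decomposed = "\u304b\u3099"  # HIRAGANA LETTER KA + COMBINING VOICED SOUND MARK

print(precomposed == decomposed)                   # False: different code points
print(canonically_equal(precomposed, decomposed))  # True: canonically equivalent
print(canonically_equal("A", "a"))                 # False: NFC does not fold case
```

The last line shows why casing is a separate question: canonical normalization removes encoding differences for the same character, but it does not treat "A" and "a" as the same.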
--Paul Hoffman, Director
--Internet Mail Consortium