
Re: Matching and comparison



At 02:13 PM 1/22/00 +0900, Martin J. Duerst wrote:
>Well, Durst and the variants you listed are similar to you,
>because you are not familiar with them. They are not
>similar to me, because I'm familiar with them.
>
>I guess names differing by case are similar to most people.

I don't agree, but I see your point. Unfortunately, it looks like we have 
to go through the list of what "most people" might think would be similar 
or different and preclude the similar ones. I smell a rat hole....

> > >Definitely not. But we should not throw all 'similar names' problems
> > >in the same pot. Some of them are very productive (in particular
> > >casing), some are much less productive.
> >
> > I don't see what you mean by "productive" here. "Solvable"?
>
>Productive in the sense it is used in linguistics:
>
>How many variants can you produce with that phenomenon?

Ah! I like that.

>For casing, it's 2**n, and n can easily be 10 or more.
>
>For ignoring accents or not (important probably for French),
>it's 2**n, but n is usually very small.
>
>So the size of the various cases differ. Depending on the size,
>various solutions can be appropriate.

This makes good sense.
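
To make the counting concrete, here is a minimal sketch in Python. The function names are made up for illustration, and treating "accented" as "has a combining mark after canonical decomposition" is my assumption, not something we have agreed on.

```python
import unicodedata
from itertools import product


def case_variants(label: str) -> set[str]:
    """Enumerate every upper/lower-case spelling of a label."""
    # Each character contributes one choice if it is caseless, two otherwise.
    choices = [{ch.lower(), ch.upper()} for ch in label]
    return {"".join(combo) for combo in product(*choices)}


def accent_variant_count(label: str) -> int:
    """Count spellings obtained by keeping or dropping each accent."""
    # Assumption: a letter is "accented" if its canonical decomposition
    # (NFD) contains at least one combining mark.
    n = sum(
        1
        for ch in label
        if any(unicodedata.combining(c) for c in unicodedata.normalize("NFD", ch))
    )
    return 2 ** n


print(len(case_variants("duerst")))            # six cased letters
print(accent_variant_count("h\u00e9l\u00e8ne"))  # two accented letters
```

A six-letter name already yields 2**6 case variants, while a typical French name with two accents yields only 2**2 accent variants, which is the size difference described above.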

>Whether the letters look the same or not is one factor affecting spoofing.
>Whether you can make people believe they are the same or not is another.

Exactly right.

> > But we will have to tell everyone (or at least developers) enough to help
> > enter internationalized characters that end users don't understand. That
> > is, if I see a URL with hiragana in it, I should at least have a chance of
> > entering it correctly even if I don't understand Japanese.
>
>I don't think an implementer of a DNS front-end should have to
>care about this. This is an operating system/window system issue.

You're missing my point. I don't want to tell systems how to enter the 
characters, but I do think we need to specify what can and cannot be 
entered. I see this as a two-step process: canonicalize, then reject if 
there are forbidden characters.
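
Roughly, the two steps could look like the sketch below. Both the choice of NFC as the canonical form and the set of forbidden character categories are placeholders I picked for illustration; nothing in this thread has settled either one.

```python
import unicodedata

# Hypothetical policy, for illustration only: reject control, format,
# and separator characters. The real forbidden set is still undecided.
FORBIDDEN_CATEGORIES = {"Cc", "Cf", "Zs", "Zl", "Zp"}


def check_label(label: str) -> str:
    """Canonicalize a label, then reject it if a forbidden character remains."""
    # Step 1: canonicalize. NFC is an assumption, not an agreed requirement.
    canonical = unicodedata.normalize("NFC", label)
    # Step 2: reject if any character falls in a forbidden category.
    for ch in canonical:
        if unicodedata.category(ch) in FORBIDDEN_CATEGORIES:
            raise ValueError(f"forbidden character U+{ord(ch):04X} in label")
    return canonical
```

Doing the canonicalization first means the rejection step only has to look at one spelling of each name, so a decomposed "e" plus combining acute and the precomposed letter are treated the same way.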

>Next time we meet, I'll give you a chance to try and enter
>some hiragana, or if you want kanji, on my system.

Been there, done that. :-) Started in 1984.

> > >Some special ways of affecting conjunct formation from the
> > >character codes have to be looked at. But general conjunct
> > >formation is just a display issue.
> >
> > Exactly right. And it needs to be dealt with.
>
>What do you want to specify in an IDNS protocol?
>Wouldn't saying 'behave as specified or implied by
>ISO 10646/Unicode' be enough?

Yes!

> > Only if the protocol only uses Unicode. :-)
>
>Which I don't think we should specify in the req doc,
>but which I'm sure we will come back to later.

Why not now? If there are no non-Unicode ways to meet the other 
requirements in the requirements spec, I think we should deal with Unicode 
in the requirements spec. If there are multiple methods, we shouldn't.

> > Yes, exactly! Almost all of these problems will be dealt with fully by
> > applying canonical normalization of each character set allowed in the
> > domain name part. But that is not yet considered a requirement by this
> > group.
>
>I wouldn't make 'apply canonical normalization' a requirement.
>If you want the relevant requirements, I suggest we take them
>from http://www.w3.org/TR/WD-charreq, sections 2 and 3
>(or just point to them). Those were, indirectly, the requirements
>for canonical normalization.

I don't think we're being truthful by doing "indirect" requirements. Your 
requirements document was written in a different political environment than 
this one. Maybe they're similar, but I certainly hope not.


--Paul Hoffman, Director
--Internet Mail Consortium