
Re: [idn] Re: Back to work (Nameprep) (was: Re: Justsend UTF-8 with nameprep (was: RE: [idn] Reality Check))



John,

>This set of conversations has been very interesting to me, for
>the unfortunate reason that it confirms the tentative, but
>painful, conclusion I reached a few months ago.  Once we get
>down to the really fine details of which characters match and
>which do not, how to disambiguate glyphs from different scripts
>that look (or are) identical, and so on, we need to rely on user
>interfaces and human intelligence (or very close approximations
>to the latter), rather than depending on absolute matching rules.

I agree for the most part: we can do the bulk of the "heavy lifting" in
nameprep, handling case issues and obvious variants. However, for the
finer issues of mixtures of scripts and symbols, I think that at this
point we are better off not having absolute rules, but rather leaving it
up to some intelligence on the client side, assisted by the GUI.

> If we have to do that (and I agree that we will), then we almost
> certainly need a non-DNS mechanism to support it.  If nothing
> else, the performance costs of making multiple "maybe it is this
> one" probes into the DNS to try to sort out ambiguities will
> almost certainly be unacceptable (remember that DNS timeouts are
> on the order of seconds and that cached negative responses
> cannot have long durations).

I don't know that we need to specify the precise mechanism: after all, there
was none for "1" and "l", which may look different on your machine, but look
identical on mine right now. I agree that such intelligence should be on the
client side.

> If we _are_ going to fix the UIs to handle these cases, the
> amount of work required to use a completely different mechanism
> that supports human-assisted disambiguation tools (rather than
> one encoding or another within the DNS) is almost certainly
> quite small in comparison to other aspects of the effort required.

I agree. If we leave it up to humans to make the choice (assisted by the
GUI), then probably something as simple as the following would suffice:

  Using http://www.unicode.org/Public/3.1-Update/Scripts-3.1.0.txt

  - find the first script that is neither COMMON nor INHERITED
  - flag any character that has a different script.

(flagging could be a different color, a wavy underline, or some other
such mechanism)
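
For concreteness, here is a minimal sketch of that check in Python (the
function names are mine, and the parser assumes the published Scripts.txt
line format "start..end ; SCRIPT_NAME"):

  def load_scripts(path):
      """Parse a Unicode Scripts data file into (start, end, script)
      ranges."""
      ranges = []
      with open(path, encoding="utf-8") as f:
          for line in f:
              line = line.split("#", 1)[0].strip()  # drop comments
              if not line:
                  continue
              codes, script = (part.strip() for part in line.split(";"))
              if ".." in codes:
                  start, end = (int(c, 16) for c in codes.split(".."))
              else:
                  start = end = int(codes, 16)
              ranges.append((start, end, script))
      return ranges

  def script_of(cp, ranges):
      """Return the script name for a code point, or None if unlisted."""
      for start, end, script in ranges:
          if start <= cp <= end:
              return script
      return None

  def flag_mixed_script(label, ranges):
      """Indices of characters whose script differs from the first
      script in the label that is neither COMMON nor INHERITED."""
      base = None
      flagged = []
      for i, ch in enumerate(label):
          script = script_of(ord(ch), ranges)
          if script in ("COMMON", "INHERITED", None):
              continue
          if base is None:
              base = script
          elif script != base:
              flagged.append(i)
      return flagged

  # Example: a Cyrillic 'a' (U+0430) smuggled into a Latin label.
  ranges = load_scripts("Scripts-3.1.0.txt")
  print(flag_mixed_script("p\u0430ypal", ranges))  # -> [1]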

This should, however, be a recommendation, not a requirement. More
sophisticated mechanisms could be developed in the future, either by this
body or by application vendors. For example, the consortium could develop
"confusable" mappings: sets of characters that, in common fonts at common
sizes, could easily be confused with one another. Given these, an
application could provide a finer-grained GUI assist. However, developing
such mappings would take time and effort, and they would need to be
refined over an extended period.

After all,
(a) the majority of characters in different scripts are not confusable:
there is no Latin letter that looks like katakana KA.
(b) more importantly, not all confusable characters are separated by
script, especially in the area of symbols.
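
To make the idea concrete, here is a toy sketch of how such a confusable
mapping might be used (the table below is invented for illustration; a
real one would come from the kind of data-collection effort described
above):

  # Illustrative only: a tiny, hand-picked confusable table.
  CONFUSABLE_REP = {
      "1": "l",       # digit one vs. lowercase L
      "I": "l",       # uppercase I vs. lowercase L, in many fonts
      "0": "O",       # digit zero vs. uppercase O
      "\u0430": "a",  # CYRILLIC SMALL LETTER A vs. Latin small a
  }

  def skeleton(label):
      """Map each character to its class representative, so that
      plausibly confusable labels compare equal."""
      return "".join(CONFUSABLE_REP.get(ch, ch) for ch in label)

  # A GUI assist could warn when a label collides with a known one:
  assert skeleton("paypa1") == skeleton("paypal")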

> And, if we need a non-DNS mechanism, the current ACE versus
> UTF-8 debate has the potential to turn into a "look in three or
> more different ways" story.  That, in turn, dramatically
> increases the odds of false positives, and false positives cause
> both security and sanity problems (as well as driving trademark
> lawyers crazy).

If it is impossible to register names *except those that have been
nameprepped*, then whether or not UTF-8 names go over the wire unprepped
should not cause much of a security problem.
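
A rough sketch of such a registration-time check (the function name is
mine; real nameprep is a full stringprep profile with prohibited-character
tables, so this covers only the case-folding and normalization steps):

  import unicodedata

  def looks_nameprepped(label):
      """Accept a label only if case folding plus compatibility
      normalization leaves it unchanged (an approximation of
      'already in nameprep form')."""
      return unicodedata.normalize("NFKC", label.casefold()) == label

  # Only one spelling of each name can then ever be registered:
  assert not looks_nameprepped("Example")
  assert looks_nameprepped("example")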

    john