vint cerf <vinton.g.cerf@wcom.com> writes:
It seems to me that we err if we mix "finding" identifiers
(with search engines, elaborate directories that offer multiple
choices of IDNs based on imprecise search criteria) with
resolving unambiguous identifiers into their respective IP addresses
(speaking roughly since DNS also offers indirect resolutions such as
MX, CNAME and so on).
I think we do ourselves a disservice if we try to make DNS resolve
ambiguous references - it is not designed for such applications;
search engines and directory structures are more oriented towards
that aspect of finding things "by name" on the Internet.

This seems to argue against the current design of IDNA.
IDNA resolves some ambiguities in identifiers by Unicode
normalization, and introduces further ambiguities by not handling
legacy charset transcoding issues at all.

Simon, both of those statements are wrong, and Vint is right. Unicode
normalization doesn't fix ambiguous references; it canonicalizes
references, and there is a huge difference between the two. "Letter A
followed by combining umlaut" is not ambiguous: it means that the
display should show an a with an umlaut over it. There are charset
transcoders today that transcode differently from each other. That's
not an ambiguity, that's a mistake. No one can create protocols that
fix every previous mistake.
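
To make the canonicalization point concrete, here is a small sketch
using Python's unicodedata module (the letter is only an
illustration, not something taken from the thread):

    import unicodedata

    precomposed = "\u00e4"   # LATIN SMALL LETTER A WITH DIAERESIS
    decomposed = "a\u0308"   # letter a followed by COMBINING DIAERESIS

    # Different code point sequences for the same abstract character.
    print(precomposed == decomposed)                   # False
    print(unicodedata.normalize("NFC", precomposed) ==
          unicodedata.normalize("NFC", decomposed))    # True

Neither input is ambiguous about what should be displayed;
normalization just picks one canonical spelling for it.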

Now, one can argue that Unicode normalization is only used because
Unicode happens to have different ways of representing the same, or
non-visual, characters, but nevertheless this adds an
ambiguity-resolving mechanism to software, one that will also have to
be modified over time, since consensus on how to resolve ambiguities
will change. I have trouble visualizing how this can be implemented
and work well for 2, 5, 10 years and more, when Unicode and other
charsets are moving targets.
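
For what it is worth, deployed code already copes with the moving
target by pinning a particular version of the tables. As a small
sketch (assuming a reasonably recent CPython, which carries a frozen
copy of the Unicode 3.2.0 database, the version the
stringprep/nameprep profiles were defined against, alongside its
current one):

    import unicodedata

    # Which Unicode character database this runtime ships with.
    print(unicodedata.unidata_version)        # varies by interpreter

    # The frozen 3.2.0 tables give stable answers even as the main
    # database keeps moving.
    s = "a\u0308"
    print(unicodedata.normalize("NFC", s) ==
          unicodedata.ucd_3_2_0.normalize("NFC", s))
    # True for this string; not guaranteed for characters assigned
    # after Unicode 3.2, which the frozen tables treat as unassigned.

That only freezes the mechanism, though; it does not answer the
question of what happens when consensus about the right normalization
changes.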