[idn] Re: IDNA: is the specification proper, adequate, and complete? (was: Re: I-D ACTION:draft-ietf-idn-idna-08.txt)
At 6:32 PM +0200 6/17/02, Simon Josefsson wrote:
If I see the (Swedish) word "å" displayed on my screen, cut'n'paste it
into a browser, an IDNA resolver will normalize this into U+00C5
before a server is queried for that string, regardless of whether the
original string was U+00C5 or U+212B. Isn't this "resolving
ambiguity"?
No, it is canonicalizing. In the bits on the wire, there is nothing
ambiguous about the combined version or the uncombined version: they
are very clearly different sets of characters, and the representation
of those characters in every encoding of Unicode is also
non-ambiguous.
Isn't there an ambiguity between U+00C5 and U+212B?
Only visually, not in the protocol.
If there isn't an ambiguity between U+00C5 and U+212B, why does IDNA
treat them the same?
It doesn't. It canonicalizes one into the other. That is far from
"treating them the same", yes?
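A minimal sketch of the distinction, assuming Python's standard unicodedata module and NFKC as the normalization form. The helper name canonicalize is illustrative only, and the case folding that IDNA's nameprep also applies is deliberately left out:

```python
import unicodedata

A_RING = "\u00c5"      # LATIN CAPITAL LETTER A WITH RING ABOVE
ANGSTROM = "\u212b"    # ANGSTROM SIGN

def canonicalize(text: str) -> str:
    """Illustrative helper: NFKC normalization only, no case folding."""
    return unicodedata.normalize("NFKC", text)

# In the bits on the wire, the two are distinct characters with
# distinct, unambiguous encodings.
assert A_RING != ANGSTROM
assert A_RING.encode("utf-8") == b"\xc3\x85"
assert ANGSTROM.encode("utf-8") == b"\xe2\x84\xab"

# Canonicalization maps one onto the other, so both inputs end up
# as the same string before any lookup is made.
assert canonicalize(ANGSTROM) == A_RING
assert canonicalize(A_RING) == A_RING
```

The inputs stay distinguishable until the canonicalization step runs; only its output is shared.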
Perhaps I fail to communicate, not being a
native speaker perhaps I'm interpreting the word "ambiguous"
incorrectly, although my dictionary doesn't seem to help me find any
alternative interpretation.
My dictionary has these definitions:
- having two or more possible meanings
- doubtful, uncertain
U+00C5 and U+212B do not have the same meaning, and there is nothing
doubtful or uncertain about either of them.
> There are charset transcoders today that transcode differently from
each other. That's not an ambiguity, that's a mistake. No one can
create protocols that fix every previous mistake.
You can fix the one mistake.
Which one mistake is that? There are probably dozens of transcoders
with errors, and worse yet, there are probably dozens of transcoder
implementors that, in the face of some IETF or Unicode standard that
tells them how to transcode, would say "screw you, you don't
understand our language" (and they would possibly be correct).
> So your solution is that nothing can ever be internationalized?
That's not a solution, and that's not what I'm proposing, I don't
understand how that could ever be read into what I wrote,
Because you said "I have trouble visualizing how this can be
implemented and work well for 2, 5, 10 years and more, when Unicode
and other charsets are moving targets." I agree with you that Unicode
and other charsets are moving targets.
but I'll try
to be specific on how to solve the problems with IDNA right now,
giving internationalized domain names that would be secure and could
be implemented and continue to work years ahead:
First, specify clearly that applications MUST NOT use any other
normalization table than the one defined in the IDNA spec suite
(following Unicode 3.1 currently, being updated to Unicode 3.2 if I
understand things correctly) and that in particular normalization
tables supplied by operating systems should never be used unless the
application author can assert that they will never change throughout
the lifetime of the application (which probably only will be true if
the application author is the operating system author).
We already say the first part (you must use the Unicode 3.1 -- soon
to be Unicode 3.2 -- table). We don't say the second part because it
flows from the first part.
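As a rough illustration of what pinning the table can look like in practice, here is a sketch assuming Python, whose standard library ships a frozen Unicode 3.2.0 database as unicodedata.ucd_3_2_0 alongside the current one. The function name normalize_pinned is made up for this example, and this is not the full nameprep procedure:

```python
import unicodedata
from unicodedata import ucd_3_2_0  # frozen Unicode 3.2.0 database

def normalize_pinned(label: str) -> str:
    """Illustrative helper: NFKC using the fixed 3.2.0 tables only."""
    return ucd_3_2_0.normalize("NFKC", label)

# The pinned database never changes with the interpreter or platform.
print("pinned tables: ", ucd_3_2_0.unidata_version)

# The default database tracks whatever Unicode version the running
# interpreter was built with, so it is a moving target.
print("default tables:", unicodedata.unidata_version)

print(normalize_pinned("\u212b") == "\u00c5")
```

An application that calls the pinned tables gets the same answer for a given label regardless of which Unicode version the platform's default tables follow.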
Secondly, define how to transcode legacy charsets into Unicode, and
specify that only this transcoding table is to be used. Transcoding
mapping tables can be defined in RFCs, much like MIME CTE's or
similar. The initial IDNA spec suite could define transcoding tables
for commonly used charsets; ISO-8859-X, ISO-2022-X, KOI8-X,
KS-C-5601-X etc.
Yes, we could do that, but the IETF lacks both the linguistic and
political expertise to do it. The fact that even the experts such as
ISO and the Unicode Consortium have not chosen to do this should be a
very broad hint to you about why the IETF shouldn't. But if you
really think this is needed (I still don't), you absolutely should
ask the appropriate bodies (Unicode or ISO) to do it. If they do it,
I'd bet that the IETF would strongly consider pointing to those
standards.
The main argument against these proposals is that they require lots
of work to implement, but if the alternative is poor security, I'd
rather have people do lots of work. A lesser argument against them is
that they don't adopt new updates to Unicode, but that is by design.
We disagree about what the main argument is. Creating transcoding
tables is easy; in fact, it has already been done. See
<http://www.unicode.org/Public/MAPPINGS/> for some non-official
mappings.
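To show why the choice of table matters more than the mechanics of building one, here is a small sketch assuming Python's built-in codecs. PINNED_TABLE and transcode_pinned are hypothetical names, and the table is a one-entry fragment chosen for illustration, not an official mapping:

```python
RAW = b"\xe5"  # one legacy byte whose meaning depends on the charset

# The same byte transcodes to different Unicode characters depending
# on which mapping table the transcoder applies.
as_latin1 = RAW.decode("iso-8859-1")
as_koi8r = RAW.decode("koi8-r")
assert as_latin1 == "\u00e5"
assert as_latin1 != as_koi8r

# Hypothetical pinned table: a fragment of an ISO-8859-1 mapping.
PINNED_TABLE = {0xE5: "\u00e5"}

def transcode_pinned(data: bytes) -> str:
    """Illustrative helper: ASCII passes through, other bytes must be
    listed in PINNED_TABLE, anything else is rejected."""
    out = []
    for byte in data:
        if byte < 0x80:
            out.append(chr(byte))
        elif byte in PINNED_TABLE:
            out.append(PINNED_TABLE[byte])
        else:
            raise ValueError(f"byte 0x{byte:02x} not in pinned table")
    return "".join(out)

assert transcode_pinned(b"bl\xe5") == "bl\u00e5"
```

Writing such a table is mechanical. Getting agreement on which character each byte ought to map to is the part the code cannot settle.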
My main argument against the IETF doing this is that being sure the
tables are "right" is nearly impossible because it involves getting
consensus among the users of the scripts and the experts.
--Paul Hoffman, Director
--Internet Mail Consortium