[idn] Re: Legacy charset conversion in draft-ietf-idn-idna-08.txt
Roozbeh Pournader <roozbeh@sharif.edu> writes:
>> CP437 0xE1: U+03B2 / U+00DF: ?
>> CP437 0xEE: U+03B5 / U+2208: ?
>
> As far as I know, all existing CP437 tables map those to SMALL SHARP S
> (U+00DF) and SMALL EPSILON (U+03B5) and not SMALL BETA or ELEMENT OF.
> I just checked everywhere I could, this is the list:
>
> http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/PC/CP437.TXT
> http://microsoft.com/globaldev/reference/oem/437.htm
> http://www.kostis.net/charsets/cp437.htm
>
> Can you point me to anywhere disagreeing?
No. I was thinking of mapping from Unicode into legacy charsets
here, which probably wasn't clear (I hadn't realized I was making
that assumption when I wrote that message).
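For what it's worth, the byte-to-Unicode direction really is
unambiguous; a quick check against, e.g., Python's bundled CP437
table (just an illustration, not something the draft mandates):

    # Byte -> Unicode: the tables Roozbeh lists all agree on this.
    assert b"\xe1".decode("cp437") == "\u00df"  # LATIN SMALL LETTER SHARP S
    assert b"\xee".decode("cp437") == "\u03b5"  # GREEK SMALL LETTER EPSILON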
To understand why that direction is relevant at all, I think it is
useful to summarize the security-related problems with
internationalized text identifiers in applications. So far we have
only talked about the first two.
1) Internationalized text strings extend the set of possible
   characters to use in identifiers. Traditionally a-z0-9- is such a
   small set that users can easily tell two elements of the set apart
   ("oh! mybank.com is not the same string as mubank.com, I must be
   under attack!"). With Unicode, users cannot easily do this
   anymore. I think the point of KC normalization is to mitigate
   this problem, but the basic problem will still be there: users
   must understand that different but similar-looking characters are
   different. This problem cannot be solved entirely, and people
   will exploit this fact (see the first sketch after this list).
2) On systems with non-ASCII non-Unicode charsets, applications need
   to transcode strings entered using the system charset into Unicode
   before applying IDN stuff. If different applications use different
   mapping tables, or if those mapping tables change, in a way that
   KC normalization does not cancel out, there will be additional
   attacks (second sketch below).
3) On systems with non-ASCII non-Unicode charsets in security
   applications, the system will have to convert the Unicode in IDNA
   into the system charset before handing it to those security
   applications. Here is the problem I was referring to above.
   Consider a user entering a string containing CP437 0xE1 in the
   browser; it is IDNAlized within the resolver, the user connects
   to the server and receives, for security purposes, e.g. in a TLS
   stream, a certificate containing an IDNA string. If IDNA is not
   supposed to force the entire system to switch to Unicode, the
   application will have to convert the IDNA string into the system
   charset to display it to the user. So it converts U+03B2 into
   CP437 0xE1 and displays it; the user compares the strings
   (possibly even by studying the byte sequence to be sure, if a
   system charset with similar-looking symbols is used) and can
   verify it. However, it is not unreasonable for the application to
   convert U+00DF into CP437 0xE1 as well. So the attacker will only
   have to register the domain using U+00DF instead of U+03B2 in
   order to mount an attack (third sketch below).
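To make 1) concrete, here is a minimal Python sketch; the
Latin/Cyrillic pair is an arbitrary example of mine, not something
from the draft:

    import unicodedata

    # Latin "a" (U+0061) and Cyrillic "a" (U+0430) look identical in
    # most fonts, yet no normalization form unifies them.
    latin     = "mybank"
    lookalike = "myb\u0430nk"  # CYRILLIC SMALL LETTER A instead of "a"

    assert latin != lookalike
    assert unicodedata.normalize("NFKC", latin) != \
           unicodedata.normalize("NFKC", lookalike)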
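For 2), here is the same byte decoded under two different mapping
assumptions; the two charsets just stand in for two applications'
diverging ideas of the system charset:

    # One input byte, two plausible decodings, two different lookups.
    raw = b"\xe1"
    assert raw.decode("cp437") == "\u00df"    # LATIN SMALL LETTER SHARP S
    assert raw.decode("latin-1") == "\u00e1"  # LATIN SMALL LETTER A WITH ACUTE
    # KC normalization leaves both code points alone, so it cannot
    # cancel the difference: the applications resolve different names.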
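And for 3), the encoding direction. The fallback table below is
hypothetical (real "best fit" converters vary, which is exactly the
problem), but it shows how the collision arises:

    # Hypothetical Unicode -> CP437 display table, illustration only.
    TO_CP437 = {
        "\u00df": 0xE1,  # SHARP S: the agreed round-trip mapping
        "\u03b2": 0xE1,  # GREEK BETA: a plausible "best fit" fallback
    }

    def display_bytes(s):
        return bytes(TO_CP437[ch] for ch in s)

    # Two distinct IDNA identities become identical bytes on screen:
    assert display_bytes("\u00df") == display_bytes("\u03b2") == b"\xe1"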
Ok, I admit that the case in 3) can be solved in two ways: either
the user is shown the IDNA strings and is allowed to compare them,
or the system is upgraded to use Unicode in all involved
applications, including the display engine. My point is that this
attack isn't discussed in the security considerations. Also, the two
solutions aren't very good, IMHO, since the first one will give a
bad user experience and the second will take years to implement.
And bad solutions are implemented badly or not at all, so this will
most likely generate security incidents, which for prudence should
at least be mentioned in the security considerations.