[idn] Re: Legacy charset conversion in draft-ietf-idn-idna-08.txt



"James Seng" <jseng@pobox.org.sg> writes:

> - mappings between legacy encodings and ISO/IEC 10646 are available on
> the CD-ROM, if you buy the CD-ROM version from your standards body.

Does the CD include more than what is available on the net from
www.unicode.org/Public/MAPPINGS/?

I can't find official mappings for, e.g., ISO-2022-JP (RFC 1468) online.

> - IDNA already specifies (or suggests) that if the application is using
> a legacy encoding, it should transcode to Unicode first.

Yes, but I cannot find where it says how to do it, or where it cites a
reference that explains how to do it.  That's why I think the security
considerations should state more clearly that the standard, if
implemented as written, will enable trivial attacks with security
consequences, because the details of legacy transcoding are left
unspecified.

If there is no standard way to translate ISO-2022-JP into Unicode,
won't different applications implement it differently?
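
As a rough illustration of that concern, here is a small sketch using
the codecs bundled with Python.  These are one implementation's tables,
not an official mapping, and the choice of the JIS X 0208 character
0x2141 is mine.  It decodes that character once through the
"iso2022_jp" codec and once through the Microsoft "cp932" table and
compares the resulting code points.

```python
import unicodedata

# JIS X 0208 character 0x2141, wrapped in the RFC 1468 escape sequences.
iso2022jp_bytes = b"\x1b$B!A\x1b(B"
# The same JIS X 0208 character in its Shift_JIS byte form.
sjis_bytes = b"\x81\x60"

# Decode with two tables that different applications might plausibly use.
from_iso2022jp = iso2022jp_bytes.decode("iso2022_jp")
from_cp932 = sjis_bytes.decode("cp932")

for label, text in (("iso2022_jp", from_iso2022jp), ("cp932", from_cp932)):
    print(f"{label}: U+{ord(text):04X} {unicodedata.name(text)}")

print("same code point:", from_iso2022jp == from_cp932)
```

If two applications feed these different code points into the IDNA
steps, they will not agree on the resulting name.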

Many machines use legacy encodings, and how IDNA ends up being
implemented on such systems seems to be up to the implementor right now.

>
> -James Seng
>
>> This last sentence seems to brush a practical problem under the rug.
>> Most systems aren't Unicode based today, so in fact most systems will
>> have to implement this unspecified transcoding.  The Unicode
>> Consortium has not specified how to transform Unicode to/from legacy
>> encodings.  There are some unofficial mappings for the ISO 8859
>> charsets on www.unicode.org/Public/MAPPINGS/, but even unofficial
>> mappings for other charsets (in particular CJK) are not present.
>>
>> Real world scenario: My machine uses ISO-8859-1.  I enter 0xB5.  How
>> is this transcoded into Unicode?  U+00B5 or U+03BC?  There are many
>> similar examples.
>>
>> I think the third paragraph of the security considerations should
>> express more clearly that IDNA actually is vulnerable to the attack
>> when machines, like most machines on the Internet, use legacy encodings.
>>
>> Some high-level insight on the problem:
>> http://www.cl.cam.ac.uk/~mgk25/unicode.html#conv
>>
>>
>>
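
To make the 0xB5 question quoted above concrete, here is a minimal
sketch in Python.  It assumes the straightforward ISO-8859-1 table
(byte value equals code point) and uses NFKC normalization as a
stand-in for a compatibility-folding step; neither choice is mandated
by the draft text discussed here.

```python
import unicodedata

raw = bytes([0xB5])

# Straight table lookup: ISO-8859-1 byte values map to the same code point.
direct = raw.decode("iso-8859-1")                # U+00B5 MICRO SIGN

# Compatibility normalization folds the micro sign into the Greek letter.
folded = unicodedata.normalize("NFKC", direct)   # U+03BC GREEK SMALL LETTER MU

for label, text in (("direct", direct), ("NFKC", folded)):
    print(f"{label}: U+{ord(text):04X} {unicodedata.name(text)}")
```

Whether two implementations agree depends on where, if anywhere, each
one applies such a folding step after the legacy conversion.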