
Re: Fwd: draft-yergeau-rfc2279bis-02.txt for STANDARD





--On Friday, January 10, 2003 07:02:57 +0100 Patrik Fältström <paf@cisco.com> wrote:

This is the issue. We talk about the number of bytes in the UTF-8 code.

4-byte sequences give a range up to 10FFFF.
Then I have no problems at all.
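
As a quick check of that figure: a 4-octet UTF-8 sequence carries 3 + 6 + 6 + 6 = 21 payload bits, so the bit layout alone could reach 1FFFFF, and 10FFFF is a limit of the character set rather than of the encoding form. A minimal sketch using Python's built-in UTF-8 codec (the helper name utf8_len is hypothetical, chosen only for illustration):

```python
def utf8_len(code_point: int) -> int:
    """Number of octets Python's UTF-8 codec emits for one code point."""
    return len(chr(code_point).encode("utf-8"))


# Raw capacity of the 4-octet form: 3 payload bits in the lead octet,
# 6 payload bits in each of the three trail octets.
four_octet_capacity = (1 << (3 + 6 + 6 + 6)) - 1

# Code points on either side of each length boundary, ending at 10FFFF.
boundaries = [0x7F, 0x80, 0x7FF, 0x800, 0xFFFF, 0x10000, 0x10FFFF]
lengths = [utf8_len(cp) for cp in boundaries]

for cp, n in zip(boundaries, lengths):
    print(f"U+{cp:04X} -> {n} octet(s)")
```

Nothing up to 10FFFF needs more than four octets, which is why restricting the spec to 1-4 octets loses no characters.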
After talking with Paul Hoffman and John Klensin I suggest the following:

(a) Do _not_ reference Unicode instead of ISO 10646
I think ISO 10646 has adopted the 10FFFF limit too - probably published as an amendment somewhere. The Unicode folks would know.

(b) Say in some text that we move to Standard with just testing 1-4
octets, changing the spec to cover just those values. That is, we should
assume that the draft standard is for 1-4 octets, and therefore the full
standard is too.
Saying that we eliminate the untested/unused feature of 5-6 octet encodings is permitted under the rules for progression. We just missed eliminating them at draft...

(c) Slip in a sentence somewhere (maybe as a security
consideration) indicating that > 4 bytes is possible in the future and
that programs should not be designed on the assumption that they will
never see more than four bytes. I.e., interoperability testing at <= 4
is fine, but I'd hate to set someone up for a buffer overflow problem.
I think this is not likely to be needed; it should be OK to treat 5+ byte encodings as a protocol error. But I could be wrong...
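
To make "treat 5+ byte encodings as a protocol error" concrete, here is a minimal sketch of a lead-octet check. It assumes the decoder decides the sequence length from the first octet before reading any further input, so no buffer is ever sized from a 5- or 6-octet lead. The function name sequence_length is hypothetical, and the sketch deliberately leaves out the other checks a full decoder needs (trail-octet validation, overlong forms, values above 10FFFF).

```python
def sequence_length(lead: int) -> int:
    """Return the sequence length (1-4) implied by a UTF-8 lead octet.

    Raises ValueError for anything else: a stray trail octet (10xxxxxx),
    the old 5- and 6-octet lead octets (111110xx, 1111110x), and 0xFE/0xFF.
    """
    if not 0 <= lead <= 0xFF:
        raise ValueError("not an octet")
    if lead & 0x80 == 0x00:   # 0xxxxxxx
        return 1
    if lead & 0xE0 == 0xC0:   # 110xxxxx
        return 2
    if lead & 0xF0 == 0xE0:   # 1110xxxx
        return 3
    if lead & 0xF8 == 0xF0:   # 11110xxx
        return 4
    raise ValueError(f"octet 0x{lead:02X} cannot start a 1-4 octet sequence")


# Usage: the 5-octet lead 0xF8 is rejected up front instead of being decoded.
try:
    sequence_length(0xF8)
except ValueError as err:
    print("protocol error:", err)
```

Rejecting on the lead octet means the receiver never has to guess how much longer a sequence might be, which is the buffer-overflow worry raised in (c).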