
Re: Fwd: draft-yergeau-rfc2279bis-02.txt for STANDARD




--On Friday, January 10, 2003 07:02:57 +0100 Patrik Fältström
<paf@cisco.com> wrote:

> > This is the issue. We talk about the number of bytes in the UTF-8 code.
> >
> > 4-byte sequences give a range up to 10FFFF.
> That's what I thought -- I was having difficulty translating 4 bytes into
> an FFFF limit.
>
> Then I have no problems at all.
Neither do I.
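For reference, the octet-count arithmetic can be checked with a short Python sketch. It assumes the original RFC 2279 layout, where an n-octet sequence (n >= 2) has a lead octet carrying 7 - n payload bits followed by n - 1 continuation octets carrying 6 bits each; `payload_bits` is a hypothetical helper, not part of any spec.

```python
# Payload bits in an n-octet UTF-8 sequence under the RFC 2279 layout:
# 1 octet carries 7 bits; otherwise the lead octet carries (7 - n) bits
# and each of the (n - 1) continuation octets carries 6 bits.
def payload_bits(n: int) -> int:
    if n == 1:
        return 7
    return (7 - n) + 6 * (n - 1)


# Print the largest value each sequence length can express.
for n in range(1, 7):
    bits = payload_bits(n)
    print(f"{n} octet(s): {bits:2d} bits, max U+{(1 << bits) - 1:X}")
```

Three octets top out at U+FFFF (which is where an "FFFF limit" comes from), while four octets carry 21 bits, i.e. up to U+1FFFFF. U+10FFFF needs exactly 21 bits, so it always fits in four octets; the tighter 10FFFF bound is a range restriction, not a property of the 4-octet form itself.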

> > After talking with Paul Hoffman and John Klensin I suggest the following:
> >
> > (a) Do _not_ reference Unicode instead of ISO-10646
>
> I think ISO 10646 has adopted the 10FFFF limit too - probably published as
> an amendment somewhere. The Unicode folks would know.
That's my recollection as well. I believe 10646 and Unicode are aligned
on this, due to the need to restrict UTF-8 to the same range as UTF-16.
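The UTF-16 ceiling follows from surrogate-pair arithmetic. A minimal Python sketch, with illustrative function names (`from_surrogates`, `to_surrogates`) that are not taken from any spec:

```python
# UTF-16 reaches past U+FFFF only via surrogate pairs: a high surrogate
# (0xD800-0xDBFF) and a low surrogate (0xDC00-0xDFFF) each carry 10 bits,
# and the combined 20-bit value is offset by 0x10000.
def from_surrogates(high: int, low: int) -> int:
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def to_surrogates(cp: int) -> tuple[int, int]:
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF need a surrogate pair")
    v = cp - 0x10000
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)


# The largest possible pair decodes to the ceiling.
print(hex(from_surrogates(0xDBFF, 0xDFFF)))  # 0x10ffff
```

Since 0x10000 + 2^20 - 1 = 0x10FFFF, UTF-16 cannot represent anything larger, so capping UTF-8 at the same value keeps the two encodings interconvertible.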

> > (b) Say in some text that we move to Standard with just testing 1-4
> > octets, changing the spec to cover just those values. That is, we should
> > assume that the draft standard is for 1-4 octets, and therefore the full
> > standard is too.
>
> Saying that we eliminate the untested/unused feature of 5-6 octet encodings
> is permitted under the rules for progression. We just missed eliminating
> them at draft...
Yes, exactly.
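An encoder restricted to the 1- to 4-octet forms might look like the following Python sketch. `encode_utf8` is a hypothetical name; the surrogate rejection is an assumption borrowed from the published RFC 3629 rules, not something stated in this thread.

```python
def encode_utf8(cp: int) -> bytes:
    """Encode one code point using only the 1- to 4-octet forms."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:X} is outside U+0000..U+10FFFF")
    # Assumption: surrogate code points are not encodable (as in RFC 3629).
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"U+{cp:X} is a surrogate code point")
    if cp < 0x80:  # 1 octet: 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:  # 2 octets: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:  # 3 octets: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 octets: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])
```

With the range capped at U+10FFFF there is no code path that could emit a 5- or 6-octet sequence, which is the practical effect of dropping those forms from the spec.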

> > (c) Slip in a sentence somewhere (maybe as a security
> > consideration) indicating that > 4 bytes is possible in the future and
> > that programs should not be designed on the assumption that they will
> > never see more than four bytes. I.e., interoperability testing at <= 4
> > is fine, but I'd hate to set someone up for a buffer overflow problem.
>
> I think this is not likely to be needed; it should be OK to treat 5+ byte
> encodings as a protocol error. But I could be wrong...
Actually, I think it is preferable to treat them as protocol errors, due to
the need for UTF-16 compatibility.
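Treating 5+ octet forms as a protocol error, while still never trusting the input to contain a complete sequence, could be sketched in Python as below. `ProtocolError` and `decode_utf8` are hypothetical names; the overlong, surrogate, and above-U+10FFFF checks follow the published RFC 3629 rules rather than anything stated in this thread.

```python
class ProtocolError(ValueError):
    """Hypothetical error type: input is not valid 1- to 4-octet UTF-8."""


def decode_utf8(data: bytes) -> list[int]:
    """Decode UTF-8 into code points, rejecting 5- and 6-octet forms."""
    out: list[int] = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            out.append(lead)
            i += 1
            continue
        if 0xC0 <= lead <= 0xDF:
            n, cp = 2, lead & 0x1F
        elif 0xE0 <= lead <= 0xEF:
            n, cp = 3, lead & 0x0F
        elif 0xF0 <= lead <= 0xF7:
            n, cp = 4, lead & 0x07
        else:
            # 0x80-0xBF: stray continuation octet; 0xF8-0xFD: lead octet of
            # a 5- or 6-octet form; 0xFE/0xFF: never valid.
            raise ProtocolError(f"invalid lead octet 0x{lead:02X} at {i}")
        # Bounds check before reading continuation octets, so a truncated
        # sequence is an error rather than an over-read.
        if i + n > len(data):
            raise ProtocolError(f"truncated sequence at {i}")
        for b in data[i + 1:i + n]:
            if b & 0xC0 != 0x80:
                raise ProtocolError(f"bad continuation octet at {i}")
            cp = (cp << 6) | (b & 0x3F)
        # Reject overlong forms, surrogates, and values above U+10FFFF.
        min_cp = {2: 0x80, 3: 0x800, 4: 0x10000}[n]
        if cp < min_cp or 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
            raise ProtocolError(f"invalid code point U+{cp:X} at {i}")
        out.append(cp)
        i += n
    return out
```

Rejecting lead octets 0xF8-0xFD outright means the decoder never needs room for more than four octets per character, which also addresses the buffer-overflow concern raised in (c).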

				Ned