
Fwd: draft-yergeau-rfc2279bis-02.txt for STANDARD



This is the issue. We talk about the number of bytes in the UTF-8 code.

4-byte sequences are enough to cover the range up to 10FFFF.
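
For reference, the relation between sequence length and range can be sketched as below. This is only an illustration, assuming the 1-6 octet layout of RFC 2279; the names MAX_BY_LENGTH and utf8_length are made up for this note and do not come from the draft.

```python
# Highest value each sequence length can carry in the 1-6 octet
# layout of RFC 2279 (payload bits: 7, 11, 16, 21, 26, 31).
MAX_BY_LENGTH = {
    1: 0x7F,
    2: 0x7FF,
    3: 0xFFFF,
    4: 0x1FFFFF,
    5: 0x3FFFFFF,
    6: 0x7FFFFFFF,
}


def utf8_length(code_point):
    """Return how many octets the 1-6 octet layout needs for code_point."""
    if code_point < 0:
        raise ValueError("code point must be non-negative")
    for length in sorted(MAX_BY_LENGTH):
        if code_point <= MAX_BY_LENGTH[length]:
            return length
    raise ValueError("value does not fit in six octets")


# 10FFFF fits in four octets; the first value past the 4-octet
# payload (200000) would already need five.
four = utf8_length(0x10FFFF)
five = utf8_length(0x200000)
```

So everything up to 10FFFF fits in four octets, and the 5- and 6-octet forms only matter for values of 200000 and above.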

After talking with Paul Hoffman and John Klensin I suggest the following:

(a) Do _not_ reference Unicode instead of ISO-10646
(b) Say in some text that we move to Standard with just testing 1-4 octets, changing the spec to cover just those values. That is, we should assume that the draft standard is for 1-4 octets, and therefore the full standard is too.
(c) Slip in a sentence somewhere (maybe as a security consideration) indicating that > 4 bytes is possible in the future and that programs should not be designed on the assumption that they will never see more than four bytes. I.e., interoperability testing at <= 4 is fine, but I'd hate to set someone up for a buffer overflow problem.
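
To make (c) concrete, here is a rough sketch of a scanner that does not assume it will never see more than four bytes. It assumes the lead-octet patterns of the 1-6 octet layout in RFC 2279; sequence_length, scan and max_accepted are hypothetical names for this note. It checks only lead and continuation octets, not overlong forms or surrogates, so it is not a complete validator.

```python
def sequence_length(lead):
    """Length announced by a lead octet, or None if it cannot start a sequence."""
    if lead < 0x80:
        return 1
    if 0xC0 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF7:
        return 4
    if 0xF8 <= lead <= 0xFB:
        return 5
    if 0xFC <= lead <= 0xFD:
        return 6
    return None  # continuation octet (80-BF) or FE/FF


def scan(data, max_accepted=4):
    """Yield (offset, length, accepted) for each sequence found in data.

    Sequences longer than max_accepted are recognised and skipped as a
    unit instead of being copied into a buffer sized for four octets.
    """
    i = 0
    while i < len(data):
        n = sequence_length(data[i])
        tail = data[i + 1:i + n] if n else b""
        complete = n is not None and len(tail) == n - 1
        if not complete or not all(0x80 <= b <= 0xBF for b in tail):
            # Stray or truncated octet: reject it and resynchronise.
            yield (i, 1, False)
            i += 1
            continue
        yield (i, n, n <= max_accepted)
        i += n


# 'a', a 2-octet sequence, a 5-octet sequence, then 'z'.
sample = b"a" + "\u00e9".encode("utf-8") + bytes([0xF8, 0x88, 0x80, 0x80, 0x80]) + b"z"
result = list(scan(sample))
```

The point is only that the length comes from the lead octet and is checked against the limit before anything is stored, so a 5- or 6-byte sequence gets rejected cleanly rather than overrunning a fixed 4-byte buffer.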

Is this ok with people?

paf

Begin forwarded message:

From: Francois Yergeau <FYergeau@alis.com>
Date: Mon Jan 6, 2003 20:02:41 Europe/Stockholm
To: Patrik Fältström <paf@cisco.com>
Subject: RE: draft-yergeau-rfc2279bis-02.txt for STANDARD

Hello Patrick,

I'm having second thoughts about all this and I would like to bounce them off
you to see what you think.

One thing is about the interop testing I did in late November: it is
incomplete, in that only 1-, 2- and 3-byte sequences were tested. Adding
4-byte sequences would be fairly easy to do, but I am at a loss to find a
way to test 5- and 6-byte sequences.

I'm now inclined to say that we should not test them, bite the bullet and
restrict UTF-8 to the UTF-16-accessible range of 0-10FFFF. There was some
discussion on the usefor list in December about doing just that. One
commenter there said that we should "come to your senses" and restrict to 4
bytes. And finally I got a private mail just today giving some arguments
for using Unicode 3.2 as the normative reference for UTF-8 instead of 10646;
doing so would also mean restricting to 4 bytes.
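
As a quick check of why the UTF-16-accessible range ends at 10FFFF, here is a small sketch using the standard surrogate pair arithmetic; from_surrogates is just an illustrative name, not something from the draft.

```python
def from_surrogates(high, low):
    """Combine a UTF-16 high/low surrogate pair into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# The largest possible pair gives the top of the UTF-16 range.
highest = from_surrogates(0xDBFF, 0xDFFF)

# Python's own UTF-8 encoder uses four octets for that value.
octets = len(chr(highest).encode("utf-8"))
```

The largest pair (DBFF, DFFF) comes out as 10FFFF, which takes exactly four octets in UTF-8, so restricting to that range and restricting to 4 bytes amount to the same thing.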

I have a -03 draft all ready to go, but I'd like to hear your opinion on the
above before I send it in.

Regards,

--
François

-----Original Message-----
From: Patrik Fältström [mailto:paf@cisco.com]
Sent: 25 December 2002 02:39
To: Francois Yergeau; ietf-charsets@iana.org
Subject: draft-yergeau-rfc2279bis-02.txt for STANDARD


As Francois said there is consensus around -02 of the document, and the
list is silent, I declare consensus.

If you have issues with it, let the list know ASAP.

I am now sending this to last call for STANDARD status.

Francois, there are some weird characters, which seem to be a
"paragraph number", before each paragraph. Can you do one last
"editing" round of the document, and let me know when -03 is announced?

    paf

On Friday, Nov 1, 2002, at 11:22 Europe/Stockholm, Patrik Fältström
wrote:

Any action on these issues I posted some time ago? Comments?

This document _really_ needs to move forward.

    paf

On Thursday, Oct 17, 2002, at 15:23 Europe/Stockholm, Patrik Fältström
wrote:

Anyway, what needs to happen now is two things:

 - The text in the document has to be changed to say BOM is not to be
   used
 - Someone has to write down interoperability information between
   applications

Regarding the interoperability, as Francois is working hard on the
document, can I get someone else to write this? I myself am mostly
irritated that I cannot copy and paste Unicode text between TextEdit and
Microsoft software in MacOSX, so I might not be the best person to
write down things that work.... :-(