
Fwd: RFC 3530



As a liaison to the Unicode Consortium, I received the following comments on RFC 3530.

Personally, I should probably have caught at least the first of these issues, so I have largely myself to blame for not finding these errors earlier.

That said, I think a revision of or addendum to the document ("Notes for use of Unicode with NFS version 4" or something like that) is needed given the comments below. Otherwise there is a big risk that we will end up with non-interoperable implementations.

Mark, if such an addendum is created, I presume you have time to help the editors/authors find the correct wording, or that you can find a person who can help?

Regards, Patrik

Begin forwarded message:

From: "Mark Davis" <mark.davis@jtcsv.com>
Date: Fri Jun 20, 2003 21:19:00 Europe/Stockholm
To: Patrik Fältström <paf@cisco.com>
Cc: "Paul Hoffman / IMC" <phoffman@imc.org>, François Yergeau <francois@yergeau.com>, "Martin Duerst" <duerst@w3.org>
Subject: RFC 3530

Patrik,

I was recently pointed to RFC 3530. The incorporation of UTF-8 into
the standard is very welcome, but I found a few problems in the text.
It was very unclear from the document whom to forward the comments to,
so as liaison could you forward them?

Here are the problematic passages:

1 With respect to the case_insensitive and case_preserving attributes,
each UCS-4 character (which UTF-8 encodes) has a "long descriptive
name" [RFC1345] which may or may not included the word "CAPITAL" or
"SMALL". The presence of SMALL or CAPITAL allows an NFS server to
implement unambiguous and efficient table driven mappings for case
insensitive comparisons, and non-case-preserving storage. For
general character handling and internationalization issues, see the
section "Internationalization".

This is *not* a reliable guide to the case of letters. A case variant
*cannot* be found by simply replacing SMALL by CAPITAL or vice versa.
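
To illustrate the point, here is a minimal Python sketch that compares the
name-swapping approach with the case mappings Python derives from the Unicode
data. The helper swap_case_by_name is hypothetical and written only for this
example; the two sample characters are my own choice, not taken from the RFC.

```python
import unicodedata


def swap_case_by_name(ch):
    """Naive approach: swap SMALL/CAPITAL in the character name, then look it up.

    Returns the character with the swapped name, or None when the name has
    neither word or the swapped name does not exist.
    """
    name = unicodedata.name(ch, "")
    if "SMALL" in name:
        candidate = name.replace("SMALL", "CAPITAL")
    elif "CAPITAL" in name:
        candidate = name.replace("CAPITAL", "SMALL")
    else:
        return None
    try:
        return unicodedata.lookup(candidate)
    except KeyError:
        return None


# Works for plain ASCII letters.
print(swap_case_by_name("a"))

# U+0131 LATIN SMALL LETTER DOTLESS I: the swapped name does not exist,
# yet the character does have an uppercase form (U+0049).
print(swap_case_by_name("\u0131"), "\u0131".upper())

# U+00B5 MICRO SIGN: no SMALL or CAPITAL in the name at all,
# yet its uppercase mapping is U+039C GREEK CAPITAL LETTER MU.
print(swap_case_by_name("\u00b5"), "\u00b5".upper())
```

In both of the latter cases the name gives no usable hint, while the
case-mapping data does.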

Suggested revision:

An NFS server can implement unambiguous and efficient table driven
mappings for case insensitive comparisons, and non-case-preserving
storage, either by using the Unicode Consortium case-mapping tables,
or using the Stringprep tables derived from the Unicode sources. For
general character handling and internationalization issues, see the
section "Internationalization".
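
As a rough illustration of a table driven comparison, the sketch below uses
Python's built-in str.casefold(), which applies Unicode case folding. This is
only an assumption about how one might prototype the idea; an NFS server would
more likely use the Stringprep tables directly, and the function name is
hypothetical. Normalization is deliberately left out.

```python
def names_match_ignoring_case(a, b):
    """Compare two names using Unicode case folding (no normalization)."""
    return a.casefold() == b.casefold()


print(names_match_ignoring_case("README.TXT", "readme.txt"))
print(names_match_ignoring_case("Straße", "STRASSE"))
```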

2 Stringprep discusses Unicode characters, whereas NFS version 4
renders UTF-8 characters. Since there is a one to one mapping from
UTF-8 to Unicode, where ever the remainder of this document refers
to to Unicode, the reader should assume UTF-8.

These statements are misleading. Unicode characters have numeric
values in the range from 0 to 0x10FFFF. These numbers are encoded in
different ways:
UTF-8 uses 1 to 4 eight-bit bytes per character
UTF-16 uses 1 to 2 sixteen-bit words per character
UTF-32 uses 1 thirty-two-bit word per character.
All of these are valid Unicode.
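
The counts above can be checked with a short Python sketch. The helper name
code_unit_counts and the sample characters are chosen only for illustration;
the big-endian codecs are used so that no byte order mark is added to the
output.

```python
def code_unit_counts(ch):
    """Return (UTF-8 bytes, UTF-16 words, UTF-32 words) for one character."""
    return (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-be")) // 2,
        len(ch.encode("utf-32-be")) // 4,
    )


for ch in ("A", "\u00e9", "\u20ac", "\U00010400"):
    print(f"U+{ord(ch):04X}", code_unit_counts(ch))
```

Every code point is representable in each of the three forms; only the number
of code units differs.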

Suggested revision:

Wherever the remainder of this document refers to Unicode, the
reader should assume the UTF-8 encoding of Unicode.

3 Where the client supplied string is valid UTF-8 but contains
characters that are not supported by the server as a value for that
string (e.g., names containing characters that have more than two
octets on a filesystem that supports Unicode characters only), the
server should return an NFS4ERR_BADCHAR error.

The example doesn't make sense. All Unicode characters are expressible
in UTF-8, and all characters expressible in UTF-8 are Unicode
characters (as above).

Suggested revision:

Where the client supplied string is valid UTF-8 but contains
characters that are not supported by the server as a value for that
string (e.g., if the server doesn't support names containing
characters greater than U+FFFF), the server should return an
NFS4ERR_BADCHAR error.
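
A server-side check along these lines might look like the following Python
sketch. The function name, the max_code_point parameter, and the way the
result would be turned into NFS4ERR_BADCHAR are all assumptions made for
illustration and do not describe any actual NFS implementation. Invalid UTF-8
is a separate case and simply raises an exception here.

```python
def first_unsupported_char(name_bytes, max_code_point=0xFFFF):
    """Return the first character above max_code_point, or None if all fit.

    Raises UnicodeDecodeError if name_bytes is not valid UTF-8.
    """
    text = name_bytes.decode("utf-8")
    for ch in text:
        if ord(ch) > max_code_point:
            return ch
    return None


# Hypothetical usage: a non-None result would lead to NFS4ERR_BADCHAR.
print(first_unsupported_char("r\u00e9sum\u00e9.txt".encode("utf-8")))
print(first_unsupported_char("file-\U00010400".encode("utf-8")) is not None)
```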

4 ... The UTF-8 encoding of the UCS as
defined by [ISO10646] allows for this type of access and follows the
policy described in "IETF Policy on Character Sets and Languages",
[RFC2277].

This is not an error, but people may think that the UTF-8 definition
in 10646 is not Unicode. I think François Yergeau can suggest better
wording based on the new version of RFC2279.

Note: the proposal should be checked for grammar, e.g. "if it's post
processed form collides"

Mark
__________________________________
http://www.macchiato.com
► “Eppur si muove” ◄