
Re: RFC 3530



Well, as far as I can tell from the text, for a careful reader none of these
would lead to non-interoperable implementations. Even the first one is not too
awful, since the precise instructions in the Internationalization section do not
depend on that previous passage.

That being said, I think all of these may lead non-careful readers astray, at
least to some extent, and may either be confusing or lead to mistakes. I'd
certainly be glad to help review any supplemental text, and I have supplied some
suggested replacements below. Is there anything like a W3C or Unicode Erratum
document for RFCs?

Mark

----- Original Message ----- 
From: "Patrik Fältström" <paf@cisco.com>
To: <spencer.shepler@sun.com>; <beame@bws.com>; <brent.callaghan@sun.com>;
<mike@eisler.com>; <dnoveck@netapp.com>; <david.robinson@sun.com>;
<robert.thurlow@sun.com>
Cc: "Mark Davis" <mark@macchiato.com>; "IESG" <iesg@ietf.org>
Sent: 2003 Jun 20 Fri 12:50
Subject: Fwd: RFC 3530


As a liaison to the Unicode Consortium, I received the following
comments on RFC 3530.

Personally, I probably should have found at least the first of these
issues, so I have myself to blame for not finding these errors
earlier.

That said, I think a revision/addendum of the document ("Notes for use
of Unicode with NFS version 4" or something like that) is needed given
the comments below. Else there is a big risk we will end up with
non-interoperable implementations.

Mark, if such an addendum is created, I presume you have time to help
the editors/authors to find the correct wording, or that you can find a
person who can help?

    Regards, Patrik

Begin forwarded message:

> From: "Mark Davis" <mark.davis@jtcsv.com>
> Date: Fri Jun 20, 2003  21:19:00 Europe/Stockholm
> To: Patrik Fältström <paf@cisco.com>
> Cc: "Paul Hoffman / IMC" <phoffman@imc.org>, François Yergeau
> <francois@yergeau.com>, "Martin Duerst" <duerst@w3.org>
> Subject: RFC 3530
>
> Patrik,
>
> I was recently pointed to RFC 3530. The incorporation of UTF-8 into
> the standard is very welcome, but I found a few problems in the text.
> It was very unclear from the document who to forward the comments to,
> so as liaison could you forward them?
>
> Here are the problematic passages:
>
> 1   With respect to the case_insensitive and case_preserving
> attributes,
>    each UCS-4 character (which UTF-8 encodes) has a "long descriptive
>    name" [RFC1345] which may or may not included the word "CAPITAL" or
>    "SMALL".  The presence of SMALL or CAPITAL allows an NFS server to
>    implement unambiguous and efficient table driven mappings for case
>    insensitive comparisons, and non-case-preserving storage.  For
>    general character handling and internationalization issues, see the
>    section "Internationalization".
>
> This is *not* a reliable guide to the case of letters. A case variant
> *cannot* be found by simply replacing SMALL by CAPITAL or vice versa.
>
> Suggested revision:
>
> An NFS server can implement unambiguous and efficient table driven
> mappings for case insensitive comparisons, and non-case-preserving
> storage, either by using the Unicode Consortium case-mapping tables,
> or using the Stringprep tables derived from the Unicode sources.  For
> general character handling and internationalization issues, see the
> section "Internationalization".
>
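To illustrate point 1, here is a minimal Python sketch, assuming the
standard unicodedata module as a stand-in for the Unicode Consortium tables.
The helper naive_case_swap is hypothetical and only mimics the name-based
approach the RFC passage hints at; it is not taken from RFC 3530 or from any
NFS implementation.

```python
import unicodedata


def naive_case_swap(ch: str):
    """Hypothetical name-based mapping: swap SMALL for CAPITAL in the name."""
    name = unicodedata.name(ch, "")
    if "SMALL" not in name:
        return None
    try:
        return unicodedata.lookup(name.replace("SMALL", "CAPITAL"))
    except KeyError:
        return None  # no character carries the swapped name


# Compare the name-based guess with the Unicode case mapping (str.upper).
for ch in ("a", "\u00df", "\u0131"):
    guess = naive_case_swap(ch)
    print(f"U+{ord(ch):04X} name-based={guess!r} upper()={ch.upper()!r}")

# Case-insensitive comparison via Unicode case folding instead of names.
print("Stra\u00dfe".casefold() == "STRASSE".casefold())
```

For U+00DF (sharp s) the name swap lands on U+1E9E while the Unicode
uppercase mapping is the two-letter string "SS", and for U+0131 (dotless i)
the swapped name does not exist at all even though the uppercase mapping is
plain "I". Case folding from the Unicode data handles both.
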
> 2   Stringprep discusses Unicode characters, whereas NFS version 4
>    renders UTF-8 characters.  Since there is a one to one mapping from
>    UTF-8 to Unicode, where ever the remainder of this document refers
> to
>    to Unicode, the reader should assume UTF-8.
>
> These statements are misleading. Unicode characters have numeric
> values in the range from 0 to 0x10FFFF. These numbers are encoded in
> different ways:
> UTF-8 uses 1 to 4 eight-bit bytes per character
> UTF-16 uses 1 to 2 sixteen-bit words per character
> UTF-32 uses 1 thirty-two-bit word per character.
> All of these are valid Unicode.
>
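The code unit counts listed in point 2 can be checked with a short Python
sketch. The function name code_units is made up for this illustration; the
codecs used ("utf-8", "utf-16-be", "utf-32-be") are the standard Python ones,
with the big-endian variants chosen so that no byte order mark is counted.

```python
def code_units(ch: str) -> dict:
    """Count the code units one Unicode character needs in each encoding."""
    return {
        "UTF-8": len(ch.encode("utf-8")),            # 8-bit units
        "UTF-16": len(ch.encode("utf-16-be")) // 2,  # 16-bit units
        "UTF-32": len(ch.encode("utf-32-be")) // 4,  # 32-bit units
    }


# One character from each UTF-8 length class: U+0041, U+00E9, U+20AC, U+1D11E.
for ch in ("A", "\u00e9", "\u20ac", "\U0001d11e"):
    print(f"U+{ord(ch):04X}", code_units(ch))
```

The same code point comes out as a different number of units in each
encoding form, which is why UTF-8 is better described as one encoding of
Unicode than as something separate that maps onto it.
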
> Suggested revision:
>
> Wherever the remainder of this document refers to Unicode, the
> reader should assume the UTF-8 encoding of Unicode.
>
> 3   Where the client supplied string is valid UTF-8 but contains
>    characters that are not supported by the server as a value for that
>    string (e.g., names containing characters that have more than two
>    octets on a filesystem that supports Unicode characters only), the
>    server should return an NFS4ERR_BADCHAR error.
>
> The example doesn't make sense. All Unicode characters are expressible
> in UTF-8, and all characters expressible in UTF-8 are Unicode
> characters (as above).
>
> Suggested revision:
>
> Where the client supplied string is valid UTF-8 but contains
> characters that are not supported by the server as a value for that
> string (e.g., if the server doesn't support names containing
> characters greater than U+FFFF), the server should return an
> NFS4ERR_BADCHAR error.
>
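The revised example in point 3 could be sketched as below, assuming a
hypothetical server that only stores characters up to U+FFFF. The function
check_name and the constant BMP_MAX are invented for this illustration, the
status values are plain strings rather than real protocol codes, and
returning NFS4ERR_INVAL for input that is not valid UTF-8 is an assumption
that should be checked against the RFC text.

```python
BMP_MAX = 0xFFFF  # assumed limit of the hypothetical server


def check_name(raw: bytes) -> str:
    """Classify a client-supplied name as the hypothetical server would."""
    try:
        name = raw.decode("utf-8")  # strict decoding rejects malformed input
    except UnicodeDecodeError:
        return "NFS4ERR_INVAL"  # assumption: invalid UTF-8 is reported this way
    if any(ord(c) > BMP_MAX for c in name):
        return "NFS4ERR_BADCHAR"  # valid UTF-8, but unsupported character
    return "NFS4_OK"


print(check_name("file.txt".encode("utf-8")))
print(check_name("\u20ac.txt".encode("utf-8")))      # three octets, still accepted
print(check_name("\U0001d11e.txt".encode("utf-8")))  # above U+FFFF
print(check_name(b"\xc0\xaf"))                       # malformed sequence
```

The test is on the code point value, not on the octet count: U+20AC takes
three octets in UTF-8 yet is still within the supported range.
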
> 4   ... The UTF-8 encoding of the UCS as
>    defined by [ISO10646] allows for this type of access and follows
> the
>    policy described in "IETF Policy on Character Sets and Languages",
>    [RFC2277].
>
> This is not an error, but people may think that the UTF-8 definition
> in 10646 is not Unicode. I think François Yergeau can suggest better
> wording based on the new version of RFC2279.
>
> Note: the proposal should be checked for grammar, e.g. "if it's post
> processed form collides"
>
> Mark
> __________________________________
> http://www.macchiato.com
> ►  “Eppur si muove” ◄
>