RE: [idn] Punicode: Upper-case in example
...
> It is not documented in the spec because it is not a feature of
> Punycode. The Punycode algorithm inputs and outputs code
> points, which are numbers. It does not input or output "u+".
>
> The sample implementation inputs and outputs "u+". Therefore the use
> of the u as a 1-bit annotation is mentioned in the documentation of
> the sample implementation, which is embedded in the source code (you
> can either read the source code of the usage() function, or run the
> program with no arguments).

I find this confusing, and I think two things have gone wrong here.

In general, an example implementation (of any software specification)
should be as pure as possible, with no extra bells and whistles. The
same goes for example strings (if any). So the "1-bit annotations" do
not appear to belong. If they do belong, they surely should not
manifest themselves as "U+"/"u+", a distinction that, in addition, is
not explained.

Furthermore, in this case the example strings (the arguments to
encode and the results of decoding) are not readily presentable
in an ASCII document (as IETF documents apparently still have to be).
So there is an additional step of presenting characters that are not
printable ASCII. Using "U+" notation for such code points when
presenting the example strings is fine. But this additional
step does NOT belong in a sample implementation; it should be kept
strictly outside of it. It would still be nice to have the example
strings also in UTF-8 in the document, if possible, even if the
example implementation does not work directly on UTF-8 strings.
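
To illustrate the separation I mean, here is a minimal sketch in Python. It is not the sample implementation from the draft. The function names are hypothetical, and Python's built-in "punycode" codec is assumed as a stand-in for the encoder/decoder, which sees only code points (numbers). The "U+" parsing and formatting sit entirely outside it, and the case of the "U" carries no meaning.

```python
import re

# Presentation layer only: "U+" and "u+" are treated identically here,
# so the case of the prefix carries no hidden annotation.
_U_PLUS = re.compile(r"[Uu]\+([0-9A-Fa-f]{4,6})$")


def parse_u_plus(text):
    """Turn 'U+0062 U+00FC' into the code point list [0x62, 0xFC]."""
    code_points = []
    for token in text.split():
        match = _U_PLUS.match(token)
        if match is None:
            raise ValueError("not in U+ notation: %r" % token)
        code_points.append(int(match.group(1), 16))
    return code_points


def format_u_plus(code_points):
    """Turn [0x62, 0xFC] back into 'U+0062 U+00FC' for an ASCII document."""
    return " ".join("U+%04X" % cp for cp in code_points)


def encode_code_points(code_points):
    """Stand-in for the codec proper: code points in, ASCII string out."""
    text = "".join(chr(cp) for cp in code_points)
    return text.encode("punycode").decode("ascii")


def decode_to_code_points(encoded):
    """Stand-in for the codec proper: ASCII string in, code points out."""
    return [ord(ch) for ch in encoded.encode("ascii").decode("punycode")]
```

With this split, a document can show `U+0062 U+00FC U+0063 U+0068 U+0065 U+0072` as the input string, while the encoder itself never sees a "U+".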

As for the "test wrapper" part of the sample implementation, if it is
presented at all, it should read and write test strings in at least one
of the "major" Unicode encodings. UTF-32 would be easiest. Conversion
between UTF-32 and some other Unicode encoding does not even belong in
the *example test* wrapper, nor does encoding/decoding of "U+" notation.
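
A rough sketch of why UTF-32 is the easy choice for such a wrapper: each 4-byte unit is one code point, so reading and writing needs no real conversion logic. The function names are hypothetical, big-endian byte order without a byte order mark is an assumption, and Python's built-in "punycode" codec again stands in for the encoder.

```python
import struct


def read_utf32be(data):
    """Read big-endian UTF-32 bytes as a list of code points."""
    if len(data) % 4 != 0:
        raise ValueError("UTF-32 input must be a multiple of 4 bytes")
    # One unsigned 32-bit integer per code point; no further decoding.
    return list(struct.unpack(">%dI" % (len(data) // 4), data))


def write_utf32be(code_points):
    """Write a list of code points as big-endian UTF-32 bytes."""
    return struct.pack(">%dI" % len(code_points), *code_points)


def wrapper_encode(utf32_data):
    """Test wrapper: UTF-32 bytes in, Punycode (ASCII bytes) out."""
    code_points = read_utf32be(utf32_data)
    # Stand-in for the encoder proper, which takes code points.
    return "".join(chr(cp) for cp in code_points).encode("punycode")
```

Anything beyond this, such as conversion from UTF-8 or from "U+" notation, can be done by a separate tool in front of the wrapper.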

Examples (strings, code) should be there to illustrate required
behaviour, not to confuse readers with irrelevant (and even
misleading and hard-to-see) extras.

/kent k