Re: [idn] Punycode: Upper-case in example
Hello Adam,
Many thanks for your quick response.
At 22:50 02/11/26 +0000, Adam M. Costello wrote:
> Martin Duerst <duerst@w3.org> wrote:
>> In http://www.ietf.org/internet-drafts/draft-ietf-idn-punycode-03.txt,
>> example (I) says:
>>
>> (I) Russian (Cyrillic):
>> U+043F u+043E u+0447 u+0435 u+043C u+0443 u+0436 u+0435 u+043E
>> u+043D u+0438 u+043D u+0435 u+0433 u+043E u+0432 u+043E u+0440
>> u+044F u+0442 u+043F u+043E u+0440 u+0443 u+0441 u+0441 u+043A
>> u+0438
>> Punycode: b1abfaaepdrnnbgefbaDotcwatmq2g4l
>>
>> The presence of the upper-case 'D' (not to say the string 'Dot' :-)
>> is confusing, because it seems completely arbitrary. There is no
>> upper-case letter in the Cyrillic string.
>> How did the upper-case D get in there?
>
> It corresponds to the uppercase U in one of the code points in the u+
> notation. The sample Punycode implementation uses the case of the u
> as a 1-bit annotation.

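To reproduce example (I) from this notation, a reader has to split each token into its code point and the 1-bit flag carried by the case of the "u". A minimal Python sketch, assuming the convention described above (U+ means the flag is set, u+ means it is clear); `parse_annotated` is a hypothetical helper, not part of the draft's sample code:

```python
import re

# One code point in the example notation; the case of the "u" is captured
# separately. Assumption: "U+" = uppercase annotation, "u+" = lowercase.
TOKEN = re.compile(r"([Uu])\+([0-9A-Fa-f]{4,6})")


def parse_annotated(notation):
    """Return (code_points, case_flags) for text like 'U+043F u+043E'."""
    code_points, case_flags = [], []
    for u, hex_digits in TOKEN.findall(notation):
        code_points.append(int(hex_digits, 16))
        case_flags.append(u == "U")  # the 1-bit annotation
    return code_points, case_flags


example_i = """
U+043F u+043E u+0447 u+0435 u+043C u+0443 u+0436 u+0435 u+043E
u+043D u+0438 u+043D u+0435 u+0433 u+043E u+0432 u+043E u+0440
u+044F u+0442 u+043F u+043E u+0440 u+0443 u+0441 u+0441 u+043A
u+0438
"""
cps, flags = parse_annotated(example_i)
print("".join(map(chr, cps)))                  # почемужеонинеговорятпорусски
print([i for i, f in enumerate(flags) if f])   # [0]
```

Only the first code point (U+043F) is flagged, which matches the single uppercase letter in the Punycode output.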
I see. I don't think it is a very good idea to use U+ vs. u+ for this
distinction, for the following reasons:
1) The convention u+ -> lowercase, U+ -> uppercase is not documented
anywhere in the Punycode draft (or at least I didn't find it). If it is
used at all, it should be documented right at the start of the examples.
2) The convention is very easy to overlook, in particular because u+
and U+ look so similar. It is close to the widely established U+XXXX
convention for writing code points, but differs from it slightly.
3) Punycode can be used in different ways: on mixed-case strings, on
lowercased strings that still carry the original case information, and
on purely lowercased strings. Maybe there should be separate examples
for each of these three uses.
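The difference between the last two uses in point 3 can be shown with a small encoder. The Python sketch below follows the structure of the sample encoder in RFC 3492 (the published successor of this draft) and uses its parameter values. It assumes the RFC's mixed-case rule: a flagged basic code point is written in uppercase, and for a non-basic code point the flag sets the case of the last digit of its delta. The function names are illustrative only.

```python
# Punycode parameters as published in RFC 3492.
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS, INITIAL_N, DELIMITER = 72, 0x80, "-"


def adapt(delta, num_points, first_time):
    """Bias adaptation after each encoded delta."""
    delta = delta // DAMP if first_time else delta // 2
    delta += delta // num_points
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + (BASE - TMIN + 1) * delta // (delta + SKEW)


def encode_digit(d, upper):
    """Map 0..25 to a..z and 26..35 to 0..9; uppercase a letter if flagged."""
    c = chr(ord("a") + d) if d < 26 else chr(ord("0") + d - 26)
    return c.upper() if upper else c


def punycode_encode(code_points, case_flags=None):
    """Encode a list of code points; case_flags is one optional bool each."""
    annotate = case_flags is not None
    flags = case_flags if annotate else [False] * len(code_points)
    output = []
    # Copy basic (ASCII) code points; when annotating, their case is the flag.
    for cp, flag in zip(code_points, flags):
        if cp < 0x80:
            ch = chr(cp)
            output.append((ch.upper() if flag else ch.lower()) if annotate else ch)
    b = h = len(output)
    if b:
        output.append(DELIMITER)
    n, delta, bias = INITIAL_N, 0, INITIAL_BIAS
    while h < len(code_points):
        # Smallest code point not yet handled.
        m = min(cp for cp in code_points if cp >= n)
        delta += (m - n) * (h + 1)
        n = m
        for cp, flag in zip(code_points, flags):
            if cp < n:
                delta += 1
            elif cp == n:
                # Write delta as a generalized variable-length integer.
                q, k = delta, BASE
                while True:
                    t = TMIN if k <= bias else (TMAX if k >= bias + TMAX else k - bias)
                    if q < t:
                        break
                    output.append(encode_digit(t + (q - t) % (BASE - t), False))
                    q = (q - t) // (BASE - t)
                    k += BASE
                # The last digit of each delta carries the case annotation.
                output.append(encode_digit(q, flag))
                bias = adapt(delta, h + 1, h == b)
                delta = 0
                h += 1
        delta += 1
        n += 1
    return "".join(output)


russian = [ord(c) for c in "почемужеонинеговорятпорусски"]
print(punycode_encode(russian))            # b1abfaaepdrnnbgefbadotcwatmq2g4l
first_upper = [i == 0 for i in range(len(russian))]
print(punycode_encode(russian, first_upper))  # b1abfaaepdrnnbgefbaDotcwatmq2g4l
```

Without flags (a purely lowercased string) every letter is lowercase. With the single flag on U+043F, only the final digit of that code point's delta changes case, which is where the 'D' comes from. The flag does not affect the bias or any other digit.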
Regards, Martin.