Re: Thread on - Re: [idn] Prohibit CDN code points
"tsenglm@????????????.??????.tw" <tsenglm@cc.ncu.edu.tw> wrote:
> Unicode [UNICODE] is a coded character set containing tens of thousands
> of characters. A single Unicode code point is denoted by "U+" followed
> by four to six hexadecimal digits...
>
> My questions are:
> Q1: Can U+hhhh be represented as u+hhhh or not?
The Unicode standard always uses U+, never u+, and the same is true of
the IDNA draft. The Punycode draft always uses U+ in the main spec, but
the sample implementation uses both U+ and u+ in order to represent the
annotation flags, and the examples section likewise uses both U+ and u+
to make it easy to feed the examples into the sample implementation.
> Q2: Here U+HHHH is not a hostname. MUST it be forced to lowercase
> u+hhhh in nameprep or not?
The case of the U is not part of the code point. A code point is just
an integer. For example, U+0391 and u+0391 both represent the integer
913 (decimal) which is the code point for uppercase alpha. U+03B1
and u+03B1 both represent the integer 945 (decimal) which is the code
point for lowercase alpha. Nameprep always converts uppercase alpha
to lowercase alpha (so it would always output 945, never 913), but a
nameprep implementation that included support for mixed case annotations
would output not only an array of code points but also a parallel
array of case flags, and the lowercase alpha (945) would be flagged
as "wanting to be uppercase". The flags could be passed along to the
Punycode encoder and recovered by the Punycode decoder.
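
To make the distinction concrete, here is a minimal Python sketch. The names parse_code_point and fold_with_flags are made up for illustration, and Python's str.lower() is only a stand-in for nameprep's case mapping, not the real thing. It shows that the case of the "U" does not change the integer, and how a parallel flag array could remember which code points were uppercase before mapping.

```python
def parse_code_point(notation: str) -> int:
    """Parse 'U+hhhh' or 'u+hhhh' into its integer code point."""
    prefix, digits = notation[:2], notation[2:]
    if prefix not in ("U+", "u+"):
        raise ValueError(f"expected a U+ or u+ prefix: {notation!r}")
    return int(digits, 16)


def fold_with_flags(code_points):
    """Lowercase each code point and record which ones were uppercase.

    Assumption: str.lower() approximates nameprep's case mapping here.
    Only one-to-one mappings are handled; anything else is left as-is.
    """
    folded, flags = [], []
    for cp in code_points:
        lower = chr(cp).lower()
        out = ord(lower) if len(lower) == 1 else cp
        folded.append(out)
        # Flag is set when the input differed from its lowercase form,
        # i.e. the code point "wants to be uppercase".
        flags.append(out != cp)
    return folded, flags


# The case of the "U" is irrelevant: both spell the integer 913.
print(parse_code_point("U+0391"), parse_code_point("u+0391"))

# Uppercase alpha (913) and lowercase alpha (945) both come out as 945,
# but only the first one carries a set flag.
print(fold_with_flags([0x0391, 0x03B1]))
```
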
The Punycode sample implementation and examples sections use U+03B1
to mean "lowercase alpha with flag set (wants to be uppercase)" and
use u+03B1 to mean "lowercase alpha with flag clear (wants to stay
lowercase)".
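
As a rough sketch of that annotation convention (parse_annotated is a hypothetical helper, not code from the Punycode draft), the prefix case can be read as the flag while the hex digits alone give the code point:

```python
def parse_annotated(notation: str):
    """Return (code_point, flag) for the annotated U+/u+ notation.

    Assumed convention, as described above: 'U+' means the flag is set
    (wants to be uppercase), 'u+' means the flag is clear.
    """
    prefix, digits = notation[:2], notation[2:]
    if prefix == "U+":
        flag = True
    elif prefix == "u+":
        flag = False
    else:
        raise ValueError(f"expected a U+ or u+ prefix: {notation!r}")
    return int(digits, 16), flag


print(parse_annotated("U+03B1"))  # same code point, flag set
print(parse_annotated("u+03B1"))  # same code point, flag clear
```
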
The flags have no effect on which ASCII letters and digits are output
by the Punycode encoder. The flags merely affect the upper/lowercase
property of the ASCII letters.
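
One way to see this from the decoding side is with Python's built-in "punycode" codec. As far as I know that codec does not expose case flags at all, so this is only an illustration that changing the case of an encoded letter does not change the decoded code points; it is not a demonstration of the sample implementation.

```python
# Lowercase alpha (U+03B1) encoded with Python's stdlib punycode codec.
encoded = "\u03b1".encode("punycode")
print(encoded)

# Changing the case of the final letter leaves the decoded text unchanged;
# in an annotation-aware decoder that case could carry the flag instead.
plain = b"mxa".decode("punycode")
cased = b"mxA".decode("punycode")
print(plain == cased, hex(ord(plain)))
```
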
AMC