Re: Thread on - Re: [idn] Prohibit CDN code points
Dear Adam & all:
Thanks for your reply. It is very clear: Punycode has some optional
features that implementers can choose to support. How the annotation
flag information is input is an implementation issue.
> > Q1: U+hhhh can be represented as u+hhhh or not ?
>
> The Unicode standard always uses U+, never u+, and the same is true of
> the IDNA draft. The Punycode draft always uses U+ in the main spec, but
> the sample implementation uses both U+ and u+ in order to represent the
> annotation flags, and the examples section likewise uses both U+ and u+
> to make it easy to feed the examples into the sample implementation.
>
As I understand it, I would describe it this way: if an implementation
wants to use the annotation feature, then the annotation flag should be
passed from ToASCII through nameprep to Punycode. If the IETF RFC
tradition of passing input parameters as ASCII strings such as U+hhhh
is followed, then U+/u+ can serve as the annotation flag.
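
To make the idea concrete, here is a minimal sketch of reading the
U+hhhh/u+hhhh notation into a code point array and a parallel flag array.
The names TOKEN and parse_annotated are hypothetical, and the rule
"U+ means flag set, u+ means flag clear" is the convention Adam describes
for the Punycode sample implementation, not a requirement of any spec.

```python
import re

# Hypothetical token pattern: "U+" or "u+" followed by 4 to 6 hex digits.
TOKEN = re.compile(r"^([Uu])\+([0-9A-Fa-f]{4,6})$")

def parse_annotated(tokens):
    """Split U+hhhh / u+hhhh tokens into code points and case flags.

    Assumption: an uppercase "U" means the flag is set, a lowercase "u"
    means the flag is clear. The hex digits alone give the code point.
    """
    code_points = []
    case_flags = []
    for tok in tokens:
        m = TOKEN.match(tok)
        if m is None:
            raise ValueError("not in U+hhhh form: %r" % tok)
        code_points.append(int(m.group(2), 16))
        case_flags.append(m.group(1) == "U")
    return code_points, case_flags

if __name__ == "__main__":
    print(parse_annotated(["U+03B1", "u+03B1"]))
```

Both tokens give the same integer (945); only the flag differs.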
> > Q2: Here U+HHHH is not a hostname , does it MUST be forced to lower
> > u+hhhh or not in nameprep ?
>
> The case of the U is not part of the code point. A code point is just
> an integer. For example, U+0391 and u+0391 both represent the integer
> 913 (decimal) which is the code point for uppercase alpha. U+03B1
> and u+03B1 both represent the integer 945 (decimal) which is the code
> point for lowercase alpha. Nameprep always converts uppercase alpha
> to lowercase alpha (so it would always output 945, never 913), but a
> nameprep implementation that included support for mixed case annotations
> would output not only an array of code points but also a parallel
> array of case flags, and the lowercase alpha (945) would be flagged
> as "wanting to be uppercase". The flags could be passed along to the
> Punycode encoder and recovered by the Punycode decoder.
>
> The Punycode sample implementation and examples sections use U+03B1
> to mean "lowercase alpha with flag set (wants to be uppercase)" and
> use u+03B1 to mean "lowercase alpha with flag clear (wants to stay
> lowercase)".
>
> The flags have no effect on which ASCII letters and digits are output
> by the Punycode encoder. The flags merely affect the upper/lowercase
> property of the ASCII letters.
>
I think it is very clear: the annotation flag just lets Punycode, given
input in the original U+hhhh/u+hhhh form, output an LDH string in which
the corresponding characters are set to upper or lower case. Punycode
does not change any result from nameprep, even when it works in single
case annotation mode.
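
As a rough sketch of the step Adam describes, where nameprep outputs
lowercase code points together with a parallel array of case flags, one
could write something like the following. The function fold_with_flags is
hypothetical, and Python's str.lower() is used only as a stand-in for the
real nameprep mapping tables, which it does not reproduce.

```python
def fold_with_flags(code_points):
    """Lowercase each code point and remember which ones were uppercase.

    Assumption: str.lower() stands in for the nameprep case mapping.
    A code point is only mapped when the result is a single character.
    """
    folded = []
    case_flags = []
    for cp in code_points:
        ch = chr(cp)
        lowered = ch.lower()
        if len(lowered) == 1 and lowered != ch:
            folded.append(ord(lowered))
            case_flags.append(True)   # "wants to be uppercase"
        else:
            folded.append(cp)
            case_flags.append(False)  # "wants to stay lowercase"
    return folded, case_flags

if __name__ == "__main__":
    # Uppercase alpha (913) and lowercase alpha (945).
    print(fold_with_flags([0x0391, 0x03B1]))
```

The code point array is the same whether or not the flags are kept, which
is why the flags cannot change the nameprep result.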
Actually, any ACE encoder could do the same, but Punycode is special in
that it treats basic code points and non-basic code points as two
separate parts, so the original flag information can be displayed in the
output string even though the encoded code point is lower case. The
displayed case has no effect on DNS queries, but it lets the user view
the IDN in its original form without feeling that nameprep has done
something strange to the ML-domain name.
Thanks for your careful answers.
Best regards
L.M.Tseng