Re: [idn] IDN WG Last Call on two major changes to Stringprep
Edmon Chung <edmon@neteka.com> wrote:
> -----
> 2) If a string contains any Right-to-Left character (defined as
> belonging to Unicode bidirectional categories "R" and "AL"), the string
> MUST NOT contain any Left-to-Right character (defined as belonging to
> Unicode bidirectional category "L").
>
> 3) If a string contains any Right-to-Left character (as defined above),
> a Right-to-Left character MUST be the first character of the string, and
> a Right-to-Left character MUST be the last character of the string.
> -----
>
> I don't quite understand why we need to have 3.
> Isn't 3 a subset of 2?
No, because there are characters that are neither Right-to-Left nor
Left-to-Right.
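
To make the difference concrete, here is a minimal sketch of rules 2 and 3,
assuming Python's standard unicodedata module as the source of the
bidirectional categories. The function names are made up for illustration
and are not taken from the stringprep draft. A Hebrew letter followed by a
digit passes rule 2 (a digit is neither "L" nor "R"/"AL") but fails rule 3.

```python
import unicodedata

RTL = {"R", "AL"}  # Right-to-Left categories, as defined in rule 2
LTR = {"L"}        # Left-to-Right category, as defined in rule 2


def bidi_classes(s):
    """Return the Unicode bidirectional category of each character."""
    return [unicodedata.bidirectional(ch) for ch in s]


def passes_rule_2(s):
    """Rule 2: RTL and LTR characters must not both appear."""
    classes = bidi_classes(s)
    has_rtl = any(c in RTL for c in classes)
    has_ltr = any(c in LTR for c in classes)
    return not (has_rtl and has_ltr)


def passes_rule_3(s):
    """Rule 3: if any RTL character appears, the first and last
    characters must both be RTL."""
    classes = bidi_classes(s)
    if not any(c in RTL for c in classes):
        return True  # rule 3 only constrains strings containing RTL
    return classes[0] in RTL and classes[-1] in RTL


alef = "\u05d0"  # HEBREW LETTER ALEF, category "R"
sample = alef + "1"  # the digit is category "EN", neither RTL nor LTR
print(passes_rule_2(sample), passes_rule_3(sample))
```

So rule 3 rejects strings that rule 2 accepts, which is why it is not
redundant.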
> Also this will mean that there cannot be a mixture between RTL and LTR
> characters.
Correct. That is exactly rule 2 above. "If X appears then Y must not
appear" is exactly equivalent to "X and Y must not both appear".
> While I am not familiar with Arabic, I sure have seen English words
> mixed with Arabic in phrases, albeit rare.
I don't doubt that. I know almost nothing about the bidi algorithm, but
the bidi experts concluded that this was the price that needed to be
paid to prevent distinct labels from being displayed identically.
> I didn't see much discussion on the list before on bidi issues, but I
> did see an example used:
>
> > Assume there were two labels inside the DNS, one reading ABCdef and
> > the other reading defABC, and both would be displayed CBAdef. Who
> > would consider that usable for the DNS?
>
> Why would both be displayed the same?
I don't know. :) The answer can presumably be inferred from the bidi
tech report.
> a given "string" can be a "part" of a label, so there could be two
> "strings", one containing LTR one RTL in the same label.
Nameprep doesn't need to know whether its input string is a label or
part of a label or whatever. Nameprep could operate on any string.
But the IDNA spec specifies that nameprep is applied to whole labels,
not to substrings of labels. Therefore a label cannot contain both LTR
and RTL characters.
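
As a rough illustration of the per-label point, the sketch below applies
only the rule 2 check to each label of a name. It is not nameprep itself:
the helper names are hypothetical, and splitting on "." is a simplifying
assumption. An LTR label and an RTL label can sit side by side in one name;
what is rejected is mixing the two inside a single label.

```python
import unicodedata


def mixes_directions(label):
    """True if the label contains both RTL ("R"/"AL") and LTR ("L") characters."""
    classes = {unicodedata.bidirectional(ch) for ch in label}
    return bool(classes & {"R", "AL"}) and "L" in classes


def labels_mixing_directions(name):
    """Check each dot-separated label on its own and return the offenders."""
    return [label for label in name.split(".") if mixes_directions(label)]


hebrew = "\u05e9\u05dc\u05d5\u05dd"  # four Hebrew letters, all category "R"

# Separate labels: one LTR, one RTL. Nothing is flagged.
print(labels_mixing_directions("abc." + hebrew))

# One label containing both directions. That label is flagged.
print(labels_mixing_directions("abc" + hebrew + ".com"))
```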
> Please clarify two simple things:
> a. Are mixed RTL and LTR characters allowed within a label?
No.
> b. If there are more categories than R, AL and L that we are discussing,
> then in point 3 it should not say "As defined above":
Yes it should. Rule 3 is using the same definition of Right-to-Left
character as stated in rule 2: 'belonging to Unicode bidirectional
categories "R" and "AL"'.
AMC