Re: [idn] case folding
If we are going to discuss this, and since we will be basing it on Unicode, it
would be less confusing if people referred to the characters to be allowed or
disallowed by their Unicode general categories.
Abbr  Description                 Proposal
Lu    Letter, Uppercase           Allow
Ll    Letter, Lowercase           Allow
Lt    Letter, Titlecase           Allow
Mn    Mark, Non-Spacing           Disallow
Mc    Mark, Spacing Combining     Disallow
Me    Mark, Enclosing             Disallow
Nd    Number, Decimal Digit       Allow
Nl    Number, Letter              Disallow (or should we remap to Nd?)
No    Number, Other               Disallow (or should we remap to Nd?)
Zs    Separator, Space            Disallow
Zl    Separator, Line             Disallow
Zp    Separator, Paragraph        Disallow
Cc    Other, Control              Disallow
Cf    Other, Format               Disallow
Cs    Other, Surrogate            Disallow
Co    Other, Private Use          Disallow (sure?)
Cn    Other, Not Assigned         Disallow
Lm    Letter, Modifier            Allow
Lo    Letter, Other               Allow
Pc    Punctuation, Connector      Disallow
Pd    Punctuation, Dash           Disallow (except '-')
Ps    Punctuation, Open           Disallow
Pe    Punctuation, Close          Disallow
Pi    Punctuation, Initial quote  Disallow
Pf    Punctuation, Final quote    Disallow
Po    Punctuation, Other          Disallow
Sm    Symbol, Math                Disallow
Sc    Symbol, Currency            Disallow
Sk    Symbol, Modifier            Disallow
So    Symbol, Other               Disallow
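
To make the table above easier to argue about, here is a rough Python sketch
of how it could be checked mechanically. It is only an illustration, not part
of any proposal: the names ALLOWED_CATEGORIES, is_allowed and disallowed_chars
are made up for this example, and the results depend on the Unicode version
shipped with the Python standard library's unicodedata module. It does not
attempt the "remap to Nd" idea for Nl and No; those are simply rejected.

```python
import unicodedata

# Categories marked "Allow" in the table above (hypothetical constant name).
ALLOWED_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nd"}


def is_allowed(ch: str) -> bool:
    """Return True if a single character passes the proposed table."""
    # The table disallows Pd (dash punctuation) except for '-' itself.
    if ch == "-":
        return True
    return unicodedata.category(ch) in ALLOWED_CATEGORIES


def disallowed_chars(label: str) -> list[tuple[str, str]]:
    """List each rejected character in a label with its category."""
    return [
        (ch, unicodedata.category(ch))
        for ch in label
        if not is_allowed(ch)
    ]
```

For instance, disallowed_chars("ab_c") reports the underscore as Pc. One
consequence worth noticing: because Mn is disallowed, a precomposed U+00E9
passes as Ll, but the decomposed sequence "e" followed by U+0301 is rejected,
so the table implicitly assumes names are in a composed form.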
Flame away :-)
-James Seng
"Eugene M. Kim" wrote:
>
> On Mon, 12 Jun 2000, James Seng wrote:
>
> | However, should we reopen the discussion on which codepoints are allowed and
> | which are not? I remember we had quite a heated argument, and the consensus
> | then was to leave it to the protocol proposals. Any changes now?
>
> Yes and no. As people start to write the actual proposals, now seems a
> good time to reopen those issues (so that proposal writers can benefit
> from the discussion), but none of the codepoints should be mandated
> in the requirements document (except for obvious ones such as alphabetic
> characters and digits).
>
> Some other codepoints probably make almost no sense to use in domain
> names either; these include at least the C0/C1 control codes and the
> private use area (U+E000-U+F8FF). And there are a lot of `controversial'
> characters such as arrow symbols and other pictographic characters.
>
> However, IMHO even if we do end up defining some codepoints to be
> excluded in the requirements, they should not include any characters that
> matter to a particular language or script, unless there is a strong
> reason to drop them.
>
> Eugene Kim
>
> --
> Eugene M. Kim <ab@astralblue.com>
>
> "Is your music unpopular? Make it popular; make music
> which people like, or make people who like your music."