[idn] Fwd: Re: Rationale wanted for Unicode identifier rules
- To: idn@ops.ietf.org
- Subject: [idn] Fwd: Re: Rationale wanted for Unicode identifier rules
- From: Harald Tveit Alvestrand <Harald@Alvestrand.no>
- Date: Thu, 02 Mar 2000 11:52:22 +0100
- Delivery-date: Thu, 02 Mar 2000 05:41:53 -0800
- Envelope-to: idn-data@psg.com
Is this something we can use (possibly modified) in IDN to describe what a
reasonable character set for IDN labels is?
Harald
>X-UML-Sequence: 12492 (2000-03-01 21:35:45 GMT)
>From: Kenneth Whistler <kenw@sybase.com>
>To: "Unicode List" <unicode@unicode.org>
>Cc: unicode@unicode.org, kenw@sybase.com
>Date: Wed, 1 Mar 2000 13:35:44 -0800 (PST)
>Subject: Re: Rationale wanted for Unicode identifier rules
>
>John Cowan asked:
>
> >
> > Kenneth Whistler wrote:
> >
> > > A. Identifier syntax along the lines described in Unicode 3.0.
> >
> > Can you (or someone) supply a precis of this to the poor fellow
> > who still hasn't heard from his bookstore's order department?
> > Especially if it is indeed simpler than the Unicode 2.0 version?
>
>Sure. For those of you who already have the hymnal, turn to page 134 to
>sing along.
>
><identifier> ::= <identifier_start> (<identifier_start> | <identifier_extend>)*
>
><identifier_start> is defined by an equivalent category set consisting of
> all those characters with the General Category values:
> Lu, Ll, Lt, Lm, Lo, Nl
>
><identifier_extend> is defined by an equivalent category set consisting of
> all those characters with the General Category values:
> Mn, Mc, Nd, Pc, Cf
>
>Thus, identifiers can start with any "letter" or "letter number".
>
>Identifiers can continue with any "letter" or "letter number", any combining
>mark (except the symbolic surrounds), any decimal digit, any connecting
>punctuation, or any format control character (e.g. the invisible bidi
>layout controls, ZWJ, ZWNJ, etc.).
>
>Note that this definition explicitly excludes the following General Category
>values from identifiers:
>
> Me, No, Zs, Zl, Zp, Cc, Pd, Ps, Pe, Pi, Pf, Po, Sm, Sc, Sk, So
>
>i.e. enclosing combining marks, "other numerals", all spaces, control
>characters, all other punctuation, and all "symbols".
>
>--Ken
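
As a rough illustration of the rule quoted above, here is a minimal sketch in
Python. It is not part of Ken's message or of the Unicode text. The names
IDENTIFIER_START, IDENTIFIER_EXTEND and is_identifier are made up for this
example, and the standard unicodedata module reports General Category values
from whatever Unicode version the interpreter ships with, not necessarily
Unicode 3.0.

```python
import unicodedata

# General Category sets taken from the quoted definition.
IDENTIFIER_START = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
IDENTIFIER_EXTEND = {"Mn", "Mc", "Nd", "Pc", "Cf"}


def is_identifier(text: str) -> bool:
    """Return True if text matches
    <identifier_start> (<identifier_start> | <identifier_extend>)*.
    """
    if not text:
        return False
    # The first character must be a letter or a letter number.
    if unicodedata.category(text[0]) not in IDENTIFIER_START:
        return False
    # Every later character may be a start or an extend character.
    allowed = IDENTIFIER_START | IDENTIFIER_EXTEND
    return all(unicodedata.category(ch) in allowed for ch in text[1:])


if __name__ == "__main__":
    for candidate in ["abc", "a1", "1a", "a_b", "_a", "a-b", "e\u0301"]:
        print(repr(candidate), is_identifier(candidate))
```

Under this sketch "a_b" is accepted because the underscore is connecting
punctuation (Pc), while "_a" is rejected because Pc is not in the start set,
and "a-b" is rejected because the hyphen is dash punctuation (Pd).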
--
Harald Tveit Alvestrand, EDB Maxware, Norway
Harald.Alvestrand@edb.maxware.no