
Re: [idn] proposed i18n naming rules



-----BEGIN PGP SIGNED MESSAGE-----

"Eric A. Hall" wrote:
> Erik Nordmark wrote:
> > > Internationalized domain names have the following attributes:
> > >
> > >       character codes are explicit, not required to be normalized in
> > >         any way
> >
> > > Internationalized host identifiers are a subset of internationalized
> > > domain names, and have the following attributes:
> > >
> > >       MUST be normalized through nameprep prior to processing
> >
> > Looking for clue. Why aren't IDNs normalized through nameprep just
> > like IHNs? What are the benefits of this?
> 
> There are no benefits per se, but penalties which are avoided. Nameprep
> forbids characters which are commonly used in labels that are not valid in
> hostnames (including "_" for SRV, and chars valid for LHS of email
> addresses in SOA/RP), so running all IDNs through nameprep would prohibit
> IDNs which are not IHIs.

So remove the prohibitions from nameprep and put them somewhere else.
This is much simpler than having two distinct notions of equivalence for
IDNs and IHIs.
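
To illustrate the split I have in mind, here is a minimal Python sketch.
It is not nameprep: NFKC plus case folding stands in for the mapping
step, and the names normalize_label, same_name, is_host_label and the
HOST_PROHIBITED set are hypothetical, chosen only for this example. The
point is that there is one notion of equivalence for all names, and the
host restrictions are a separate predicate.

```python
import unicodedata

# Illustrative only: a few characters that are fine in general labels
# but not in host labels. The real set would come from the host syntax.
HOST_PROHIBITED = set("_@!")

def normalize_label(label: str) -> str:
    """Stand-in for the mapping step: case folding, then NFKC.
    Nothing is rejected here, so every IDN label can pass through."""
    return unicodedata.normalize("NFKC", label.casefold())

def same_name(a: str, b: str) -> bool:
    """Single notion of equivalence, shared by IDNs and IHIs."""
    return normalize_label(a) == normalize_label(b)

def is_host_label(label: str) -> bool:
    """Separate host-only restriction, applied after normalization."""
    return not any(ch in HOST_PROHIBITED for ch in label)
```

With this arrangement "_SIP" and "_sip" compare equal as domain labels,
while is_host_label rejects both as host labels.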

Note that just prohibiting a set of characters is in any case not sufficient
to express the following:

 - restriction of ASCII host labels to the RFC 1123 syntax,
 - prohibiting top-level labels that are all ASCII digits,
 - ensuring that the BiDi algorithm is one-to-one on valid host strings
   (the problem is with labels that arbitrarily mix RTL characters and
   numbers - although it is possible to allow numbers at the end of an
   RTL label in logical order),
 - prohibiting mixed TC/SC strings, if we want to do that [*],
 - prohibiting illegal sequences of Hangul Jamo (i.e. sequences that do
   not follow the syntax in the Unicode Standard),
 - prohibiting irregular sequences of Hangul Jamo (i.e. sequences that
   could be equivalently expressed using a cluster Jamo, or that do not
   use the standard form for fillers),
 - prohibiting labels that start with a combining character.

[*] This may not be appropriate as a restriction on the hostname syntax,
    because mixed TC/SC is used in Hong Kong, for example. It can still
    be applied by some registrars as a matter of policy, though.

IMHO all of these restrictions, except possibly mixed TC/SC, should be
applied to registered IHIs. (AFAICS there's no need for clients to
enforce them, since it is not a security problem if they don't.)
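
Two of the simpler restrictions in the list above (all-digit top-level
labels, and labels starting with a combining character) can be sketched
as a registry-side check. This is only a sketch under my own
assumptions: the function names are hypothetical, "combining character"
is approximated by the Unicode general categories Mn, Mc and Me, and the
BiDi, TC/SC and Hangul Jamo rules are not attempted.

```python
import unicodedata
from typing import List

def starts_with_combining(label: str) -> bool:
    """True if the first character is a combining mark (Mn, Mc or Me)."""
    return bool(label) and unicodedata.category(label[0]).startswith("M")

def is_all_ascii_digits(label: str) -> bool:
    """True if the label is non-empty and consists only of '0'..'9'."""
    return bool(label) and all("0" <= ch <= "9" for ch in label)

def structural_problems(labels: List[str]) -> List[str]:
    """Return a description of each rule that the list of labels breaks."""
    problems = []
    for label in labels:
        if starts_with_combining(label):
            problems.append("label starts with a combining character: %r" % label)
    # Only the last (top-level) label is barred from being all digits.
    if labels and is_all_ascii_digits(labels[-1]):
        problems.append("top-level label is all ASCII digits: %r" % labels[-1])
    return problems
```

Neither rule can be expressed as a set of prohibited characters, because
both depend on where a character occurs in the name.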

The draft I'm preparing will include a formal syntax for IHIs along
these lines. It also prohibits symbols and punctuation characters,
except those used as diacritics (Unicode has a convenient 'Diacritic'
property that identifies these). I agree with the people who have said
that it is easier to start with a restrictive syntax and relax it if
problems are found than to do the opposite.


[Incidentally, I made a mistake in a previous post that gave an ABNF
definition of the RFC 1123 hostname syntax. It should be:

  RFC1123HostName = *(HostLabel ".") TopLevelHostLabel
  HostLabel = (LETTER / DIGIT) [0*61(LETTER / DIGIT / "-") (LETTER / DIGIT)]
  TopLevelHostLabel = HostLabel \ *DIGIT

i.e. 0*61 instead of 61*.]
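
For anyone who wants to test names against the corrected syntax, here is
a minimal Python rendering of the ABNF above. It is a sketch: the names
HOST_LABEL_RE and is_rfc1123_hostname are my own, LETTER is taken to
mean ASCII A-Z and a-z, a trailing dot is not accepted, and the overall
length limit on a name is not checked.

```python
import re

# HostLabel: starts and ends with a letter or digit, with up to 61
# letters, digits or hyphens in between (so at most 63 characters).
HOST_LABEL_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")
ALL_DIGITS_RE = re.compile(r"[0-9]+")

def is_rfc1123_hostname(name: str) -> bool:
    """Return True if name matches RFC1123HostName as defined above."""
    labels = name.split(".")
    if not all(HOST_LABEL_RE.fullmatch(label) for label in labels):
        return False
    # TopLevelHostLabel = HostLabel \ *DIGIT: the last label must not
    # consist entirely of digits.
    return ALL_DIGITS_RE.fullmatch(labels[-1]) is None
```

Note that "3com.com" is accepted (RFC 1123 allows a leading digit),
while "10.0.0.1" is rejected by the top-level rule.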

- -- 
David Hopwood <david.hopwood@zetnet.co.uk>

Home page & PGP public key: http://www.users.zetnet.co.uk/hopwood/
RSA 2048-bit; fingerprint 71 8E A6 23 0E D3 4C E5  0F 69 8C D4 FA 66 15 01
Nothing in this message is intended to be legally binding. If I revoke a
public key but refuse to specify why, it is because the private key has been
seized under the Regulation of Investigatory Powers Act; see www.fipr.org/rip


-----BEGIN PGP SIGNATURE-----
Version: 2.6.3i
Charset: noconv

iQEVAwUBO/00NzkCAxeYt5gVAQHPlggAvyxtqWPCrSyjMHLkT1xXTHO1prCatLx7
ocAR/libW/vbbVZiY0schfBXWQwXfV4y2o+kPtQLixVZL+EztDzNlHPQXZQ2bC5i
IOU4Vs24l3s0lY2bjRXqpShQVesvNdHUGAyAEIElpNRfhEdW15zqEZdw1mjl3BT2
S/ILAdqvfbjbiqI8Ye22+qNUe3fxUpl2s0MmF3Ito8J3afXkfPdZEsKeklbKzhvK
F2MV0ARAI7qvea6duLizDQ40adcyBrAiypNyEE+75DFC++axVm4ndezDh2Q3mgoa
AuQZyjPA1uPS7KClDRMKxUncbNxGF6LWSUKPHVHgLPrPg1PG0SV52w==
=70d9
-----END PGP SIGNATURE-----