
Re: [idn] naming syntax rules




"Dan Oscarsson" <Dan.Oscarsson@trab.se> wrote:

> >The basic idea here is to declare formal data-types for labels, and to
> >incorporate the data-types into syntaxes for applications and protocols to
> >use when they need to interact with domain names.
>
> While this is good, to make DNS really work the fundamental
> rules should be the same for all labels. Just like it has been
> so far.

No, DNS already has two sets of rules: general labels are octet strings that
may carry ASCII, and host names are limited to a restricted subset.

> While Eric allows binary in DNS labels making things very complex,
> I think we should go for how DNS is used and there is at least
> one RFC defining text and binary labels (binary label is defined by
> EDNS).

Binary EDNS labels are an application-specific representation of a sequence.
They do not define methods for storing or representing binary values as domain
names in application or protocol data-streams. The current discussion is about
the unencoded (raw) domain names, which may be encoded in DNS or applications
in as-yet-undecided forms. Binary EDNS labels are unrelated to that
discussion, and are not appropriate as a single method of dealing with these
domain names: whichever encoding a specific application uses, the label
data-types still have to be defined so that applications can use them
consistently.

> The standard STD13 DNS label, and any new long label are TEXT labels.
> They may only contain printable characters.

No. TXT and other RRs may have any octets in their owner domain names. DNS is
not limited to identifying hosts.

> - They must be normalised.

No. TXT should keep the same basic capability, which is to specify an exact
sequence of character codes.

> As Eric said:
> - Minimum length of one UCS character code.
> - Maximum length of 63 UCS character codes.
>
> - Maximum cumulative length of 255 UCS character codes in a domain name.
>
> Then Eric goes into different types of labels: host name, ascii,
> mailbox and srv.
> While an application can have special rules, DNS cannot.

Note that the ASCII label data-type is specifically provided to support SRV. I
had thought about naming it SRV and restricting it to LDH with leading
underscore but thought that a generic printable-ASCII would be more useful.
This can be changed.
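
The two candidate definitions for this data-type can be contrasted in a short
Python sketch. This is illustrative only: the function names are hypothetical,
and treating "printable ASCII" as 0x21 through 0x7E (space excluded) is an
assumption, not something the proposal states.

```python
import re

# Narrow option: LDH (letters, digits, hyphen) with a leading underscore.
_SRV_LDH = re.compile(r"_[A-Za-z0-9-]+")


def is_srv_ldh_label(label: str) -> bool:
    """True if the label is an underscore followed by one or more LDH characters."""
    return _SRV_LDH.fullmatch(label) is not None


def is_printable_ascii_label(label: str) -> bool:
    """True if every character is in 0x21..0x7E (assumed meaning of 'printable')."""
    return len(label) > 0 and all(0x21 <= ord(ch) <= 0x7E for ch in label)
```

Every label accepted by the narrow check is also accepted by the generic one,
so moving from the generic type to the narrow one would only remove names.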

> For all labels in DNS (including host name and mailbox):
> - They must be case-insensitively matched.
> - They must retain original form (not converted to lower case)
>   in DNS.

These are different data-types with different considerations.

Mailbox names must be case-preserved in order to satisfy protocol
dependencies, and are not used in lookups so normalization is not required.

Host names have no such dependencies. STD13 defines domain names as
case-neutral, and enforces this through case-neutral comparison operations on
the servers. However, making host names case-neutral will require a new ACE,
or will require that every possible case combination be delegated and managed
simultaneously. As such, case-neutral comparisons can still be performed, but
should be done at the resolver instead of at the server.
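
A rough Python sketch of the trade-off follows. It is illustrative only: the
function names are hypothetical, it assumes each cased character has exactly
two forms, and it uses Python's `str.casefold()` as a stand-in for whatever
folding rule would actually be chosen.

```python
def case_variants(label: str) -> int:
    """Count the case combinations that would each need delegating.

    Assumes every cased character has exactly one upper and one lower form.
    """
    cased = sum(1 for ch in label if ch.lower() != ch.upper())
    return 2 ** cased


def resolver_fold(label: str) -> str:
    """Fold case at the resolver, before any encoding or lookup (assumed policy)."""
    return label.casefold()


def resolver_equal(a: str, b: str) -> bool:
    """Case-neutral comparison performed on the resolver side."""
    return resolver_fold(a) == resolver_fold(b)
```

Under these assumptions a label with seven cased characters has 128 case
combinations, which is why folding once at the resolver is so much cheaper
than delegating every variant.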

> Having the above set in place, applications can apply additional
> rules and they can change over time without the basic DNS
> workings having to be changed.
> The above rules give DNS a simple clear foundation to stand on.
>
> If you need a binary label, define one.
>
> --
> For applications we can define the additional rules Eric has
> specified. But some of them I can see no reason to have:
>
> Host names:
> - Must be allowed to have mixed case. DNS must be allowed
>   to return host names containing upper case letters.
>   Otherwise software will break.

Making host names case-neutral will require a new ACE, or will require that
every possible case combination be delegated and managed simultaneously if
legacy systems are to provide lookup functions against the encoded names.

> - I can see no reason not to allow 1 character in minimum length.
>   Labels today have one character lengths.

The current delegation rules prohibit it. STD13 hostnames are currently a
subset of IDN and that should be preserved where possible. Essentially, this
change would mean ~"if all of the characters in the delegation are LDH, then
the minimum length is 2 characters, otherwise it is one character."
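
Expressed as a Python sketch, the conditional rule might look like the
following. The names are hypothetical, lengths are counted in character codes
(Python code points), and the 63 maximum is the limit quoted earlier in this
thread.

```python
LDH = set("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-")


def min_label_length(label: str) -> int:
    """Return 2 for an all-LDH label, otherwise 1."""
    if label and all(ch in LDH for ch in label):
        return 2
    return 1


def label_length_ok(label: str, max_len: int = 63) -> bool:
    """Check the label against the conditional minimum and the fixed maximum."""
    return min_label_length(label) <= len(label) <= max_len
```

With this rule `label_length_ok("a")` is false while a single non-LDH
character such as "é" passes, so existing STD13 host names keep their current
restriction.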

> Mailbox labels:
> - In DNS they are not case-sensitive. Some mail systems are said
>   to have them case-sensitive. Are there still some such ancient
>   systems left?
>   DNS must compare them case-insensitive, but return them
>   retaining original case.

Mailbox names are never compared, because they are not specified in any
queries. In RR data they must be provided in case-sensitive form, and by the
same token they must remain non-normalized until the successor to RFC 2822
says which normalization to use.

> What characters should be allowed in a label?
> Above I have defined it to be printable characters.
> Looking at how names are used, I would like to restrict
> it further. A name is often used as part of a text (for example
> in a manual or a web page). You then do not want the name
> to affect the formatting of the text. So you cannot allow
> anything in a name that affects direction, width, size, boldness, etc.
> So things like double width characters should not be allowed.
> This should probably be included in the definition of
> what is normalised text. Things like upper/lower case do not
> change the formatting of the text and can be used to enhance
> meaning or readability, and should be retained.

I am neutral on these issues. If there is a desire to exclude double-width
characters, then I will add it.