
Re: [idn] WG last call documents

[this is part 2 of a multi-part response]

"Adam M. Costello" wrote:

> "Eric A. Hall" <ehall@ehsco.com> wrote:
> 
> >  | Stringprep Profile for Internationalized Host Names
> >
> > Since this profile is specifically for host names, all of the rules
> > that apply to hostnames should be placed here.
> 
> I think the intended title of this document is "Stringprep Profile
> for Internationalized Domain Names".

Based on a cursory analysis, this document is suitable as a standalone
description of i18n domain names, although I would reiterate my belief
that there is room to provide dot-separator character mapping on full
domain names (possibly belonging in a separate super-profile).
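A minimal sketch of what such full-name separator mapping could look like. It assumes the four dot characters that were eventually listed in RFC 3490, section 3.1 (U+002E, U+3002, U+FF0E, U+FF61); the helper names are hypothetical and not part of any profile:

```python
# Hypothetical helpers sketching dot-separator mapping on a full domain name.
# Assumption: the separator set is the one later published in RFC 3490, sec. 3.1.
SEPARATORS = {
    "\u3002": ".",  # IDEOGRAPHIC FULL STOP
    "\uff0e": ".",  # FULLWIDTH FULL STOP
    "\uff61": ".",  # HALFWIDTH IDEOGRAPHIC FULL STOP
}
_DOT_MAP = str.maketrans(SEPARATORS)


def normalize_separators(name: str) -> str:
    """Map every recognized dot-like separator to U+002E FULL STOP."""
    return name.translate(_DOT_MAP)


def split_labels(name: str) -> list[str]:
    """Split a full domain name into labels after separator mapping."""
    return normalize_separators(name).split(".")


if __name__ == "__main__":
    print(split_labels("b\u00fccher\u3002example\uff0ecom"))
```

This operates on the whole name before any per-label processing, which is why it fits more naturally in a super-profile than in a per-label stringprep profile.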

However, we still need a separate profile that describes the hostname
rules, in particular including a prohibition against certain leading and
trailing characters, as was discussed at length on the mailing list on
multiple occasions.

> Those apps still need the IDNA spec, because the IDNA spec defines
> what is a valid internationalized domain label.  Nameprep alone cannot
> possibly tell you whether the label is too long; you can know that only
> after applying Punycode.

This co-dependency needs to be called out at the top of the i18n domain
name profile.
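The co-dependency is easy to demonstrate. The sketch below uses Python's `encodings.idna.nameprep` and `punycode` codec (an implementation of the rules later published as RFCs 3490-3492) and assumes the `xn--` ACE prefix that was eventually assigned; `ace_label_length` and `label_fits` are hypothetical helpers. A label of 63 code points looks acceptable before encoding, yet its ACE form exceeds the 63-octet STD13 limit:

```python
import encodings.idna

ACE_PREFIX = b"xn--"   # assumption: the ACE prefix eventually assigned by IANA
MAX_LABEL_OCTETS = 63  # STD13 (RFC 1035) limit on a single label


def ace_label_length(label: str) -> int:
    """Octet length of a label after nameprep and, if needed, Punycode."""
    prepped = encodings.idna.nameprep(label)
    if prepped.isascii():
        # All-ASCII labels are used as-is; no ACE prefix is added.
        return len(prepped)
    return len(ACE_PREFIX + prepped.encode("punycode"))


def label_fits(label: str) -> bool:
    """True if the encoded label is non-empty and within the STD13 limit."""
    return 0 < ace_label_length(label) <= MAX_LABEL_OCTETS


if __name__ == "__main__":
    long_label = "\u00fc" * 63  # 63 code points, each U+00FC
    print(len(long_label), ace_label_length(long_label), label_fits(long_label))
```

Because every non-ASCII code point contributes at least one Punycode digit, the 63-character label above needs well over 63 octets once the prefix is added, and nameprep alone cannot see that.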

> > "generic domain name slot" should be substituted with "STD13 slot".

> Regardless of what we call it, that is the concept that is needed in
> rule 1.  Perhaps you think the term "generic domain name slot" is
> counter-intuitive and should be renamed (but not redefined)?
> 
> I don't think the term "STD13 slot" is appropriate.  I would expect a
> term called "STD13 slot" to be defined in terms of STD13 only, but the
> concept we need here is defined in terms of the IDNA spec.

The use of "generic" is what bothers me, because it implies that the slot
(another word I dislike) can hold any domain name. I would call it STD13
because that is the governing specification (actually, "RFC1035 slot"
would be better since RFC1035 is likely to be supplanted within STD13 at
least a couple of times in our lifetimes).

> > Trying to pipe an STD13 binary domain name through ToUnicode may
> > result in some false-positives.

> You don't need to invoke "binary" domain names to illustrate this
> problem, since it is equally present for ASCII domain names. In either
> case, an accidental ACE label might get displayed (or even transported)
> in an unintended way, but no information will be lost, and the
> original form will be restored before any program attempts to
> interpret the name as intended.

The situation I am thinking about is where one of the NUL replacements
(a character mapped to nothing) takes place. I admit that I haven't fully
explored the possibility of weaknesses in this area, but deleting input
is always risky, so I am keeping it in mind until I do get around to it.
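A small illustration of the difference, using Python's `stringprep` module and `encodings.idna` (implementations of the tables and algorithms later published as RFCs 3454 and 3490/3491, with the `xn--` prefix assumed); `dropped_characters` is a hypothetical helper. An ASCII label that happens to look like ACE converts back to its original form, but two labels that differ only in a mapped-to-nothing character such as U+00AD SOFT HYPHEN collapse to the same ASCII form, so the deleted character cannot be recovered:

```python
import encodings.idna
import stringprep


def dropped_characters(label: str) -> list[str]:
    """List code points that nameprep maps to nothing (RFC 3454 table B.1)."""
    return [f"U+{ord(c):04X}" for c in label if stringprep.in_table_b1(c)]


if __name__ == "__main__":
    # An ACE-looking ASCII label is displayed differently but round-trips.
    shown = encodings.idna.ToUnicode("xn--bcher-kva")
    print(shown, encodings.idna.ToASCII(shown))

    # A soft hyphen is deleted, so two distinct inputs share one ASCII form.
    print(dropped_characters("ab\u00adc"))
    print(encodings.idna.ToASCII("ab\u00adc") == encodings.idna.ToASCII("abc"))
```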

> >  | B. Design philosophy
> >
> > What value does this section provide?
> 
> It documents the rationale behind the design decisions.  I think it's
> often helpful and/or instructive for people who later read the spec
> and wonder "Why did they do it that way?"

So do you have an example of another WG specification where only half of
the opinions are provided, and where they provide value towards
understanding the protocol or service? Citations, please.

The fact of the matter is that this section presents one side of a holy
war, is argumentative, provides no scientific value, and certainly does
not represent WG consensus. It is also factually in error.

> Has the current prohibition of leading & trailing hyphens been terribly
> helpful?  I think it probably would have been better if the host
> label syntax were simpler--just a set of allowed characters, with no
> restrictions on position.  So I think we should keep any new positional
> restrictions to a minimum.
> 
> But there is one class of characters that might indeed be dreadful at
> the beginning: combining characters.  I recently referred to labels
> that begin with combining characters as invalid Unicode strings, but
> they're not, are they?  They just behave in surprising ways when
> abutted with something else.  Maybe nameprep should prohibit initial
> combining characters.

Note that these are rules for hostnames in particular.

A prohibition against combining characters is not feasible. See
<200111212258.OAA23757@birdie.sybase.com> from K Whistler.

This was resolved without dissent in <3C0DD610.F50F947B@ehsco.com>:

 | First and last characters in the label MUST NOT be a diacritical
 | mark or hyphen-minus.
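One way the rule quoted above could be checked, as a minimal sketch. It assumes "diacritical mark" means any Unicode combining mark (general category Mn, Mc or Me), applies the test to the nameprep output so that a decomposed "e" plus combining acute is composed first, and uses hypothetical function names:

```python
import encodings.idna
import unicodedata


def is_forbidden_edge(ch: str) -> bool:
    # Assumption: "diacritical mark" = any combining mark (category M*).
    return ch == "-" or unicodedata.category(ch).startswith("M")


def edges_ok(label: str) -> bool:
    """True if the prepared label neither starts nor ends with a
    hyphen-minus or a combining mark (hypothetical hostname rule check)."""
    prepped = encodings.idna.nameprep(label)
    if not prepped:
        return False
    return not (is_forbidden_edge(prepped[0]) or is_forbidden_edge(prepped[-1]))


if __name__ == "__main__":
    for candidate in ["b\u00fccher", "-abc", "abc-", "\u0301abc", "cafe\u0301"]:
        print(repr(candidate), edges_ok(candidate))
```

Only the first and last positions are tested, so interior hyphens and combining marks remain allowed, which keeps the positional restriction as small as possible.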

[Several other issues with stringprep were not yet addressed, and I still
consider them as unresolved and/or open.]

-- 
Eric A. Hall                                        http://www.ehsco.com/
Internet Core Protocols          http://www.oreilly.com/catalog/coreprot/