Re: [idn] Re: Just send UTF-8 with nameprep




"Adam M. Costello" wrote:
> 
> "Eric A. Hall" <ehall@ehsco.com> wrote:
> 
> >  a) There will be some requirement for the UTF-8 namespace to provide
> >     manually-encoded ACE names directly.  If we are converting
> >     UTF-8 to ACE, these fixed names are pretty easy to implement.
> >     Conversely, if we are doing ACE to UTF-8, then we have to come up
> >     with a way to prevent the ACE name from being converted to its
> >     UTF-8 equivalent in order for the ACE encoding to be preserved in
> >     the UTF-8 namespace.
> 
> There is only one namespace.  Every name in that single namespace has
> both an ASCII representation and a UTF-8 representation (which are
> identical in the case of ASCII names).

You are correct in that there is only one DNS namespace. Using that term
was a bad idea, and as much as I complain about conflicting metaphors, I
should have known better.

> Every protocol, message format,
> file format, etc. is free to require the ASCII representation or to
> allow both representations.  If both representations are allowed, then
> they must be treated the same by the receiver, because they are the
> same name.  There is no need to tag some names as must-use-ACE.  Are
> you trying to support names that really do begin with the ACE signature
> prefix, so that the user sees the ACE prefix even inside IDN-aware
> applications?  Such names are forbidden; they simply do not exist in
> the namespace.  Trying to support them would be much more trouble than
> it's worth.

Well, since you brought it up, this is my current thinking:

Dual-mode servers would flag specific RRs as having legacy or IDN names,
or having both flags. Some legacy domain names will look like UTF-8 names
(any eight-bit binary label is legal), and so there must be a way to flag
specific entries as being either a legacy name or an IDN (or both).

Any domain name which is entered as an IDN would need to have the IDN flag
enabled, but must not have the legacy flag enabled. These IDNs must only
be used to answer IDN queries, and must not be processed through legacy
query paths, where they would likely end up answering queries issued by
older applications. We must not allow UTF-8 IDNs to be processed by older
applications, since they may be issuing queries based on a local encoding
(such as 8859-n or windows-nnnn or MacRoman or whatever). Using these IDNs
to answer legacy queries will definitely cause some kind of unintended
consequence, and we will get a lot of heat for the failure.

ACE domain names which are automatically created from the UTF-8 names
should only have the legacy flag enabled, to prevent bleed-over in the
opposite direction. It is also feasible to allow these entries to exist
with both flags, but see below.

It should also be possible to have an ACE name with both flags by allowing
users to enter it manually, although this should be cautioned against (but
not prohibited). We don't want applications that always re-apply ACE to an
IDN to do that, but moreover we don't want to make it common or simple to
make that mistake (although in truth, it might be more effective to let
people shoot themselves in the feet to really drive this message home).
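
To make the flag scheme above concrete, here is a minimal sketch in Python.
It is only an illustration under stated assumptions: the names NameKind,
Record and visible are hypothetical, the ACE owner name is a placeholder
rather than a real encoding, and nothing here is taken from an existing
server.

```python
from dataclasses import dataclass
from enum import Flag, auto


class NameKind(Flag):
    """Hypothetical per-record flags: which query paths may see a record."""
    LEGACY = auto()
    IDN = auto()


@dataclass(frozen=True)
class Record:
    owner: bytes      # owner name exactly as stored by the server
    kinds: NameKind   # LEGACY, IDN, or both
    rdata: str


def visible(records, query_kind):
    """Return the records allowed to answer a query on the given path."""
    return [r for r in records if query_kind in r.kinds]


zone = [
    # Entered as an IDN: IDN flag only, never served to legacy queries.
    Record("bücher.example".encode("utf-8"), NameKind.IDN, "192.0.2.1"),
    # Automatically generated ACE form (placeholder label): legacy flag only.
    Record(b"ace--placeholder.example", NameKind.LEGACY, "192.0.2.1"),
    # Plain ASCII name: identical in both representations, so both flags.
    Record(b"plain.example", NameKind.LEGACY | NameKind.IDN, "192.0.2.2"),
]

legacy_answers = visible(zone, NameKind.LEGACY)
idn_answers = visible(zone, NameKind.IDN)
```

With this layout a legacy query never sees the UTF-8 owner name, and an IDN
query never sees the generated ACE form, which is the bleed-over the flags
are meant to prevent.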

For things like Dynamic DNS, this mechanism is actually pretty simple:
what message format was used for the registered name? Legacy names get the
legacy flag, IDNs get the IDN flag, and the ACE conversions of those IDNs
get the legacy flag. For human operators, it is less clear. Perhaps one
product will allow users to specify the datatype on entry, while another
will simply store all names as IDNs and automatically create a legacy
version of the zone whenever the IDN zone is [re]loaded (with the caveat
that such a mechanism will break any non-IDN eight-bit labels). UI stuff
is probably outside of the scope, but some discussion would be fruitful as
it would likely result in a better management method than the one
presented here.
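
The Dynamic DNS rule above could be sketched roughly as follows. This is an
assumption-laden illustration, not a proposal for an actual update format:
the function entries_for_registration, the "legacy"/"idn" format strings,
and the placeholder_ace converter are all hypothetical, and the converter
deliberately returns a fixed dummy name instead of a real ACE encoding.

```python
from typing import Callable, List, Set, Tuple

Entry = Tuple[str, Set[str]]   # (stored name, flags)


def entries_for_registration(
    name: str,
    message_format: str,
    to_ace: Callable[[str], str],
) -> List[Entry]:
    """Derive the stored entries and their flags from the message format."""
    if message_format == "legacy":
        # Registered through the legacy format: legacy flag only.
        return [(name, {"legacy"})]
    if message_format == "idn":
        # Registered as an IDN: the IDN itself gets the IDN flag, and its
        # automatically generated ACE conversion gets the legacy flag.
        return [(name, {"idn"}), (to_ace(name), {"legacy"})]
    raise ValueError(f"unknown message format: {message_format!r}")


def placeholder_ace(name: str) -> str:
    """Stand-in converter; a real one would apply whichever ACE is adopted."""
    return "ace--placeholder.example"


legacy_entries = entries_for_registration("host.example", "legacy", placeholder_ace)
idn_entries = entries_for_registration("bücher.example", "idn", placeholder_ace)
```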

It would be much simpler to declare all non-ASCII labels as IDNs, but
I don't think this is feasible. Perhaps it is?

> > Ping should certainly be allowed to use the UTF-8 encoding on its
> > command line, for example.
> 
> This is not a protocol issue, but a user-interface issue, and operating
> systems have their ways of dealing with it.  For example, under UNIX,
> commands should assume that their command-line arguments use the
> encoding indicated by LC_ALL or LANG.

Good approach.
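
As a rough sketch of the locale-based approach under UNIX, the following
Python fragment asks the C library which character set the environment
selects and decodes a raw argument with it. The helper names
argument_encoding and decode_argument are made up for this example, and the
fallback to ASCII when the locale is unsupported is an assumption, not
something the thread specifies.

```python
import locale


def argument_encoding() -> str:
    """Return the charset selected by LC_ALL, LC_CTYPE or LANG (UNIX only)."""
    try:
        # An empty string means "take the locale from the environment".
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        # Assumption: fall back to ASCII if the requested locale is unknown.
        return "ascii"
    return locale.nl_langinfo(locale.CODESET)


def decode_argument(raw: bytes, encoding: str) -> str:
    """Decode one raw command-line argument, failing loudly on bad input."""
    return raw.decode(encoding)   # strict: raises UnicodeDecodeError


# The same hostname typed in an ISO 8859-1 locale and in a UTF-8 locale
# arrives as different bytes but decodes to the same name.
from_latin1 = decode_argument(b"b\xfccher.example", "iso-8859-1")
from_utf8 = decode_argument(b"b\xc3\xbccher.example", "utf-8")
```

A ping-like tool would call argument_encoding() once at startup and pass
the result to decode_argument() before doing any name preparation.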

> > Hostnames in simple apps (like POP3 server) could also use UTF-8,
> > even  though the POP3 protocol doesn't address this.
> 
> I don't know what you mean by the POP3 protocol not addressing this.

The *hostnames* of POP3 servers are not specified by the protocol. You have
to tell your POP3 mailer where it should go to pick up mail; that hostname
is what I'm referring to.

Also, from the quoted message:

 | It might be prudent to simply say that "A lookups MAY default to
 | UTF-8 if the operating environment allows it, but the environment
 | SHOULD explicitly indicate that it supports this usage."

This should be the general rule for all lookups (not just A RRs). In the
case of application-specific queries, the application must be sure to only
use domain names which are legal for that application. IOW, if an
application is presented with an IDN but the protocol specifications do
not allow for the use of IDNs, then the application should down-convert
the IDN and use that (or return an error, or whatever is appropriate).
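
The "down-convert or return an error" rule might look something like the
sketch below. It uses Python's built-in "idna" codec purely as a stand-in
for the ACE step; that codec implements the later IDNA standard with its
own prefix, which is not necessarily the ACE under discussion here. The
function name name_for_protocol and its protocol_allows_idn parameter are
hypothetical.

```python
def name_for_protocol(name: str, protocol_allows_idn: bool) -> str:
    """Return a form of the name that is legal for the calling protocol."""
    if name.isascii() or protocol_allows_idn:
        # ASCII names are identical in both representations, and a protocol
        # that allows IDNs can carry the name unchanged.
        return name
    try:
        # Stand-in down-conversion via Python's built-in IDNA codec.
        return name.encode("idna").decode("ascii")
    except UnicodeError as exc:
        # The name cannot be represented in ASCII form: report an error.
        raise ValueError(f"cannot down-convert {name!r}") from exc


converted = name_for_protocol("bücher.example", protocol_allows_idn=False)
unchanged = name_for_protocol("bücher.example", protocol_allows_idn=True)
```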

-- 
Eric A. Hall                                        http://www.ehsco.com/
Internet Core Protocols          http://www.oreilly.com/catalog/coreprot/