
[idn] proposals and deadlines




I'd like to clarify that the ACE-UTF8 coexistence protocol I am working on
will *not* be in tomorrow's pool. First of all, it requires the presence
of an ACE for backwards compatibility, and I want to wait for a decision
to be made on that first, without distracting from that process.
Secondly, this is looking to be an extremely large and complex document
involving multiple delicate factors, and the science simply is not cooked
all of the way through as of yet. Thirdly, I just moved to the other side
of the country and have been without stable connectivity or systems for a
month, and the document is nowhere near completion. For all of those reasons, my
ACE-UTF8 coexistence protocol will not be in tomorrow's pool.

For those who care, here's an overview of the protocol as planned:

 1) Master files become UTF8.

 2) Servers convert and store ACE and UTF8 versions of IDNs together.

 3) Resolvers present two separate APIs: one for legacy names, one
    for IDNs.

 4) When an application calls the resolver, it uses the appropriate
    API as determined by the application protocol. EG, if an href=
    was encountered that had been encoded in ACE -- and if there
    were no external protocol demands from HTTP or the W3C or
    whoever -- then the legacy APIs would be used. Conversely,
    if the user entered an IDN into the URL input field and this
    were deemed legal through some other protocol that the browser
    was aware of (possibly an HTTP extension), then the extended
    API would be used.

 5) The UTF8 IDN resolver calls would generate messages with an
    EDNS extended label type. Legacy apps or resolvers would
    continue to use the legacy APIs and therefore legacy labels.

 6) Servers would answer EDNS-labelled queries with the raw UTF8
    data, and would answer the legacy queries with the ACE data.
    Any CNAME or PTR handling or any other labels would use this
    same rule (this means that it is important for clients to
    always use the extended API whenever they are allowed to do
    so according to protocol mandate).

 7) In those cases where the extended lookup failed (possibly due
    to a non-compliant server returning FORMERR or NOTIMP, or due
    to the resolver not supporting the extended API), the client
    would have to convert the UTF8 IDN to ACE, according to the
    mandate of the higher-layer protocol (EG, whatever the HTTP
    spec said to do whenever an IDN URL failed).

Those are the seven major elements to the protocol.
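
As a rough illustration of how steps 2, 6 and 7 fit together, here is a
minimal in-memory sketch. Everything in it is an assumption made for the
example: the Server class, the LookupFailed exception and the resolve_idn
function are hypothetical names, and Python's built-in "idna" codec is
used only as a stand-in ACE, since no ACE has been chosen yet.

```python
class LookupFailed(Exception):
    """Hypothetical stand-in for FORMERR/NOTIMP or a missing extended API."""


def to_ace(utf8_name: str) -> str:
    # Stand-in ACE conversion (assumption): Python's built-in IDNA codec.
    return utf8_name.encode("idna").decode("ascii")


class Server:
    """Toy server that stores the ACE and UTF8 forms of each IDN together."""

    def __init__(self, supports_utf8: bool = True):
        self.supports_utf8 = supports_utf8
        self.by_ace = {}
        self.by_utf8 = {}

    def load(self, utf8_name: str, address: str) -> None:
        # Step 2: convert once at load time and keep both versions.
        self.by_ace[to_ace(utf8_name)] = address
        if self.supports_utf8:
            self.by_utf8[utf8_name] = address

    def query_legacy(self, ace_name: str) -> str:
        # Step 6: legacy-labelled queries are answered from the ACE data.
        return self.by_ace[ace_name]

    def query_extended(self, utf8_name: str) -> str:
        # Step 6: extended-label queries are answered from the UTF8 data.
        if not self.supports_utf8:
            raise LookupFailed("FORMERR")
        return self.by_utf8[utf8_name]


def resolve_idn(server: Server, utf8_name: str):
    """Extended API: try UTF8 first, fall back to ACE on failure (step 7)."""
    try:
        return server.query_extended(utf8_name), "utf8"
    except LookupFailed:
        return server.query_legacy(to_ace(utf8_name)), "ace"
```

A legacy application would simply call query_legacy() with an ACE name
and never see the extended path. In a real deployment the fallback rule
would come from the higher-layer protocol, not from the resolver itself.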

While this is not a proposal, a few rebuttals to "obvious" problems:

 A) Caches should not be a problem, as clients will have to be
    upgraded to adjust for application-specific UTF8 protocols,
    while nobody will be able to deploy UTF8 IDNs without using
    compatible servers and caches. IOW, as UTF8 IDNs are deployed
    the relevant infrastructure is upgraded along with them. This
    is a marginal cost increase over the upgrade burden for ACE
    usability (client transposition).

 B) The root server only needs to understand the extended label
    type, and will not need to store/use UTF8 until ICANN starts
    to assign IDN TLDs. The TLD servers will need to be upgraded
    to support IDN 3LDs, but this should happen enthusiastically
    in the relevant markets.

 C) The UTF8-to-ACE client fallback conversion should happen
    very infrequently. Protocols will have to support UTF8 IDNs
    themselves before *any* UTF8 IDNs are passed to resolvers,
    and applications will have to be upgraded for those protocols
    to work, and there will most likely have to be a way for a
    server to signify that it supports UTF8 explicitly (ESMTP
    extension, HTTP extension, etc.). For the immediate future,
    all of the queries will be ACE. Eventually, almost all of
    the queries will be UTF8. Fallback should be very rare.

 D) Label lengths are not increased. The hard limit of 255
    octets on a domain name is unavoidable. Although it
    would be possible to lengthen the individual labels, it
    is not possible to change the maximum domain name length
    without changing the structure of the DNS message itself.
    This is possible (question section is null, or some other
    hack) but it would be disruptive.

    In fact, the use of EDNS extended label types reduces the
    IDN label length to a maximum of 62 characters.

 E) The coexistence "load" on the servers is unfortunate but it
    should not be prohibitive. For one thing, it will only
    require 2x entries for every IDN, not for every domain name
    in every zone. Furthermore, the sooner this happens the
    better, since there will obviously be a smaller total number
    of entries this year than 10 years from now. By then, it
    should be possible to have made most systems speak UTF8
    directly, such that support for ACE can be deprecated.
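
To make the length constraints in point D concrete, here is a small
sketch that counts octets for a UTF8 name. It assumes the classic
message layout (one length octet per label plus a terminating root
octet); the actual layout of an EDNS extended label may differ, so the
per-label limit is a parameter. The function names are hypothetical.

```python
MAX_NAME_OCTETS = 255   # overall domain name limit
MAX_LABEL_OCTETS = 63   # classic per-label limit


def wire_length(name: str) -> int:
    """Octets used by a name: length octet + data per label, plus root."""
    labels = name.rstrip(".").split(".")
    return sum(1 + len(label.encode("utf-8")) for label in labels) + 1


def fits(name: str, max_label: int = MAX_LABEL_OCTETS) -> bool:
    """True if every label and the whole name are within the limits."""
    for label in name.rstrip(".").split("."):
        size = len(label.encode("utf-8"))
        if size == 0 or size > max_label:
            return False
    return wire_length(name) <= MAX_NAME_OCTETS
```

Note that the limits apply to octets, not characters: a label such as
"bücher" is six characters but seven octets in UTF8, so non-ASCII names
reach the limits sooner than their character count suggests. Passing
max_label=62 models the reduced extended-label maximum mentioned above.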

I've started scratching together the document, and am grateful for the
assistance of any interested parties.

Thanks