[idn] proposals and deadlines
- To: IDN <idn@ops.ietf.org>
- Subject: [idn] proposals and deadlines
- From: "Eric A. Hall" <ehall@ehsco.com>
- Date: Thu, 12 Jul 2001 23:58:53 -0500
- Organization: EHS Company
I'd like to clarify that the ACE-UTF8 coexistence protocol I am working on
will *not* be in tomorrow's pool. First of all, it requires the presence
of an ACE for backwards compatibility, and I want to wait for a decision
to be made on that first, without distracting from that process.
Second, this is looking to be an extremely large and complex document
involving multiple delicate factors, and the science simply is not cooked
all the way through yet. Third, I just moved to the other side
of the country and have been without stable connectivity or systems for a
month, so the document is nowhere near completion. For all of those reasons, my
ACE-UTF8 coexistence protocol will not be in tomorrow's pool.
For those who care, here's an overview of the protocol as planned:
1) Master files become UTF8.
2) Servers convert and store ACE and UTF8 versions of IDNs together.
3) Resolvers present two separate APIs: one for legacy names, one
for IDNs.
4) When an application calls the resolver, it uses the appropriate
API as determined by the application protocol. EG, if an href=
was encountered that had been encoded in ACE -- and if there
were no external protocol demands from HTTP or the W3C or
whoever -- then the legacy APIs would be used. Conversely,
if the user entered an IDN into the URL input field and this
were deemed legal through some other protocol that the browser
was aware of (possibly an HTTP extension), then the extended
API would be used.
5) The UTF8 IDN resolver calls would generate messages with an
EDNS extended label type. Legacy apps or resolvers would
continue to use the legacy APIs and therefore legacy labels.
6) Servers would answer EDNS-labelled queries with the raw UTF8
data, and would answer the legacy queries with the ACE data.
Any CNAME or PTR handling or any other labels would use this
same rule (this means that it is important for clients to
always use the extended API whenever they are allowed to do
so according to protocol mandate).
7) In those cases where the extended lookup failed (possibly due
to a non-compliant server returning FORMERR or NOTIMP, or due
to the resolver not supporting the extended API), the client
would have to convert the UTF8 IDN to ACE, according to the
mandate of the higher-layer protocol (EG, whatever the HTTP
spec said to do whenever an IDN URL failed).
Those are the seven major elements to the protocol.
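To make the flow in steps 2 through 7 concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not part of the planned protocol text: the names ZoneServer, Resolver, client_lookup and to_ace are hypothetical, Python's built-in "idna" codec stands in for whichever ACE is eventually selected, and a boolean flag stands in for the EDNS extended label type. The fallback in step 7 is shown as a single client-side function, although the real rule would come from the higher-layer protocol.

```python
# Hypothetical sketch of the coexistence flow; no real DNS library is used.
ACE_CODEC = "idna"  # assumption: stand-in for whichever ACE is chosen


def to_ace(utf8_name):
    """Convert an IDN to an ASCII-compatible form (step 7 conversion)."""
    return utf8_name.encode(ACE_CODEC).decode("ascii")


class ZoneServer:
    """Stores ACE and UTF8 forms of each IDN together (steps 2 and 6)."""

    def __init__(self, supports_extended=True):
        self.supports_extended = supports_extended
        self.by_utf8 = {}
        self.by_ace = {}

    def add(self, utf8_name, address):
        # Every server can hold the ACE form; only upgraded servers
        # also keep the raw UTF8 form.
        self.by_ace[to_ace(utf8_name)] = address
        if self.supports_extended:
            self.by_utf8[utf8_name] = address

    def query(self, name, extended):
        # 'extended' models a query that uses the extended label type.
        if extended and not self.supports_extended:
            return ("FORMERR", None)
        table = self.by_utf8 if extended else self.by_ace
        address = table.get(name)
        return ("NOERROR", address) if address else ("NXDOMAIN", None)


class Resolver:
    """Presents two separate APIs (step 3)."""

    def __init__(self, server):
        self.server = server

    def lookup_legacy(self, ascii_name):
        return self.server.query(ascii_name, extended=False)

    def lookup_idn(self, utf8_name):
        return self.server.query(utf8_name, extended=True)


def client_lookup(resolver, utf8_name):
    """Try the extended API first, then fall back to ACE (step 7)."""
    rcode, address = resolver.lookup_idn(utf8_name)
    if rcode in ("FORMERR", "NOTIMP"):
        return resolver.lookup_legacy(to_ace(utf8_name))
    return (rcode, address)


name = "b\u00fccher.example"
upgraded = ZoneServer()
upgraded.add(name, "192.0.2.1")
legacy = ZoneServer(supports_extended=False)
legacy.add(name, "192.0.2.1")

answer_upgraded = client_lookup(Resolver(upgraded), name)
answer_legacy = client_lookup(Resolver(legacy), name)
```

Both lookups return the same address: the upgraded server answers the extended query from its UTF8 table, while the legacy server rejects it and the client retries with the ACE form.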
While this is not a proposal, a few rebuttals to "obvious" problems:
A) Caches should not be a problem, as clients will have to be
upgraded to adjust for application-specific UTF8 protocols,
while nobody will be able to deploy UTF8 IDNs without using
compatible servers and caches. IOW, as UTF8 IDNs are deployed
the relevant infrastructure is upgraded along with it. This
is a marginal cost increase over the upgrade burden for ACE
usability (client transposition).
B) The root server only needs to understand the extended label
type, and will not need to store/use UTF8 until ICANN starts
to assign IDN TLDs. The TLD servers will need to be upgraded
to support IDN 3LDs, but this should happen enthusiastically
in the relevant markets.
C) The UTF8-to-ACE client fallback conversion should happen
very infrequently. Protocols will have to support UTF8 IDNs
themselves before *any* UTF8 IDNs are passed to resolvers,
and applications will have to be upgraded for those protocols
to work, and there will most likely have to be a way for a
server to signify that it supports UTF8 explicitly (ESMTP
extension, HTTP extension, etc.). For the immediate future,
all of the queries will be ACE. Eventually, almost all of
the queries will be UTF8. Fallback should be very rare.
D) Label lengths are not increased. The hard limit of 255
octets in the question section is unavoidable. Although it
would be possible to lengthen the individual labels, it
is not possible to change the maximum domain name length
without changing the structure of the DNS message itself.
Such a change is possible (a null question section, or some
other hack) but it would be disruptive.
In fact, the use of EDNS extended label types reduces the
IDN label length to a maximum of 62 characters.
E) The coexistence "load" on the servers is unfortunate but it
should not be prohibitive. For one thing, it will only
require 2x entries for every IDN, not for every domain name
in every zone. Furthermore, the sooner this happens the
better, since there will obviously be a smaller total number
of entries this year than 10 years from now. By then, it
should be possible to have made most systems speak UTF8
directly, such that support for ACE can be deprecated.
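The length limits in point D can be checked with a few lines of Python. This is a rough sketch under assumptions: wire_length and fits are hypothetical helper names, each label is modelled as one length octet plus its UTF8 octets (any extra overhead of an extended label type is ignored), and the 62 figure quoted in point D is taken as given and counted in octets.

```python
MAX_NAME_OCTETS = 255        # hard limit on an encoded domain name
LEGACY_LABEL_OCTETS = 63     # classic per-label limit
EXTENDED_LABEL_OCTETS = 62   # figure from point D, assumed to mean octets


def split_labels(name):
    """Split a dotted name into labels, ignoring a trailing root dot."""
    return [label for label in name.rstrip(".").split(".") if label]


def wire_length(name):
    """Octets on the wire: a length octet per label, plus the root label."""
    return sum(1 + len(label.encode("utf-8")) for label in split_labels(name)) + 1


def fits(name, extended):
    """True if every label and the whole name are within the limits."""
    limit = EXTENDED_LABEL_OCTETS if extended else LEGACY_LABEL_OCTETS
    labels_ok = all(len(label.encode("utf-8")) <= limit
                    for label in split_labels(name))
    return labels_ok and wire_length(name) <= MAX_NAME_OCTETS


ascii_len = wire_length("example.com")
idn_len = wire_length("b\u00fccher.example")
```

Because non-ASCII characters take more than one octet in UTF8, a name can exceed the 255-octet limit while looking short on screen, which is why the check counts encoded octets rather than characters.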
I've started scratching together the document, and am grateful for the
assistance of any interested parties.
Thanks