Re: [idn] length restrictions on IDN label
2002-10-15 07:22, Adam M. Costello :
> Erik Nordmark <Erik.Nordmark@sun.com> wrote:
>
> > > an internationalized label can represent at most 63 code points,
> > > whether it's ACE or not. A given encoding uses a bounded number of
> > > octets per code point, so you can allocate your buffers based on
> > > that.
> >
> > 63 code points is presumably a conservative number. Given the 4 octet
> > ACE prefix you can only fit 59 octets' worth of punycode output
> > per label, hence presumably 59 code points is a tighter limit for
> > non-ASCII internationalized labels while 63 code points is the limit
> > for ASCII labels.
>
> True, but which limit you care about depends on the encoding. For
> example, if you're using UTF-32, then a regular ASCII label can have 63
> code points each occupying 4 octets.
YES.
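
To make the encoding dependence concrete, here is a minimal Python sketch
of the buffer arithmetic discussed above. The names (MAX_LABEL_CODE_POINTS,
max_label_buffer, encoded_length) are only illustrative and do not come
from any draft; the 63 code point figure is the loose bound quoted in this
thread.

```python
# Loose bound from the thread: a label represents at most 63 code points.
MAX_LABEL_CODE_POINTS = 63

# Worst-case octets per code point in each encoding form
# (UTF-16 needs a 4-octet surrogate pair outside the BMP).
MAX_OCTETS_PER_CODE_POINT = {"utf-8": 4, "utf-16": 4, "utf-32": 4}

# Explicit big-endian codecs so that Python does not prepend a BOM.
_CODECS = {"utf-8": "utf-8", "utf-16": "utf-16-be", "utf-32": "utf-32-be"}


def max_label_buffer(encoding):
    """Worst-case buffer size in octets for one label in `encoding`."""
    return MAX_LABEL_CODE_POINTS * MAX_OCTETS_PER_CODE_POINT[encoding]


def encoded_length(label, encoding):
    """Actual number of octets `label` occupies in `encoding`."""
    return len(label.encode(_CODECS[encoding]))


if __name__ == "__main__":
    ascii_label = "a" * 63
    for enc in ("utf-8", "utf-16", "utf-32"):
        print(enc, encoded_length(ascii_label, enc), max_label_buffer(enc))
```

The same 63-character ASCII label occupies a different number of octets in
each encoding form, which is why a single octet limit cannot describe all
of them.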
>
> Soobok Lee <lsb@postel.co.kr> wrote:
>
> > IDNA section 6.1 goes further than that by allowing _protocols_ to use
> > non-ACE labels which are not presentation forms nor textual labels,
> > but protocol elements. What if future ESMTP allows utf8 encodings in
> > RCPT: headers ?
>
> Then applications that implement future ESMTP will need to be prepared
> for UTF-8 labels to contain more than 63 octets. This is not a
My focus was on whether "UTF-8 labels in ESMTP sessions" are
legitimate internationalized hostnames (labels), now or in the
future, since IDNA section 6.1 allows UTF-8 encodings of transmitted
labels. Does this section propose changes to the hostname rules? You
and the other authors have denied that, and the authors' real intent
always wins over others' interpretations (mine) of the draft. ;-)
Don't you think the authors should be clearer about that section?
> problem, because any application that can even think about using
> non-ASCII labels is aware of IDNA, and therefore knows the definition of
> internationalized label, and therefore knows that the maximum possible
> label length depends on the encoding used.
But label length restrictions should not be added on a per-application-
protocol basis; they should be added at a lower level such as the DNS
(and so IDNA), as was done in RFC 1035. What do you think about that?
>
> Soobok Lee <lsb@postel.co.kr> wrote:
>
> > They will find an utf8 label may have 168 octets, contrary to RFC1035.
>
> There is no contradiction. RFC 1035 says nothing about UTF-8 labels.
> The RFC 1035 limit of 63 octets per label applies to the universe of
> labels that RFC 1035 defined. IDNA defines some new labels outside that
> universe (each of which is equivalent to a label inside that universe,
> for backward compatibility). If you want to know the maximum possible
> length of these new labels that were created by IDNA, don't bother
> looking at RFC 1035, because it can't possibly tell you, because it
> doesn't even know about the new labels. Look at IDNA, which contains
> the complete definition of internationalized label.
See my first paragraph above.
>
> > When IDNA draft granted utf8 label use in application protocols,
> > it is natural that it should have also specified utf8 label length
> > restrictions.
>
> It did, by defining internationalized label as anything that ToASCII
> can be applied to without failing. From this you can easily conclude
> that internationalized labels, when encoded in UTF-8, can exceed 63
> octets, but cannot exceed 63*4 octets. A tight upper bound is trickier
> to figure out, but you don't need it in practice.
I prefer Erik's suggestion for explicit clarifications, rather than
implicit ones.
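
As an illustration of the two limits being discussed, here is a rough
Python sketch that compares the UTF-8 length of a label with the length
of its ACE form. It is only a sketch under assumptions: it uses Python's
built-in "punycode" codec, skips nameprep entirely, and simply adds the
4-octet prefix length mentioned earlier in this thread. The function
names are hypothetical.

```python
ACE_PREFIX_OCTETS = 4       # prefix length quoted in this thread
MAX_DNS_LABEL_OCTETS = 63   # RFC 1035 limit on a label in the DNS


def utf8_octets(label):
    """Octets needed to carry the label as UTF-8."""
    return len(label.encode("utf-8"))


def ace_octets(label):
    """Octets needed for the ACE form (assumption: no nameprep applied)."""
    if label.isascii():
        return len(label)  # plain ASCII labels carry no ACE prefix
    return ACE_PREFIX_OCTETS + len(label.encode("punycode"))


def fits_in_dns_label(label):
    """True if the ACE form respects the 63-octet limit."""
    return ace_octets(label) <= MAX_DNS_LABEL_OCTETS


if __name__ == "__main__":
    label = "\uac00" * 22  # 22 Hangul syllables, 3 octets each in UTF-8
    print(utf8_octets(label), ace_octets(label), fits_in_dns_label(label))
```

A label like the one above exceeds 63 octets as UTF-8 while its ACE form
still fits in 63 octets, so an application that carries UTF-8 labels sees
a different limit than the DNS does.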
>
> > So, 1024 or 768 bytes are good. But those utf8 FQDN cannot be put
> > into single UDP packet of DNS response/query. This will constrain
> > future DNS protocol update efforts around utf8 supports in wire
> > format. Today's long iDNs may be one of the obstacles in the way to
> > the effort.
>
> That will indeed be an issue that any UTF-8 DNS protocol update will
> need to address.
Sure.
Regards,
Soobok Lee