
Re: [idn] Challenge: longest UTF-8 with valid domain name



Martin Duerst <duerst@w3.org> wrote:

> What is the longest (in terms of bytes) internationalized domain name
> (when encoded as UTF-8)?
>
> Obviously because there are characters that are ignored by nameprep,
> we would have to ask for the longest one after ToUnicode.

You mean the longest after nameprep.  The output of ToUnicode is not
necessarily nameprepped (because ToUnicode leaves its input untouched if
the input is not an ACE).

> The problem applies both to single labels as well as to FQDNs.

I think the longest nameprepped label is 224 bytes in UTF-8.  Take any
code point in the range 10000..55931 (hex) and repeat it 56 times.  The
ACE form will be 63 characters.  Each such code point takes 4 bytes in
UTF-8, so the nameprepped non-ACE form will be 56*4 = 224 bytes.
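
As a rough sanity check of the UTF-8 arithmetic only (not of the ACE
length), here is a minimal Python sketch.  U+20000 is just a sample I
picked from the range above, and the variable names are made up for
illustration.

```python
# Sample code point from the 10000..55931 (hex) range; the choice of
# U+20000 is an assumption made for illustration.
CODE_POINT = 0x20000
REPEAT = 56

label = chr(CODE_POINT) * REPEAT

# Every code point above U+FFFF takes 4 bytes in UTF-8.
label_utf8_len = len(label.encode("utf-8"))
print(label_utf8_len)  # 224
```

This does not run nameprep or the ACE encoder; it only confirms the
56*4 byte count.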

As for the longest IDN, I think that would have four labels, three
of which are maximal, and one of which is just shy of maximal (62
characters in the ACE).  With the four length bytes, that hits the DNS
limit of 255 bytes.

The longest UTF-8 representation of that name (with nameprepped labels)
would use ideographic full stops (U+3002, 3 bytes each in UTF-8) for
dots, and would include the trailing dot.  So it would be
(3*56+55)*4 + 4*3 = 904 bytes.
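
The same arithmetic can be sketched in Python.  This is only a check of
the byte counts under the assumptions stated in this message (label
sizes of 56, 56, 56 and 55 code points, ACE lengths of 63, 63, 63 and
62); the sample code point U+20000 and all names are illustrative, and
no nameprep or ACE conversion is actually performed.

```python
IDEOGRAPHIC_FULL_STOP = "\u3002"   # 3 bytes in UTF-8
SAMPLE = chr(0x20000)              # assumed sample code point, 4 bytes in UTF-8

# Code points per label, and the ACE lengths claimed for them above.
label_lengths = [56, 56, 56, 55]
ace_lengths = [63, 63, 63, 62]

# Each label is followed by a dot, so the trailing dot is included.
name = "".join(SAMPLE * n + IDEOGRAPHIC_FULL_STOP for n in label_lengths)
name_utf8_len = len(name.encode("utf-8"))

# ACE characters plus one length byte per label, as counted above.
dns_len = sum(ace_lengths) + len(ace_lengths)

print(name_utf8_len)  # 904
print(dns_len)        # 255
```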

AMC