
Re: [idn] IDN refocus, v4



> The encoded values being passed around are supposed to -decode- as UCS
> character codes, so it would seem best to maintain consistency across
> all outputs. We do not want to make it more complex than necessary and
> defining multiple extraction profiles would make processing much more
> complex: "we promise that this encoding will represent a series of UCS
> characters" should be it. For that reason, my opinion is that any UCS
> character code (including those not yet assigned) should be valid for
> internationalized domain names (with nameprep providing the host name
> subset filter). Going that route incurs a responsibility to tell
> implementations that they have to be careful with data they process,
> but in truth we had that responsibility already given the broader
> exposure of the combining characters.

Right! As I said, we are not disagreeing :-) But I think it is useful to
continue this discussion as I think it is leading somewhere.

Definition:

domain names - any 8-bit characters, usually (but not necessarily)
US-ASCII

host names - limited to LDH, no leading or trailing "-", delimited by
".", cannot have digits-only labels

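The host name rule above can be sketched as a small check. This is only
an illustration of the definition as worded here: the function name
is_host_name is made up, "digits-only" is read as "no label may consist
solely of digits", and length limits are left out because the
definition above does not mention them.

```python
import re

# One label: letters, digits, hyphen (LDH), with no leading or trailing "-".
_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")

def is_host_name(name: str) -> bool:
    """Check a "."-delimited name against the host name rule as worded above."""
    for label in name.split("."):
        if not _LABEL.fullmatch(label):
            return False  # empty label, non-LDH character, or edge hyphen
        if label.isdigit():
            return False  # assumption: any digits-only label is rejected
    return True
```

For example, is_host_name("www.example.com") is True, while "-bad.example",
"a_b.example" and "123.example" are all rejected.
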
What is the definition of internationalized domain names and
internationalized host names?

We seem to agree at least that

i18n domain names - any Unicode characters

i18n host names - i18n domain names subjected to the prohibited list
defined in Section 5 of nameprep (host names limitation, space
characters, control characters, private use, etc)
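
To make the "i18n domain name plus prohibited list" idea concrete, here
is a rough sketch. It does NOT use the nameprep tables; it approximates
the classes named above with Unicode general categories (Zs/Zl/Zp for
spaces, Cc for controls, Co for private use) and applies the LDH limit
to the ASCII range. The name prohibited_chars and the category mapping
are my assumptions, and whatever "etc" covers is not modelled.

```python
import unicodedata

# ASCII characters allowed by the host name (LDH) limitation.
LDH = set("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-")

# Rough stand-ins for the classes named above; NOT the nameprep tables.
PROHIBITED_CATEGORIES = {"Zs", "Zl", "Zp", "Cc", "Co"}

def prohibited_chars(label: str) -> list:
    """Return the characters of one label that this sketch would reject."""
    bad = []
    for ch in label:
        if ord(ch) < 0x80:
            # ASCII range: only LDH survives the host name limitation.
            if ch not in LDH:
                bad.append(ch)
        elif unicodedata.category(ch) in PROHIBITED_CATEGORIES:
            bad.append(ch)
    return bad
```

A label counts as an i18n host name label in this sketch when the
returned list is empty; unassigned code points are deliberately let
through, in line with the quoted position.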

But is an i18n host name sufficient for normal use? As a technical
implementation, maybe. For policy implementation, unlikely. Perhaps we
need a new term for that...

> I would submit that we are describing transfer encodings and their
> handling. The application media being used to transfer the encoded
> values provide seven- and eight-bit paths. For the sake of maximum
> efficiency with the applications that transfer and use the domain
> names, we should provide seven- and eight-bit encodings. The
> encapsulation constraints in DNS are difficult to work with but that
> does not change the above. We should not be defining mandatory
> seven-bit encodings for eight-bit applications especially if they are
> compliant with BCP18 for every unit of protocol and/or application
> data.

We differ in this.

The question of whether we need more than one CCS was answered a long
time ago. The choice is clearly ISO/IEC 10646.

TES is an encoding. CES is an encoding. I am asking if we need more than
one encoding (either TES or CES). That is my first question.
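
For anyone following the CCS/CES/TES terminology, a small sketch of the
three layers for a single label. UTF-8 is used as the CES and base32 as
a stand-in 7-bit TES purely for illustration; neither choice is a
proposal, and the function name layers is made up.

```python
import base64

def layers(label: str):
    """Show one label at the CCS, CES and TES layers (illustrative choices)."""
    code_points = [ord(ch) for ch in label]       # CCS: ISO/IEC 10646 code points
    ces = label.encode("utf-8")                   # CES: code points -> octets
    tes = base64.b32encode(ces).decode("ascii")   # TES: octets -> 7-bit safe text
    return code_points, ces, tes
```

layers("é") gives the code point 0xE9, the two octets C3 A9, and an
ASCII-only string that decodes back to those octets.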

If the answer is that we need more than one encoding, then the next
question would be how many separate cases we have, i.e., how many
encodings do we need? (I could argue it is more "fair" to use UTF-32 in
EDNS labels)
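
One way to read "fair" here (my interpretation, not a measured result
about any deployment): UTF-32 costs the same four octets for every
character, whereas UTF-8 and UTF-16 cost different scripts different
amounts. A quick sketch, with the helper name encoded_sizes made up for
the example:

```python
def encoded_sizes(label: str) -> dict:
    """Octet count of one label under three encodings of ISO/IEC 10646."""
    return {
        name: len(label.encode(name))
        for name in ("utf-8", "utf-16-be", "utf-32-be")
    }
```

encoded_sizes("abc") gives 3, 6 and 12 octets; encoded_sizes("日本語")
gives 9, 6 and 12. Only the UTF-32 figure is the same for both
three-character labels.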

Then we can start asking which is the appropriate encoding for each
case, i.e., your questions 2 and 3.

-James Seng