Re: [idn] Re: An idn protocol for consideration in making the requirements
"Martin J. Duerst" wrote:
> Also, one potential failure from the old days is that the eighth bit
> is lost. I would like to do some checks on how many of the currently
> registered domain names could be interpreted as legal UTF-8 names
> that had their 8th bit taken off (other than the trivial identity
> case, which is of course UTF-8). If somebody can point me to some
> data, or tell me how to get at it, or otherwise collaborate on this,
> please tell me.
I am also interested to know how many UTF-8 domain names would become a
currently registered domain name if their eighth bits were stripped. I can
foresee people asking "Why is my XYZ.com going to sex.com?" :-)
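To make the concern concrete, here is a minimal sketch of what a channel that
is not 8-bit clean would do to a UTF-8 label, assuming it simply clears the
high bit of every octet. The helper names (strip_eighth_bit, is_ldh_label) and
the letter-digit-hyphen check are illustrative assumptions, not part of any
proposal in this thread.

```python
import string

# Letters, digits and hyphen: the characters allowed in a traditional host label.
LDH = set(string.ascii_letters + string.digits + "-")


def strip_eighth_bit(label: str) -> str:
    """Encode a label as UTF-8, then clear the high bit of every octet."""
    stripped = bytes(b & 0x7F for b in label.encode("utf-8"))
    # Every octet is now below 0x80, so decoding as ASCII cannot fail.
    return stripped.decode("ascii")


def is_ldh_label(label: str) -> bool:
    """Rough check that a label looks like a legal ASCII host label."""
    return (
        0 < len(label) <= 63
        and set(label) <= LDH
        and not label.startswith("-")
        and not label.endswith("-")
    )


if __name__ == "__main__":
    for name in ("example", "señor", "café"):
        mangled = strip_eighth_bit(name)
        print(name, "->", repr(mangled), "LDH" if is_ldh_label(mangled) else "not LDH")
```

Under these assumptions "señor" (UTF-8 octets 73 65 C3 B1 6F 72) comes out as
"seC1or", which is itself a legal ASCII label and so could collide with a
separately registered name, while "café" comes out as "cafC)" and would simply
fail. Pure ASCII labels pass through unchanged.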
Not every network or protocol is 8-bit clean. For example, RFC 821 implies that
mail headers should have their eighth bit stripped off. See below.
-James Seng
Maynard Kang wrote:
James,
Spent tonight going through the e-mail RFCs in detail.
An interesting thing I noticed; since RFC 821 has never been superseded by
any other RFC, I tried to examine RFC 821 to look for specifications
regarding character set restriction. Apparently, in the words of the
original RFC 821:
"Commands and replies are composed of characters from the ASCII
character set [1]. When the transport service provides an 8-bit byte
(octet) transmission channel, each 7-bit character is transmitted
right justified in an octet with the high order bit cleared to zero."
[snip]
...RFC 821 explains how 7-bit characters should be represented in an 8-bit
environment (pad the high bit with 0)...
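As a rough illustration of the quoted rule, the sketch below places each 7-bit
ASCII character right-justified in an octet with the high-order bit zero, and
checks whether a byte string already satisfies that constraint. The function
names are hypothetical and chosen only for this example.

```python
def to_octets(text: str) -> bytes:
    """Place each 7-bit ASCII character in an octet with the high bit zero."""
    octets = bytearray()
    for ch in text:
        code = ord(ch)
        if code > 0x7F:
            # RFC 821 commands and replies are ASCII only; anything else
            # has no defined representation under the quoted rule.
            raise ValueError(f"not a 7-bit ASCII character: {ch!r}")
        octets.append(code)  # code < 0x80, so bit 7 is already zero
    return bytes(octets)


def is_seven_bit(data: bytes) -> bool:
    """True if every octet has its high-order bit cleared."""
    return all(b & 0x80 == 0 for b in data)


if __name__ == "__main__":
    command = to_octets("MAIL FROM:<user@example.com>")
    print(is_seven_bit(command))                 # ASCII command
    print(is_seven_bit("ñ".encode("utf-8")))     # UTF-8 octets C3 B1
```

The first check prints True and the second prints False: every octet of a
non-ASCII character's UTF-8 encoding has the high bit set, which is exactly
what a strictly 7-bit transport has no room for.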
[snip]