Re: [idn] Change request for cidnuc
James, or Marc:
Are we now discussing implementations, or are we still focused on agreeing
on the final presentation of the requirements in Adelaide? This reads like
an implementation discussion, which I thought was not appropriate for this
WG, so I'm trying to get my bearings...
Thanks,
Bill Semich
.NU Domain
At 12:50 PM 3/13/00 +0000, Aaron Irvine wrote:
>Hello,
>
>Please consider the following suggestions for improvement to CIDNUC.
>
>--
>
>Rather than "wg4", I suggest the more distinctive "--" preceded by a single
>letter from "a" to "z". Initially only "a" to "c" would be used, each one
>indicating which form of CIDNUC encoding follows. For future-proofing,
>letters "d" to "z" are reserved for potential later use.
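A minimal sketch of how a resolver or mail client might recognize the proposed prefix. It assumes (not stated in the proposal) that the letter sits at position 0 and "--" at positions 1-2, and that the letter is compared case-insensitively; the function name `classify_prefix` is hypothetical.

```python
ACTIVE_FORMS = {"a", "b", "c"}                   # forms defined by this proposal
RESERVED_FORMS = set("defghijklmnopqrstuvwxyz")  # "d" to "z": reserved for later use


def classify_prefix(label):
    """Hypothetical helper: return 'a', 'b', 'c', 'reserved' or 'plain'."""
    label = label.lower()  # assumption: prefix letter is case-insensitive
    if len(label) < 3 or label[1:3] != "--":
        return "plain"
    letter = label[0]
    if letter in ACTIVE_FORMS:
        return letter
    if letter in RESERVED_FORMS:
        return "reserved"
    return "plain"  # e.g. "9--x": not a CIDNUC prefix
```

For example, `classify_prefix("c--cdg3crcsct")` yields `"c"`, while a label such as `"q--abc"` is flagged as using a reserved letter rather than being decoded.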
>
>--
>
>CIDNUC currently provides compact encodings for Asian scripts and for
>scripts such as Cyrillic and Greek, but compression of accented Latin is
>poor. This change request addresses the problem: each Latin character costs
>exactly two base32 characters, so after the 3-character prefix a 63-octet
>label can hold up to (63-3)/2 = 30 letters.
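The 30-letter figure can be checked, and extended to the other two forms described in this message, with a few lines of arithmetic. This is a rough sketch assuming the 63-octet DNS label limit, a 3-character prefix, 5 bits per base32 character, and zero-padding of any final partial 5-bit group (the padding rule is an assumption; the proposal does not specify one). `max_chars` is a hypothetical name.

```python
MAX_LABEL = 63                      # DNS label limit in octets
PREFIX = 3                          # "a--", "b--" or "c--"
BITS = (MAX_LABEL - PREFIX) * 5     # each base32 character carries 5 bits


def max_chars(form):
    """Longest source string that fits in one label, per form (hypothetical)."""
    if form == "c":
        return BITS // 10           # L10 per character
    if form == "b":
        return (BITS - 8) // 8      # one shared H8, then L8 per character
    if form == "a":
        return BITS // 16           # H8 L8 per character
    raise ValueError("unknown form: " + form)
```

Under these assumptions form "c" gives 30 characters (matching the figure above), form "b" 36 and form "a" 18.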
>
>--
>
>A label or username can be encoded in one of four ways. Take each character
>of the string as its two UTF-16 octets, and write L10 for its lowest 10
>bits, L8 for its lowest 8 bits and H8 for its highest 8 bits (the high octet):
>
>1) if the string contains only a-z, 0-9 and hyphen, no encoding is applied
>
>2) else if every high octet is 0x00, 0x01, 0x02 or 0x03 (i.e. every
>character is below U+0400, e.g. Latin-1 Supplement/Extended-A/etc.), encode
>as follows:
> "c--" base32(L10 L10 L10 ...)
>
>3) else if all high octets are equal (e.g. the string is Greek/Cyrillic/etc.),
>encode as follows:
> "b--" base32(H8 L8 L8 L8 ...)
>
>4) else (e.g. Asian/etc), encode as follows:
> "a--" base32(H8 L8 H8 L8 H8 L8 ...)
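The four rules above can be sketched as a selection function that returns the chosen form together with the bit fields to be base32-encoded. This is an illustrative sketch, not the proposal's reference code: it assumes the input is already lowercased, works on UTF-16 code units (so a non-BMP character contributes two surrogate units), and the names `utf16_units` and `choose_form` are hypothetical.

```python
SAFE = set("abcdefghijklmnopqrstuvwxyz0123456789-")


def utf16_units(s):
    """Split s into 16-bit UTF-16 code units (big-endian octet pairs)."""
    data = s.encode("utf-16-be")
    return [(data[i] << 8) | data[i + 1] for i in range(0, len(data), 2)]


def choose_form(s):
    """Return (form, fields), where fields is a list of (value, bit_width)."""
    if all(ch in SAFE for ch in s):                # rule 1: leave as-is
        return "none", []
    units = utf16_units(s)
    highs = [u >> 8 for u in units]
    if all(h <= 0x03 for h in highs):              # rule 2: L10 per unit
        return "c", [(u & 0x3FF, 10) for u in units]
    if len(set(highs)) == 1:                       # rule 3: H8 once, then L8s
        return "b", [(highs[0], 8)] + [(u & 0xFF, 8) for u in units]
    # rule 4: H8 L8 for every unit
    return "a", [f for u in units for f in ((u >> 8, 8), (u & 0xFF, 8))]
```

For instance, a Cyrillic string such as "\u043c\u0438\u0440" shares the high octet 0x04 and selects form "b", while two CJK characters with different high octets fall through to form "a".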
>
>--
>
> Base32 conversion
> bits   char  hex     bits   char  hex
> 00000  9     0x39    10000  p     0x70
> 00001  a     0x61    10001  q     0x71
> 00010  b     0x62    10010  r     0x72
> 00011  c     0x63    10011  s     0x73
> 00100  d     0x64    10100  t     0x74
> 00101  e     0x65    10101  u     0x75
> 00110  f     0x66    10110  v     0x76
> 00111  g     0x67    10111  w     0x77
> 01000  h     0x68    11000  x     0x78
> 01001  i     0x69    11001  y     0x79
> 01010  j     0x6a    11010  z     0x7a
> 01011  k     0x6b    11011  2     0x32
> 01100  l     0x6c    11100  3     0x33
> 01101  m     0x6d    11101  4     0x34
> 01110  n     0x6e    11110  5     0x35
> 01111  o     0x6f    11111  6     0x36
>
>(0 and 1 are never to be used; 7, 8 and - are reserved for possible future use.)
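A minimal sketch of base32 packing with the alphabet from the table above, where the index into the string is the 5-bit value. It concatenates (value, width) fields most-significant-bit first and emits one character per 5 bits. The zero-padding of a final partial group is an assumption (it only arises for forms "a" and "b"; form "c" always produces a multiple of 5 bits), and `pack_base32` is a hypothetical name.

```python
ALPHABET = "9abcdefghijklmnopqrstuvwxyz23456"  # index == 5-bit value, per the table


def pack_base32(fields):
    """Concatenate (value, width) fields MSB-first; emit 5 bits per character.

    Assumption (not in the proposal): a final partial group is padded with
    zero bits on the right.
    """
    acc, nbits, out = 0, 0, []
    for value, width in fields:
        acc = (acc << width) | (value & ((1 << width) - 1))
        nbits += width
        while nbits >= 5:
            nbits -= 5
            out.append(ALPHABET[(acc >> nbits) & 0x1F])
        acc &= (1 << nbits) - 1       # keep only the bits not yet emitted
    if nbits:
        out.append(ALPHABET[(acc << (5 - nbits)) & 0x1F])
    return "".join(out)
```

Feeding it the five 10-bit values of "dürst" (0x064, 0x0FC, 0x072, 0x073, 0x074) reproduces the body of the first example below.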
>
>--
>
>Example for encoding 2:
>
>dürst@w3.org
>
>c--cdg3crcsct@w3.org
>
>(or it could equally be written: c--cDg3cRcScT@w3.org)
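The first example can be reproduced with a short form-2 encoder. Because each 10-bit value splits into exactly two 5-bit groups, every character maps to two base32 characters and no padding is needed. This is a sketch assuming only the local part of the address needs encoding (the domain "w3.org" is already plain); `encode_form_c` and `encode_address` are hypothetical names.

```python
ALPHABET = "9abcdefghijklmnopqrstuvwxyz23456"  # from the Base32 conversion table


def encode_form_c(label):
    """Form 2 ("c--"): L10 of each UTF-16 unit; every unit must be < U+0400."""
    data = label.encode("utf-16-be")
    units = [(data[i] << 8) | data[i + 1] for i in range(0, len(data), 2)]
    if any(u > 0x3FF for u in units):
        raise ValueError("form c needs every high octet in 0x00..0x03")
    out = []
    for u in units:
        out.append(ALPHABET[u >> 5])    # upper 5 of the 10 bits
        out.append(ALPHABET[u & 0x1F])  # lower 5 of the 10 bits
    return "c--" + "".join(out)


def encode_address(addr):
    """Hypothetical helper: encode the username, keep the ASCII domain."""
    local, _, domain = addr.partition("@")
    return encode_form_c(local) + "@" + domain


print(encode_address("d\u00fcrst@w3.org"))
```

This prints `c--cdg3crcsct@w3.org`, matching the example; the mixed-case spelling above lowercases to the same string.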
>
>
>Another example for encoding 2:
>
>www.tre-feliĉa.ie
>
>www.c--ctcrceamcfceclcihica.ie
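The second example also shows that plain labels ("www", "ie") are left untouched under rule 1. A sketch of per-label encoding plus a matching decoder for form "c" follows; the decoder lowercases first (so "c--cDg3cRcScT" is accepted) and rebuilds each character from two 5-bit values. The function names are hypothetical, and the sketch assumes any label starting with "c--" is a form-c encoding, which is what reserving the prefix implies.

```python
ALPHABET = "9abcdefghijklmnopqrstuvwxyz23456"  # from the Base32 conversion table
SAFE = set("abcdefghijklmnopqrstuvwxyz0123456789-")


def encode_label_c(label):
    """Rule 1 for plain labels, otherwise form 2 ("c--")."""
    if all(ch in SAFE for ch in label):
        return label
    out = []
    for ch in label:
        u = ord(ch)  # equals the UTF-16 unit for code points below U+0400
        if u > 0x3FF:
            raise ValueError("not encodable with form c")
        out += [ALPHABET[u >> 5], ALPHABET[u & 0x1F]]
    return "c--" + "".join(out)


def decode_label_c(label):
    """Invert encode_label_c; labels without the "c--" prefix pass through."""
    label = label.lower()
    if not label.startswith("c--"):
        return label
    body = label[3:]
    if len(body) % 2:
        raise ValueError("form c body must have an even length")
    chars = []
    for i in range(0, len(body), 2):
        hi, lo = ALPHABET.index(body[i]), ALPHABET.index(body[i + 1])
        chars.append(chr((hi << 5) | lo))
    return "".join(chars)


name = "www.tre-feli\u0109a.ie"
encoded = ".".join(encode_label_c(part) for part in name.split("."))
print(encoded)
```

This prints `www.c--ctcrceamcfceclcihica.ie`, and decoding each label returns the original name.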
>
>
>
>--
>
>regards,
>Aaron Irvine
>
>--
>
>-----------------------------------------------------
>Aaron Irvine
> mailto:airvine@corp.phone.com
>-----------------------------------------------------
Bill Semich
President and Founder
.NU Domain Ltd
http://whats.nu
bill@mail.nic.nu