[idn] Change request for cidnuc
Hello,
Please consider the following suggestions for improvement to CIDNUC.
--
Rather than "wg4", I suggest the more distinctive prefix "--" preceded by a single
letter from "a" to "z". The letters "a" to "c" would be used now, to indicate which
form of CIDNUC encoding a label uses. To allow for future extensions, the letters
"d" to "z" are reserved for possible later use.
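To make the prefix rule concrete, here is a minimal Python sketch of how a label might be classified by its prefix. The function name cidnuc_form is hypothetical. Reading the prefix letter case-insensitively is an assumption, made because DNS labels compare case-insensitively. The sketch does not resolve the clash with an unencoded label that happens to start with a letter followed by "--".

```python
import string

ACTIVE = set("abc")                                  # forms defined in this proposal
RESERVED = set(string.ascii_lowercase) - ACTIVE      # "d" to "z", kept for later use


def cidnuc_form(label: str):
    """Return "a", "b" or "c" for an encoded label, "reserved" for a
    "d--" to "z--" prefix, and None for a label with no such prefix."""
    if len(label) < 3 or label[1:3] != "--":
        return None
    letter = label[0].lower()                        # assumption: case-insensitive
    if letter in ACTIVE:
        return letter
    if letter in RESERVED:
        return "reserved"
    return None
```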
--
So far, CIDNUC has provided compact encoding for Asian scripts and for scripts like
Cyrillic and Greek. For accented Latin, however, the compression is poor. This
change request addresses the problem (and allows Latin labels to be up to
(63-3)/2 = 30 letters long, since each letter takes two base32 characters).
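The 30-letter figure follows from 5 bits per base32 character: the 60 characters left after the prefix carry 300 bits, and form "c" spends 10 bits per letter. The same arithmetic, applied to the forms defined below, gives the other two limits. This short sketch assumes the 63-octet DNS label limit and treats any leftover bits as padding.

```python
MAX_LABEL = 63                   # DNS label limit, in characters
PREFIX = 3                       # "a--", "b--" or "c--"
BITS = (MAX_LABEL - PREFIX) * 5  # 60 base32 characters x 5 bits = 300 bits

max_c = BITS // 10               # form c: 10 bits (L10) per letter
max_b = (BITS - 8) // 8          # form b: one shared H8, then 8 bits (L8) per letter
max_a = BITS // 16               # form a: 16 bits (H8 L8) per letter

print(max_c, max_b, max_a)       # 30 36 18
```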
--
A label or username can be encoded in one of four ways. Treat each character of
the string as the two octets of its UTF-16 code unit, and let L10 be the lowest 10
bits, L8 the lowest 8 bits and H8 the highest 8 bits:
1) if the string contains only a-z, 0-9 and hyphen, no encoding is applied
2) else if all high octets are 0x00, 0x01, 0x02 or 0x03 (e.g. the string is Latin-1
Supplement/Latin Extended-A/etc.), encode as follows:
"c--" base32(L10 L10 L10 ...)
3) else if all high octets are equal (e.g. the string is Greek/Cyrillic/etc.), encode
as follows:
"b--" base32(H8 L8 L8 L8 ...)
4) else (e.g. Asian/etc.), encode as follows:
"a--" base32(H8 L8 H8 L8 H8 L8 ...)
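The selection rules above can be sketched in Python as follows. The name select_form is hypothetical, and the sketch assumes the string has already been lowercased. It treats high octets 0x00 to 0x03 as form 2, which is what the first encoding-2 example below requires (u-umlaut is U+00FC, whose high octet is 0x00).

```python
ASCII_LDH = set("abcdefghijklmnopqrstuvwxyz0123456789-")


def select_form(s: str) -> int:
    """Return which of the four encodings (1-4) applies to a lowercased string."""
    if all(ch in ASCII_LDH for ch in s):
        return 1                                  # rule 1: no encoding
    units = s.encode("utf-16-be")                 # two octets per UTF-16 code unit
    high = units[0::2]                            # H8 of each code unit
    if all(h <= 0x03 for h in high):
        return 2                                  # "c--" base32(L10 L10 ...)
    if len(set(high)) == 1:
        return 3                                  # "b--" base32(H8 L8 L8 ...)
    return 4                                      # "a--" base32(H8 L8 H8 L8 ...)
```

Because the rules are applied in this order, a purely Greek string (U+0370 to U+03FF, high octet 0x03) is already caught by form 2 before form 3 is tried.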
--
Base32 conversion
bits   char  ascii   bits   char  ascii
00000  9     0x39    10000  p     0x70
00001  a     0x61    10001  q     0x71
00010  b     0x62    10010  r     0x72
00011  c     0x63    10011  s     0x73
00100  d     0x64    10100  t     0x74
00101  e     0x65    10101  u     0x75
00110  f     0x66    10110  v     0x76
00111  g     0x67    10111  w     0x77
01000  h     0x68    11000  x     0x78
01001  i     0x69    11001  y     0x79
01010  j     0x6a    11010  z     0x7a
01011  k     0x6b    11011  2     0x32
01100  l     0x6c    11100  3     0x33
01101  m     0x6d    11101  4     0x34
01110  n     0x6e    11110  5     0x35
01111  o     0x6f    11111  6     0x36
(The digits 0 and 1 are never used; 7, 8 and "-" are reserved for possible future use.)
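The table amounts to a 32-character alphabet indexed by the 5-bit value. Here is a minimal Python sketch of the packing step (base32_pack is a hypothetical name). It concatenates fixed-width fields most significant bit first, which is the bit order the examples below use. Form "c" always yields a multiple of 10 bits, so it never needs padding. Zero-padding a trailing partial group for forms "a" and "b" is an assumption, because the proposal does not specify it.

```python
ALPHABET = "9abcdefghijklmnopqrstuvwxyz23456"  # index = 5-bit value, per the table


def base32_pack(values, width):
    """Pack each value as a `width`-bit field (MSB first) and emit one
    table character per 5 bits; a trailing partial group is zero-padded."""
    acc = 0    # pending bits
    nbits = 0  # number of pending bits
    out = []
    for v in values:
        acc = (acc << width) | (v & ((1 << width) - 1))
        nbits += width
        while nbits >= 5:
            nbits -= 5
            out.append(ALPHABET[(acc >> nbits) & 0x1F])
        acc &= (1 << nbits) - 1                 # drop bits already emitted
    if nbits:
        out.append(ALPHABET[(acc << (5 - nbits)) & 0x1F])  # assumed zero padding
    return "".join(out)
```

For example, base32_pack([ord(ch) for ch in "dürst"], 10) gives "cdg3crcsct", which matches the username in the first example below.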
--
Example for encoding 2:
dürst@w3.org
c--cdg3crcsct@w3.org
(or it could equally be written: c--cDg3cRcScT@w3.org)
Another example for encoding 2:
www.tre-feliĉa.ie
www.c--ctcrceamcfceclcihica.ie
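Decoding reverses the packing. This Python sketch handles form "c" only (decode_c_label is a hypothetical name). It reads the base32 characters case-insensitively, as the alternative spelling c--cDg3cRcScT suggests, and raises KeyError on a character outside the table.

```python
ALPHABET = "9abcdefghijklmnopqrstuvwxyz23456"        # table above, index = 5-bit value
VALUE = {ch: i for i, ch in enumerate(ALPHABET)}     # inverse lookup


def decode_c_label(label: str) -> str:
    """Decode a form-c label back to its characters (10 bits each)."""
    if label[:3].lower() != "c--":
        raise ValueError("not a form-c label")
    acc = 0    # bit accumulator
    nbits = 0  # bits currently held in acc
    out = []
    for ch in label[3:].lower():
        acc = (acc << 5) | VALUE[ch]   # KeyError for characters outside the table
        nbits += 5
        if nbits == 10:                # one complete L10 field
            out.append(chr(acc))
            acc = 0
            nbits = 0
    if nbits:
        raise ValueError("odd number of base32 characters")
    return "".join(out)
```

With this sketch, decode_c_label("c--ctcrceamcfceclcihica") returns "tre-feliĉa".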
--
regards,
Aaron Irvine
--
-----------------------------------------------------
Aaron Irvine
mailto:airvine@corp.phone.com
-----------------------------------------------------