Re: [idn] question about cidnuc
Dear Mr. Hoffman:
Thanks very much for your quick reply.
Now I can understand how the compression works.
On Fri, 10 Mar 2000, Paul Hoffman / IMC wrote:
> >i made up two examples for the first two cases.
> >are they correct?
> >
> > 1) no compression: 0x0061 1100 1162
> > 2) compressed/one-octet header : 0x1100 1162 -> 0x 11 00 62
^^^^^^ By "header" I meant "mode"; the same applies below.
> > 3) compressed/two-octet header: examples???
>
> In cidnuc, section 2.4.1, Step 1 says
> that all the upper octets *must* match in order to use the greater
> compression. In the case above, 0x00 does not match 0x11. Thus, the output
> of the compression step is 0xD8006111001162.
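The rule quoted above can be sketched in Python. This is a minimal illustration based only on the two examples in this thread, not on the full cidnuc draft; the function name `compress_utf16be` is hypothetical, and any other rules in section 2.4.1 are not modelled.

```python
def compress_utf16be(code_units):
    """Sketch of the compression step as described in this thread.

    code_units: list of 16-bit integers (one per character).
    Assumption: only the two cases discussed here are handled.
    """
    upper_octets = {u >> 8 for u in code_units}
    if len(upper_octets) == 1:
        # All upper octets match: emit the shared upper octet once,
        # followed by the lower octet of every character.
        shared = upper_octets.pop()
        return bytes([shared]) + bytes(u & 0xFF for u in code_units)
    # Upper octets differ: emit 0xD8, then every character as two octets.
    out = bytearray([0xD8])
    for u in code_units:
        out += u.to_bytes(2, "big")
    return bytes(out)


print(compress_utf16be([0x1100, 0x1162]).hex())          # 110062
print(compress_utf16be([0x0061, 0x1100, 0x1162]).hex())  # d8006111001162
```

The first call reproduces example 2 (0x1100 1162 -> 0x110062); the second reproduces the 0xD8006111001162 output given for example 1.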
Since your document describes two modes for compressed strings and
one case without compression (actually a one-octet expansion),
I thought there would be three cases.
I would say that your document could be improved for easier reading.
You could also include simple examples, which will help readers a lot.
Just my thought.
> The purpose is for long strings that might hit the 63-character
> limit after encoding with Base64. The script you gave, Hangul Jamo, is a
> prime example of where cidnuc's compression helps.
Below, I got somewhat confused ...
Sorry if this was discussed before.
(As I said, I am quite new to this mailing list
and I am trying to catch up ASAP...)
> In downcasing UTF8,
1. I know UTF-8, but not "downcasing" UTF-8.
Could you please give me a reference for downcasing?
----
> In downcasing UTF8, the limit for Hangul Jamo is 8 characters;
2. For normal UTF-8, one Hangul jamo (two octets) will become three octets.
Therefore, the limit seems to be floor (63/3) = 21. Am I right?
----
> in UTF-5, it is 15 characters;
3. I guess I can figure it out.
In UTF-5, one Hangul jamo (two octets) will become four octets.
Therefore, the limit seems to be floor (63/4) = 15. Am I right?
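The arithmetic in items 2 and 3 can be checked with a short Python sketch. The names `LABEL_LIMIT`, `utf8_limit` and `utf5_limit` are made up for illustration, and the UTF-5 size is computed under the assumption that UTF-5 emits one character per 4-bit nibble of the code point.

```python
LABEL_LIMIT = 63  # the 63-character limit mentioned in this thread

jamo = "\u1100"  # a Hangul jamo from the examples above (0x1100)

# Plain UTF-8: count the octets Python produces for one jamo.
utf8_octets = len(jamo.encode("utf-8"))

# Assumption: UTF-5 uses one character per 4-bit nibble of the code point.
utf5_chars = (ord(jamo).bit_length() + 3) // 4

utf8_limit = LABEL_LIMIT // utf8_octets
utf5_limit = LABEL_LIMIT // utf5_chars

print(utf8_octets, utf8_limit)  # 3 21
print(utf5_chars, utf5_limit)   # 4 15
```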
> in cidnuc, it is 37 characters.
4. I am somewhat confused here.
In the one-octet mode, the limit seems to be 36 chars, not 37 chars.
Please correct me if I am wrong. My calculation is shown below:
1) let's assume we have 37 jamos (=74 octets).
2) after compression, we have 38 octets (due to 0x11 header).
3) after base32 encoding, we have
ceiling (38*8/5) = ceiling (60.8) = 61 octets.
4) after prepending "wg4", we have 64 octets, which exceeds 63 by one.
Therefore, the limit seems 36 chars.
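The four steps above can be traced in Python. This is only a sketch of my calculation: the function name `one_octet_mode_length` is hypothetical, and it assumes the base32 output carries no padding and that the prefix is the three characters "wg4".

```python
def one_octet_mode_length(n_jamo, prefix="wg4"):
    """Total label length for n_jamo jamos sharing one upper octet."""
    compressed_octets = 1 + n_jamo  # shared upper octet 0x11 + one lower octet per jamo
    # base32 carries 5 bits per character; round up (assumes no padding).
    base32_chars = (compressed_octets * 8 + 4) // 5
    return len(prefix) + base32_chars


print(one_octet_mode_length(37))  # 64, one over the limit of 63
print(one_octet_mode_length(36))  # 63, exactly at the limit
```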
----
5. In the document, it is said that
"the two-octet mode limits the number of chars to 17".
I am somewhat confused here too. The limit seems to be 18, not 17.
Please correct me if I am wrong. My calculation is shown below:
1) let's assume we have 18 chars (=36 octets).
2) after compression, we have 37 octets (due to 0xd8 header).
3) after base32 encoding, we have
ceiling (37*8/5) = ceiling (59.2) = 60 octets.
4) after prepending "wg4", we have 63 octets.
Therefore, the limit seems 18 chars.
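The same reasoning can be turned into a small search for the largest character count that still fits. This is a sketch under the assumptions used in my calculation (one header octet, unpadded base32, a three-character prefix, a 63-character limit); the function `max_chars` and its parameters are hypothetical names.

```python
def max_chars(header_octets, octets_per_char, prefix_len=3, label_limit=63):
    """Largest number of characters whose encoded label fits in label_limit."""
    n = 0
    while True:
        octets = header_octets + (n + 1) * octets_per_char
        base32_chars = (octets * 8 + 4) // 5  # ceiling of octets * 8 / 5
        if prefix_len + base32_chars > label_limit:
            return n  # n + 1 characters would no longer fit
        n += 1


# Two-octet mode: 0xD8 header, then two octets per character.
print(max_chars(header_octets=1, octets_per_char=2))  # 18
```

With 19 characters the compressed string is 39 octets, which needs 63 base32 characters and so 66 characters with the prefix.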
> --Paul Hoffman, Director
> --Internet Mail Consortium
Thanks very much.
김 경석, 부산대 정보 컴퓨터 공학부;
KIM Kyongsok/GIM Gyeongseog, Busan National Univ.
gimgs@hangeul.cs.pusan.ac.kr, http://hangeul.cs.pusan.ac.kr/hangeul/
Ph: +82-(0)51-510-2292, Fax: +82-(0)51-515-2208