[idn] ACE37
From: "Adam M. Costello" <amc@cs.berkeley.edu>
> I was basing it on 59 octets. In the worst case, DUDE would require
> 60 octets for 15 Han characters, but the worst case virtually never
> happens. In the thousands of names that JPNIC tried, it never happened.
> 59 octets happened a few times, but I think the typical length was 58 or
> 57. So DUDE can almost always support 15 characters, but usually cannot
> support 16. I think the boundary between "usually fits" and "usually
> fails" is more useful than the "always fits" number.
I don't think this holds for Chinese. Most of the time, the ideographs
come from different rows, so DUDE has to use 4 characters for each
Chinese ideograph. In any case, if you compare "usually fits", ACE37
should still outperform DUDE by a margin of at least 5 codepoints,
because 19 codepoints is ACE37's "worst case" scenario (with the 59-octet
limit), and it would likely "usually fit" 20-21 codepoints. In my example
with JPNIC in Japanese, the name has 25 (>21) codepoints and ACE37 can
still handle it, whereas DUDE fails.
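To make the arithmetic behind the "at least 5 codepoints" margin explicit, here is a rough sketch. It assumes only the figures already quoted in this thread (a 59-octet budget, 4 octets per ideograph for DUDE in the worst case, and a worst case of 19 codepoints for ACE37). It does not implement either encoding, and the names are made up for illustration.

```python
# Illustrative arithmetic only; neither DUDE nor ACE37 is implemented here.
LABEL_BUDGET = 59          # octets available, as used in this thread
DUDE_WORST_COST = 4        # octets per ideograph when every one is in a different row
ACE37_WORST_CAPACITY = 19  # worst-case figure quoted in this thread, not derived here


def worst_case_capacity(budget: int, octets_per_char: int) -> int:
    """Number of characters guaranteed to fit if each one costs octets_per_char."""
    return budget // octets_per_char


dude_worst = worst_case_capacity(LABEL_BUDGET, DUDE_WORST_COST)
gap = ACE37_WORST_CAPACITY - dude_worst
print(dude_worst, gap)
```

Under these assumptions the 15th worst-case ideograph pushes DUDE to 60 octets, one past the budget, which is where the margin comes from.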
The point I am trying to make is that ACE37 greatly increases the
capacity for CJK ideographs, which the discussion on this mailing list
has identified as an undeniable concern, and it does so without requiring
a complex algorithm while retaining the simplicity of DUDE.
Edmon