Re: [idn] complexity/simplicity: NAMEPREP code vs ACE codes
- To: "Makoto Ishisone" <ishisone@sra.co.jp>
- Subject: Re: [idn] complexity/simplicity: NAMEPREP code vs ACE codes
- From: "Soobok Lee" <lsb@postel.co.kr>
- Date: Fri, 29 Jun 2001 00:25:25 +0900
- Cc: <idn@ops.ietf.org>
- Delivery-date: Thu, 28 Jun 2001 08:20:45 -0700
- Envelope-to: idn-data@psg.com
You are right.
KC Norm is hard to learn and implement from scratch.
I searched MDNkit for more sources related to KC normalization:
[root@bora lib]# wc *norm*c
   632  1975 16581 normalizer.c
   459  1710 12201 unormalize.c  (not related to KC norm ???)
  1091  3685 28782 total
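
To give a feel for what KC normalization code has to get right, here is a
minimal sketch using Python's standard unicodedata module. This is an
assumption for illustration only; it is not the MDNkit code counted above,
and the sample strings are hypothetical picks.

```python
import unicodedata


def nfkc(s):
    """Apply Unicode Normalization Form KC via the standard library."""
    return unicodedata.normalize("NFKC", s)


# Hypothetical sample inputs, chosen only for illustration.
samples = [
    "\uff21\uff22\uff23",  # fullwidth Latin letters
    "\ufb01",              # the 'fi' ligature
    "e\u0301",             # 'e' followed by a combining acute accent
]

for s in samples:
    out = nfkc(s)
    # Normalizing an already normalized string must change nothing.
    assert nfkc(out) == out
    print([hex(ord(c)) for c in s], "->", [hex(ord(c)) for c in out])
```

A from-scratch implementation would need its own decomposition and
composition tables to pass even a small check like this one.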
If all of you think that even huge mapping tables do not add complexity,
then my 'reorder_by_char_frequency-before-encode' idea
adds no complexity to DUDE, as it adds only simple
mapping functions and tables. That's not bad. :-)
http://www.postel.co.kr/idn-lsb-00.txt
(I am now adding SC/TW support to this. I measured a 15%~20% improvement
after adding tables for the 2048 most frequent Han syllables.)
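
One way to picture the reordering idea is as a fixed permutation of code
points, applied before the ACE encoder and undone after decoding. The sketch
below is only an assumption about the general shape, not the algorithm in
idn-lsb-00.txt. The names build_permutation, invert and remap, the base
value, and the three-entry frequency list are all hypothetical.

```python
def build_permutation(frequent_cps, base):
    """Map each frequent code point into a compact block starting at base.

    Code points displaced from that block take over the positions the
    frequent ones left, so the table is a bijection and can be inverted.
    Assumes frequent_cps holds distinct values, most frequent first.
    """
    slots = [base + i for i in range(len(frequent_cps))]
    table = dict(zip(frequent_cps, slots))
    slot_set = set(slots)
    # Slots that are not themselves frequent code points must move out.
    displaced = [s for s in slots if s not in table]
    # Frequent code points outside the block leave a free position behind.
    vacated = [c for c in frequent_cps if c not in slot_set]
    table.update(zip(displaced, vacated))
    return table


def invert(table):
    """Build the decoding table from the encoding table."""
    return {v: k for k, v in table.items()}


def remap(text, table):
    """Apply a code point table; unlisted characters pass through."""
    return "".join(chr(table.get(ord(c), ord(c))) for c in text)


# Hypothetical frequency list (three Hangul syllables) and block base.
FREQUENT = [0xC774, 0xB2E4, 0xAC00]
table = build_permutation(FREQUENT, base=0xAC00)

label = "\uc774\ub2e4\uac00"
reordered = remap(label, table)          # input for the ACE encoder
restored = remap(reordered, invert(table))
assert restored == label
```

The encoder itself is untouched; only the table lookup on each side is new.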
Soobok Lee
----- Original Message -----
From: "Makoto Ishisone" <ishisone@sra.co.jp>
To: <lsb@postel.co.kr>
Cc: <idn@ops.ietf.org>
Sent: Friday, June 29, 2001 12:01 AM
Subject: Re: [idn] complexity/simplicity: NAMEPREP code vs ACE codes
> In message <001e01c0ffd0$1a717ee0$ed1bd9d2@postel.co.kr>,
> "Soobok Lee" <lsb@postel.co.kr> wrote:
> > For those who have never looked into the NAMEPREP code in JPNIC's MDNkit,
> > ...
> > [root@bora lib]# wc name*[hc] uni*[hc]
> >    296  1109   8554 nameprep.c
> >    136   804   5475 nameprep_template.c
> >   1694 11778  73804 nameprepdata.c
> >    484  1822  12314 unicode.c
> >   6806 38573 327222 unicodedata.c
> >   9416 54086 427369 total
>
> If you look closer, you'll find that nameprepdata.c and unicodedata.c
> above contain only data -- some large tables, which are generated from
> NAMEPREP draft and Unicode Character Database. So I don't think it is
> fair to count them when you compare complexity. On the other hand
> you overlooked unormalize.c, which implements Unicode Normalization
> Forms.
>
> Anyway I agree that NAMEPREP (NFKC in particular) is no simpler than
> most of the proposed ACEs. Before implementing NFKC you have to read
> the specification, which is longer than any ACE I-Ds, and relevant
> documents, understand what's going on, and generate tables from the
> data... Also I think it is harder to test the correctness of the
> implementation.
>
> -- ishisone@sra.co.jp
>