Re: [idn] complexity/simplicity: NAMEPREP code vs ACE codes
The algorithm for KC normalization is a couple of pages of code. It should
not be particularly difficult.
Mark
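
For readers who have not seen KC normalization (NFKC) in action, the following is a minimal sketch of what it does to a string. It does not show how the algorithm is implemented. It assumes Python's standard `unicodedata` module; the helper name `nfkc` and the sample strings are illustrative only and are not taken from MDNkit or the NAMEPREP draft.

```python
import unicodedata


def nfkc(label: str) -> str:
    """Return the NFKC (compatibility composition) form of a string."""
    return unicodedata.normalize("NFKC", label)


# Hypothetical sample inputs, chosen only to show typical effects.
samples = [
    "\uFF21\uFF22\uFF23",  # fullwidth Latin letters A, B, C
    "\uFB01",              # the "fi" ligature
    "e\u0301",             # "e" followed by a combining acute accent
]

for s in samples:
    # Print the code points before and after normalization.
    before = " ".join(f"U+{ord(c):04X}" for c in s)
    after = " ".join(f"U+{ord(c):04X}" for c in nfkc(s))
    print(before, "->", after)
```

The library call hides both the algorithm and the tables generated from the Unicode Character Database, so this says nothing about the size of a from-scratch implementation, which is the point under discussion in this thread.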
----- Original Message -----
From: "Soobok Lee" <lsb@postel.co.kr>
To: "Makoto Ishisone" <ishisone@sra.co.jp>
Cc: <idn@ops.ietf.org>
Sent: Thursday, June 28, 2001 08:25
Subject: Re: [idn] complexity/simplicity: NAMEPREP code vs ACE codes
> You are right.
> KC normalization is hard to learn and to implement from scratch.
>
> Here are more KC-normalization-related sources in MDNkit:
> [root@bora lib]# wc *norm*c
> 632 1975 16581 normalizer.c
> 459 1710 12201 unormalize.c ( not related to KC norm ???)
> 1091 3685 28782 total
>
> If all of you think that even huge mapping tables do not add complexity,
> then my 'reorder_by_char_frequency-before-encode' idea
> adds no complexity to DUDE either, since it adds only simple
> mapping functions and tables. That's not bad. :-)
>
> http://www.postel.co.kr/idn-lsb-00.txt
> (I am now adding SC/TW support to this. I measured a 15%~20% improvement
> from adding tables for the 2048 most frequent Han syllables.)
>
> Soobok Lee
>
> ----- Original Message -----
> From: "Makoto Ishisone" <ishisone@sra.co.jp>
> To: <lsb@postel.co.kr>
> Cc: <idn@ops.ietf.org>
> Sent: Friday, June 29, 2001 12:01 AM
> Subject: Re: [idn] complexity/simplicity: NAMEPREP code vs ACE codes
>
>
> > In message <001e01c0ffd0$1a717ee0$ed1bd9d2@postel.co.kr>,
> > "Soobok Lee" <lsb@postel.co.kr> wrote:
> > > For those who have never looked into the NAMEPREP code in JPNIC's MDNkit,
> > > ...
> > > [root@bora lib]# wc name*[hc] uni*[hc]
> > > 296 1109 8554 nameprep.c
> > > 136 804 5475 nameprep_template.c
> > > 1694 11778 73804 nameprepdata.c
> > > 484 1822 12314 unicode.c
> > > 6806 38573 327222 unicodedata.c
> > > 9416 54086 427369 total
> >
> > If you look closer, you'll find that nameprepdata.c and unicodedata.c
> > above contain only data -- some large tables, which are generated from
> > the NAMEPREP draft and the Unicode Character Database. So I don't think
> > it is fair to count them when you compare complexity. On the other hand,
> > you overlooked unormalize.c, which implements the Unicode Normalization
> > Forms.
> >
> > Anyway I agree that NAMEPREP (NFKC in particular) is no simpler than
> > most of the proposed ACEs. Before implementing NFKC you have to read
> > the specification, which is longer than any of the ACE I-Ds, and the
> > relevant documents, understand what's going on, and generate tables from
> > the data... Also, I think it is harder to test the correctness of the
> > implementation.
> >
> > -- ishisone@sra.co.jp
> >
>