
Re: [idn] complexity/simplicity: NAMEPREP code vs ACE codes



The algorithm for KC normalization is a couple of pages of code. It should
not be particularly difficult.

Mark
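
For readers who have not seen NFKC in action, here is a minimal Python
sketch of what the normalization does to a string. It calls
unicodedata.normalize from the Python standard library rather than the
MDNkit C code discussed in this thread. The helper name nfkc and the
sample strings are illustrative assumptions, not part of NAMEPREP.

```python
import unicodedata


def nfkc(text: str) -> str:
    """Return text in Unicode Normalization Form KC (hypothetical helper)."""
    return unicodedata.normalize("NFKC", text)


# Compatibility decomposition: the "fi" ligature U+FB01 becomes plain "fi".
ligature = nfkc("\ufb01")

# Canonical composition: "e" + combining acute U+0301 becomes U+00E9.
composed = nfkc("e\u0301")

# Fullwidth forms fold to ASCII: U+FF21 becomes "A".
fullwidth = nfkc("\uff21")

if __name__ == "__main__":
    results = (("ligature", ligature), ("composed", composed), ("fullwidth", fullwidth))
    for label, value in results:
        # Show the code points so the effect is visible in any terminal.
        print(label, [f"U+{ord(ch):04X}" for ch in value])
```

Each of the three results comes from a table lookup followed by a short
recomposition step, which fits the observation in this thread that the
bulk of an implementation is data rather than control flow.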

----- Original Message -----
From: "Soobok Lee" <lsb@postel.co.kr>
To: "Makoto Ishisone" <ishisone@sra.co.jp>
Cc: <idn@ops.ietf.org>
Sent: Thursday, June 28, 2001 08:25
Subject: Re: [idn] complexity/simplicity: NAMEPREP code vs ACE codes


> You are right.
> KC Norm is hard to learn and implement from scratch.
>
> Here are more of the KC-norm-related sources in MDNkit:
> [root@bora lib]# wc *norm*c
>     632    1975   16581 normalizer.c
>     459    1710   12201 unormalize.c ( not related to KC norm ???)
>    1091    3685   28782 total
>
> If all of you think that even huge mapping tables do not add complexity,
> then my 'reorder_by_char_frequency-before-encode' idea
> adds no complexity to DUDE either, as it adds only simple
> mapping functions and tables.  That's not bad.    :-)
>
> http://www.postel.co.kr/idn-lsb-00.txt
> (I am now adding SC/TW support to this. I measured a 15% to 20% improvement
> from adding tables for the 2048 most frequent Han syllables.)
>
> Soobok Lee
>
>
>
>
> ----- Original Message -----
> From: "Makoto Ishisone" <ishisone@sra.co.jp>
> To: <lsb@postel.co.kr>
> Cc: <idn@ops.ietf.org>
> Sent: Friday, June 29, 2001 12:01 AM
> Subject: Re: [idn] complexity/simplicity: NAMEPREP code vs ACE codes
>
>
> > In message <001e01c0ffd0$1a717ee0$ed1bd9d2@postel.co.kr>,
> > "Soobok Lee" <lsb@postel.co.kr> wrote:
> > > For those who have never looked into the NAMEPREP code in JPNIC's MDNkit,
> > >  ...
> > > [root@bora lib]# wc name*[hc] uni*[hc]
> > >     296    1109    8554 nameprep.c
> > >     136     804    5475 nameprep_template.c
> > >    1694   11778   73804 nameprepdata.c
> > >     484    1822   12314 unicode.c
> > >    6806   38573  327222 unicodedata.c
> > >    9416   54086  427369 total
> >
> > If you look closer, you'll find that nameprepdata.c and unicodedata.c
> > above contain only data -- some large tables, which are generated from
> > the NAMEPREP draft and the Unicode Character Database.  So I don't think it is
> > fair to count them when you compare complexity.  On the other hand
> > you overlooked unormalize.c, which implements Unicode Normalization
> > Forms.
> >
> > Anyway I agree that NAMEPREP (NFKC in particular) is no simpler than
> > most of the proposed ACEs.  Before implementing NFKC you have to read
> > the specification, which is longer than any of the ACE I-Ds, and the
> > relevant documents, understand what's going on, and generate tables from the
> > data...  Also I think it is harder to test the correctness of the
> > implementation.
> >
> > -- ishisone@sra.co.jp
> >
>
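
On the testing point above: one partial way to exercise an NFKC
implementation is to check invariants that must hold for any input, such
as idempotence and agreement across canonically equivalent spellings. The
sketch below does this in Python, with the standard library's
unicodedata.normalize standing in for the implementation under test. The
function names and sample strings are assumptions for illustration. It is
not a substitute for running the conformance data published with the
Unicode Character Database.

```python
import unicodedata
from typing import Callable, Iterable, List


def reference_nfkc(text: str) -> str:
    """Stand-in for the implementation under test (assumption)."""
    return unicodedata.normalize("NFKC", text)


def check_invariants(nfkc: Callable[[str], str], samples: Iterable[str]) -> List[str]:
    """Return a description of every invariant violation found."""
    failures = []
    for s in samples:
        out = nfkc(s)
        # Idempotence: normalizing a normalized string changes nothing.
        if nfkc(out) != out:
            failures.append(f"not idempotent for {s!r}")
        # Canonically equivalent spellings must normalize identically.
        for form in ("NFC", "NFD"):
            if nfkc(unicodedata.normalize(form, s)) != out:
                failures.append(f"{form} spelling disagrees for {s!r}")
    return failures


SAMPLES = [
    "e\u0301",        # e + combining acute
    "\u1100\u1161",   # Hangul jamo, leading consonant + vowel
    "\ufb01",         # fi ligature
    "a\u0323\u0301",  # base letter + two combining marks
]

if __name__ == "__main__":
    print(check_invariants(reference_nfkc, SAMPLES) or "all invariants hold")
```

Checks like these catch gross errors in composition and reordering, but
they cannot show that the tables themselves are right, which is part of
why testing NFKC is harder than testing an ACE.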