
[idn] complexity/simplicity: NAMEPREP code vs ACE codes



For those who have never looked into the NAMEPREP code in JPNIC's MDNkit:

 
[root@bora lib]# pwd
/home/lsb/mdn/mdnkit-2.0.1-src/lib

[root@bora lib]# wc name*[hc] uni*[hc]
    296    1109    8554 nameprep.c
    136     804    5475 nameprep_template.c
   1694   11778   73804 nameprepdata.c
    484    1822   12314 unicode.c
   6806   38573  327222 unicodedata.c
   9416   54086  427369 total

[root@bora lib]# ls -l *ace*c dud*c
-rw-r--r--   1 3364     wheel        7968 Mar  7 09:58 ace.c
-rw-r--r--   1 3364     wheel       19370 Apr  4 15:11 amcacem.c
-rw-r--r--   1 3364     wheel       11963 Mar 30 11:35 amcacer.c
-rw-r--r--   1 3364     wheel       15803 Mar  7 18:34 brace.c
-rw-r--r--   1 3364     wheel        7892 Mar 26 10:45 dude.c
-rw-r--r--   1 3364     wheel       11671 Mar 28 15:37 lace.c
-rw-r--r--   1 3364     wheel       11826 Mar  7 18:34 race.c

[root@bora lib]# wc  *ace*c dud*c
    291    1186    7968 ace.c
    829    3024   19370 amcacem.c
    496    1751   11963 amcacer.c
    666    2331   15803 brace.c
    448    1837   11671 lace.c
    462    1842   11826 race.c
    313    1224    7892 dude.c
   3505   13195   86493 total
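
To put the two sets of figures side by side, here is a small Python sketch. The numbers are the line counts from the wc output above; treating ace.c as shared helper code rather than an encoding of its own is only my assumption from the file name.

```python
# Line counts copied from the wc output above (first column only).
nameprep_lines = {
    "nameprep.c": 296,
    "nameprep_template.c": 136,
    "nameprepdata.c": 1694,
    "unicode.c": 484,
    "unicodedata.c": 6806,
}
ace_lines = {
    "ace.c": 291,
    "amcacem.c": 829,
    "amcacer.c": 496,
    "brace.c": 666,
    "lace.c": 448,
    "race.c": 462,
    "dude.c": 313,
}

nameprep_total = sum(nameprep_lines.values())  # same as the wc total: 9416
ace_total = sum(ace_lines.values())            # same as the wc total: 3505

# Assumption: ace.c is common support code, so it is left out when
# comparing the individual ACE encodings with each other.
codecs = {name: n for name, n in ace_lines.items() if name != "ace.c"}
smallest = min(codecs, key=codecs.get)
largest = max(codecs, key=codecs.get)
spread = codecs[largest] - codecs[smallest]

print(f"NAMEPREP + Unicode files: {nameprep_total} lines")
print(f"all ACE files together:   {ace_total} lines")
print(f"largest vs smallest ACE:  {largest} - {smallest} = {spread} lines")
print(f"that spread is {100 * spread / nameprep_total:.1f}% of the NAMEPREP side")
```

The last line is the point: whichever ACE is picked, the difference between the candidates is small next to what NAMEPREP already brings in.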



----- Original Message ----- 
From: "Mark Davis" <mark@macchiato.com>
To: "Soobok Lee" <lsb@postel.co.kr>; "Kenneth Whistler" <kenw@sybase.com>; "Marc Blanchet" <Marc.Blanchet@viagenie.qc.ca>
Cc: <idn@ops.ietf.org>
Sent: Thursday, June 28, 2001 7:21 AM
Subject: Re: [idn] Report from the ACE design team


> Normalization does take some data files, but it does not take thousands of
> lines of code. See
> http://www.macchiato.com/unicode/normalization_footprint.htm for more
> information.
> 
> Mark
> 
> ----- Original Message -----
> From: "Soobok Lee" <lsb@postel.co.kr>
> To: "Kenneth Whistler" <kenw@sybase.com>; "Marc Blanchet"
> <Marc.Blanchet@viagenie.qc.ca>
> Cc: <idn@ops.ietf.org>
> Sent: Wednesday, June 27, 2001 14:22
> Subject: Re: [idn] Report from the ACE design team
> 
> 
> > In IDNA, all domain labels are NAMEPREPed and then ACEed.
> >
> > If anyone ever looked into the NAMEPREP implementations,
> > he/she could not understand why we insist on simple ACEs.
> >
> > NAMEPREP already has HUGE Unicode mapping tables and
> > long lists of prohibited characters. It has KC normalization
> > code in it. Thousands of lines of code in all.
> >
> > Moreover, NAMEPREP and ACE may be coded into
> > a single IDNA library by application vendors.
> > Choosing the simplest ACE (and saving only one hundred
> > lines of code) does not greatly simplify IDNA implementations,
> > and so loses its justification
> > as long as we still have a complex NAMEPREP process paired with ACE.
> >
> > Soobok Lee, lsb@postel.co.kr
> >
> >
> > ----- Original Message -----
> > From: "Marc Blanchet" <Marc.Blanchet@viagenie.qc.ca>
> > To: "Kenneth Whistler" <kenw@sybase.com>
> > Cc: <idn@ops.ietf.org>
> > Sent: Wednesday, June 27, 2001 11:20 PM
> > Subject: Re: [idn] Report from the ACE design team
> >
> >
> > > > At/À 20:18 2001-06-26 -0700, Kenneth Whistler you wrote/vous écriviez:
> > > >Soobok Lee wrote:
> > > >
> > > > > > I believe long natural sentence/phrase domains escape your arguments.
> > > > > >
> > > > > > In CJK, "What is the nearest McDonald's hamburger shop from here?".(kr|cn|jp|tw)
> > > >
> > > >??
> > > >
> > > >Why would someone want to type in:
> > > >
> > >
> >"konomawariniwaichibanchikaimakudonarudohambaaganomisewadokodeshooka.co.jp"
> > >
> > >
> > > > Sorry Ken, but your argument of long-names-will-not-be-used does not stand.
> > > > In the .com zone file, there are _already_ some labels which are 63
> > > > characters long.  So even in English, there is a use (that one can
> > > > question, but...) of all space available.  The IDN tax that gives us ~ 20
> > > > chars max for some scripts is already a big tax compared to what is
> > > > available and _used_ in all-ASCII labels.  To me, the best compression we
> > > > can get is the best (without necessarily forgetting the complexity).
> > >
> > > Marc.
> > >
> > >
> > >
> > >
> > > >when they could go to
> > > >
> > > >mcdonalds.co.jp
> > > >
> > > > >click on the "zenkoku no mise listo" (List of all stores in the country)
> > > > >button on the home page and get the same results?
> > > >
> > > >If people wanted to do creative PR marketing, they could create
> > > >
> > > >my-makudonarudo.co.jp  (with "makudonarudo" in Katakana, if you like)
> > > >
> > > >and have the custom site *remember* where you live and which restaurant
> > > >you like to go to nearby.
> > > >
> > > >Maybe movie names or band names, and stuff like that -- sure. Marketers
> > > >love that stuff.
> > > >
> > > > > domain
> > > > > > may  raise interests and easier to remember and that was not possible
> > > > > > with LDH domains  for CJK peoples.
> > > >
> > > > >Yes, but use of excessively long natural phrase domains like this is just
> > > > >misuse of what domain names should be for, in my opinion, and will be
> > > >quickly outcompeted by more effective ways to get people to visit your
> > > >sites and get the information or whatever else they are looking for.
> > > >
> > > >--Ken
> > > >
> > > > >
> > > > > > .kr .cn .jp domain suffixes may be omitted in future localized web browsers, and
> > > > > > it will make Japanese and Chinese sentence domains good for PR marketing.
> > > > >
> > > > > > I think it is safe for us to leave much room for these creative naming
> > > > > > conventions in our ACE proposals for the future.
> > >
> > >
> > >
> >
> 
>