[idn] complexity/simplicity: NAMEPREP code vs ACE codes
For those who have never looked at the NAMEPREP code in JPNIC's MDNkit:
[root@bora lib]# pwd
/home/lsb/mdn/mdnkit-2.0.1-src/lib
[root@bora lib]# wc name*[hc] uni*[hc]
296 1109 8554 nameprep.c
136 804 5475 nameprep_template.c
1694 11778 73804 nameprepdata.c
484 1822 12314 unicode.c
6806 38573 327222 unicodedata.c
9416 54086 427369 total
[root@bora lib]# ls -l *ace*c dud*c
-rw-r--r-- 1 3364 wheel 7968 Mar 7 09:58 ace.c
-rw-r--r-- 1 3364 wheel 19370 Apr 4 15:11 amcacem.c
-rw-r--r-- 1 3364 wheel 11963 Mar 30 11:35 amcacer.c
-rw-r--r-- 1 3364 wheel 15803 Mar 7 18:34 brace.c
-rw-r--r-- 1 3364 wheel 7892 Mar 26 10:45 dude.c
-rw-r--r-- 1 3364 wheel 11671 Mar 28 15:37 lace.c
-rw-r--r-- 1 3364 wheel 11826 Mar 7 18:34 race.c
[root@bora lib]# wc *ace*c dud*c
291 1186 7968 ace.c
829 3024 19370 amcacem.c
496 1751 11963 amcacer.c
666 2331 15803 brace.c
448 1837 11671 lace.c
462 1842 11826 race.c
313 1224 7892 dude.c
3505 13195 86493 total
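
To make the comparison concrete, here is a minimal sketch of the two steps that IDNA applies to a label: NAMEPREP first, then ACE. It is not MDNkit code. It assumes Python's standard library as a stand-in: `encodings.idna.nameprep` for the NAMEPREP step, and the `punycode` codec with the `xn--` prefix for the ACE step (Punycode is the ACE that was standardized later, not one of the candidates listed above). The helper name `to_ace` is hypothetical.

```python
# Sketch only: Python's standard library stands in for MDNkit here.
from encodings.idna import nameprep

ACE_PREFIX = "xn--"     # prefix used with Punycode; assumed for illustration
MAX_LABEL_OCTETS = 63   # DNS limit on the length of a single label


def to_ace(label: str) -> str:
    """NAMEPREP a label, then ACE-encode it if it is not plain ASCII."""
    # Step 1: NAMEPREP (mapping tables, KC normalization, prohibited checks).
    prepared = nameprep(label)
    if prepared.isascii():
        return prepared
    # Step 2: ACE. This step is small next to the tables used in step 1.
    encoded = ACE_PREFIX + prepared.encode("punycode").decode("ascii")
    if len(encoded) > MAX_LABEL_OCTETS:
        raise ValueError("encoded label exceeds 63 octets")
    return encoded


if __name__ == "__main__":
    print(to_ace("Bücher"))
```

The ACE step here is a single codec call, while the NAMEPREP step pulls in the Unicode mapping and normalization data, which matches the line counts above.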
----- Original Message -----
From: "Mark Davis" <mark@macchiato.com>
To: "Soobok Lee" <lsb@postel.co.kr>; "Kenneth Whistler" <kenw@sybase.com>; "Marc Blanchet" <Marc.Blanchet@viagenie.qc.ca>
Cc: <idn@ops.ietf.org>
Sent: Thursday, June 28, 2001 7:21 AM
Subject: Re: [idn] Report from the ACE design team
> Normalization does take some data files, but it does not take thousands of
> lines of code. See
> http://www.macchiato.com/unicode/normalization_footprint.htm for more
> information.
>
> Mark
>
> ----- Original Message -----
> From: "Soobok Lee" <lsb@postel.co.kr>
> To: "Kenneth Whistler" <kenw@sybase.com>; "Marc Blanchet"
> <Marc.Blanchet@viagenie.qc.ca>
> Cc: <idn@ops.ietf.org>
> Sent: Wednesday, June 27, 2001 14:22
> Subject: Re: [idn] Report from the ACE design team
>
>
> > In IDNA, all domain labels are NAMEPREPed and then ACEed.
> >
> > Anyone who ever looks into the NAMEPREP implementations
> > will not understand why we adhere to simple ACEs.
> >
> > NAMEPREP already has HUGE Unicode mapping tables and
> > long lists of prohibited characters. It has KC normalization
> > code in it. Thousands of lines of code in all.
> >
> > Moreover, application vendors may code NAMEPREP and ACE
> > into a single IDNA library.
> > Choosing the simplest ACE (and saving only a hundred
> > lines of code) does not greatly simplify IDNA implementations,
> > and so loses its justification
> > as long as the complex NAMEPREP process is still paired with the ACE.
> >
> > Soobok Lee, lsb@postel.co.kr
> >
> >
> > ----- Original Message -----
> > From: "Marc Blanchet" <Marc.Blanchet@viagenie.qc.ca>
> > To: "Kenneth Whistler" <kenw@sybase.com>
> > Cc: <idn@ops.ietf.org>
> > Sent: Wednesday, June 27, 2001 11:20 PM
> > Subject: Re: [idn] Report from the ACE design team
> >
> >
> > > At/À 20:18 2001-06-26 -0700, Kenneth Whistler you wrote/vous écriviez:
> > > >Soobok Lee wrote:
> > > >
> > > > > I believe long natural sentence/phrase domains escape your arguments.
> > > > >
> > > > > In CJK, "What is the nearest McDonald's hamburger shop from here?".(kr|cn|jp|tw)
> > > >
> > > >??
> > > >
> > > >Why would someone want to type in:
> > > >
> > > >
> > > >"konomawariniwaichibanchikaimakudonarudohambaaganomisewadokodeshooka.co.jp"
> > >
> > >
> > > Sorry Ken, but your argument that long names will not be used does not
> > > stand. In the .com zone file, there are _already_ some labels which are 63
> > > characters long. So even in English, there is a use (that one can
> > > question, but...) of all the space available. The IDN tax that gives us ~20
> > > chars max for some scripts is already a big tax compared to what is
> > > available and _used_ in all-ASCII labels. To me, the best compression we
> > > can get is the best (without necessarily forgetting the complexity).
> > >
> > > Marc.
> > >
> > >
> > >
> > >
> > > >when they could go to
> > > >
> > > >mcdonalds.co.jp
> > > >
> > > >click on the "zenkoku no mise listo" (List of all stores in the country)
> > > >button on the home page and get the same results?
> > > >
> > > >If people wanted to do creative PR marketing, they could create
> > > >
> > > >my-makudonarudo.co.jp (with "makudonarudo" in Katakana, if you like)
> > > >
> > > >and have the custom site *remember* where you live and which restaurant
> > > >you like to go to nearby.
> > > >
> > > >Maybe movie names or band names, and stuff like that -- sure. Marketers
> > > >love that stuff.
> > > >
> > > > > domain
> > > > > may raise interest and be easier to remember, and that was not possible
> > > > > with LDH domains for CJK people.
> > > >
> > > >Yes, but use of excessively long natural phrase domains like this is just
> > > >misuse of what domain names should be for, in my opinion, and will be
> > > >quickly outcompeted by more effective ways to get people to visit your
> > > >sites and get the information or whatever else they are looking for.
> > > >
> > > >--Ken
> > > >
> > > > >
> > > > > .kr .cn .jp domain suffixes may be omitted in future localized web browsers, and
> > > > > it will make Japanese and Chinese sentence domains good for PR marketing.
> > > > >
> > > > > I think it is safe for us to leave much room for these creative naming
> > > > > conventions in our ACE proposals for the future.
> > >
> > >
> > >
> >
>
>