
heads up: IDNA prefix assignment



This is what I've started discussions with the IANA about.
Once I've talked to Michelle and ironed things out I'll put
this in front of the IESG.

  Erik

Date: Fri, 17 Jan 2003 20:41:35 +0100 (CET)
From: "Erik Nordmark" <Erik.Nordmark@sun.com>
Subject: draft IDNA prefix selection writeup
To: iana@iana.org
Cc: erik.nordmark@sun.com, harald@alvestrand.no


Michelle,

Do you have any issues with selecting the IDNA prefix using this
type of procedure?
I can help with getting the piece of software from RFC 2777 running for you.

   Erik


DRAFT

Subject: selection of IDNA prefix

As specified in draft-ietf-idn-idna-14.txt the
IANA will assign the ACE prefix in consultation with the IESG.

This note specifies how this selection will be performed.

There have been concerns in the community that the selection of
this ACE prefix be done in a random fashion where nobody can influence the
selection. The closest "running code" we have for such a scheme is
the publicly verifiable random selection which is used for nomcom (RFC 2777).
Of course, there are also folks in the community that think we should just
pick something obvious like "xx--" and be done with it.
But to err on the side of being careful, and perhaps too careful, the IANA
will use RFC 2777 to pick a single prefix. (Note that RFC 2777 is normally
used to pick multiple candidates from a list - here we are picking
only one).

The first step in this process consists of selecting the set of
<letter,letter>-- prefixes to use excluding those where there is evidence that
they are already used or might cause confusion. Paul Hoffman volunteered such
a list to the IESG back in November (see below) proposing either a list of 585
prefixes or 380 prefixes. The IANA and the IESG have chosen to use the list
with 380 candidate prefixes from ab-- to zy--.

The random selection will be done (just like the nomcom selection)
using as source of randomness the number of shares traded
of 12 stocks selected by the IANA.

The official shares traded numbers (denoted in 000s) will be drawn from
the January XXX, 2003 Wall Street Journal which reports the sales
figures from the previous trading day - January XXX, 2003. If trading
in any of the stocks is suspended, then the shares traded will be
assumed to be 0.

STOCKS USED IN THE SELECTION PROCESS:

XXX TBD. Might use ticker symbols all starting with "I" for cuteness.
XXX Or pick from http://finance.yahoo.com/mnvl?e=nq and related lists
to get shares with large volume, which translates to good randomness.

The single random number produced by the algorithm and code in RFC 2777
will select the prefix being used from the list of candidates in
alphabetical order, i.e.
1. ab
2. ah
3. aj
...
380. zy
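
For illustration only, the mapping from a publicly reproducible number to
a position in the alphabetical list could be sketched as follows. This is
NOT the RFC 2777 reference code: the key format, the single MD5 pass and
the names select_prefix and shares_traded are assumptions made for the
sketch, and the volumes shown are made up.

```python
import hashlib

def select_prefix(candidates, shares_traded):
    """Pick one candidate using a key built from public trading volumes.

    Simplified sketch: the key layout and the single hash pass are
    assumptions, not the procedure defined in RFC 2777.
    """
    ordered = sorted(candidates)  # alphabetical order, numbered from 1
    # A suspended stock is passed in as 0, as the draft note specifies.
    key = "/".join(str(volume) for volume in shares_traded) + "/"
    digest = hashlib.md5(key.encode("ascii")).digest()
    number = int.from_bytes(digest, "big")  # treat the hash as one big integer
    position = number % len(ordered) + 1    # 1-based position in the list
    return position, ordered[position - 1]

# Hypothetical inputs: a short candidate list and 12 made-up volumes (in 000s).
sample_candidates = ["ab", "ah", "aj", "zy"]
sample_volumes = [1234, 0, 5678, 910, 1112, 1314, 1516, 1718, 1920, 2122, 2324, 2526]
position, prefix = select_prefix(sample_candidates, sample_volumes)
print(position, prefix)
```

Anyone holding the same published volumes and the same candidate list
can rerun this and arrive at the same position, which is the property the
note is after.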

The selection will be announced soon after January XXX.

   Erik Nordmark for the IESG
   Michelle Cotton for the IANA


----


Date: Sun, 24 Nov 2002 18:47:37 -0800
From: "Paul Hoffman / IMC" <phoffman@imc.org>
Subject: Choosing a prefix

Here is my data as of November 24. All data was collected after the IESG
created the standard and should be considered usable for our final
determination of the IDN prefix.

In the following discussion, I only count prefixes that were found using
the regexp /[a-z][a-z]--/. That is, I ignore prefixes with digits or a
dash in either position.
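
A minimal sketch of that kind of scan, in Python rather than the Perl
used for the actual survey. The function name prefixes_in_use, the choice
to anchor the pattern at the start of each label, and the sample names
are all assumptions made for illustration.

```python
import re

# Two lowercase letters followed by "--", as in the /[a-z][a-z]--/ pattern.
# Anchoring at the start of a label is an assumption of this sketch.
PREFIX_RE = re.compile(r"^([a-z][a-z])--")

def prefixes_in_use(names):
    """Return the sorted two-letter prefixes seen in any label of the names."""
    found = set()
    for name in names:
        # Lowercase, drop a trailing root dot, then test each label.
        for label in name.lower().rstrip(".").split("."):
            match = PREFIX_RE.match(label)
            if match:
                found.add(match.group(1))
    return sorted(found)

# Hypothetical names, not taken from any real zone file.
sample = [
    "bq--abc.example.com.",
    "www.example.org",
    "RA--xyz.example.net",
    "a1--foo.example",   # digit in second position: ignored
    "x--y.example",      # dash in second position: ignored
]
print(prefixes_in_use(sample))  # ['bq', 'ra']
```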


1. Data gotten from AXFR

I attempted AXFR transfers on the following list of TLDs.

ac ad ae aero af ag ai al am an ao aq ar arpa as at au aw az ba bb bd be
bf bg bh bi biz bj bm bn bo br bs bt bv bw by bz ca cc cd cf cg ch ci ck
cl cm cn co com coop cr cu cv cx cy cz de dj dk dm do dz ec edu ee eg er
es et fi fj fk fm fo fr ga gb gd ge gf gg gh gi gl gm gn gov gp gq gr gs
gt gu gw gy hk hm hn hr ht hu id ie il im in info int io iq ir is it je
jm jo jp ke kg kh ki km kn kr kw ky kz la lb lc li lk lr ls lt lu lv ly
ma mc md mg mh mil mk ml mm mn mo mp mq mr ms mt mu museum mv mw mx my
mz na name nc ne net nf ng ni nl no np nr nu nz om org pa pe pf pg ph pk
pl pm pn pr ps pt pw py qa re ro ru rw sa sb sc sd se sg sh si sj sk sl
sm sn so sr st su sv sy sz tc td tf tg th tj tk tm tn to tp tr tt tv tw
tz ua ug uk um us uy uz va vc ve vg vi vn vu wf ws ye yt yu za zm zw

I got successful responses to my AXFR from more than half of them; these
zones were:

ad al am an ao ar as au az ba bd bg bi bj bm bn bo bs bt bv bw by ci ck
cl cm cr cu cv cy cz dz ec ee eg er es fi fj fm ga gb gd ge gf gg gh gi
gl gm gn gp gq gs gt gu gy hn hr hu ie il im in it je jm jo ke kg kh ki
km kn kz lb lc lk lr ly ma mc md mg mh mk ml mm mn mq mr ms mt mw mx my
mz nc ne ng ni np om pa pe pg pk pr ps pw py qa ro ru sa se sg si sj sk
sl sm sn so st su sv sz tc tf th tj tm tn to tp tr tt tv tz ua ug uk um
uy ve vg vi vu ye yu za zm zw

If any of the TLDs had a SLD of one of following, I tried to do an AXFR
on the SLD as well:

com net org edu co ne ad ed gov

There were 95 such SLDs.

I then searched all the records from the AXFRs for possible prefix
strings. The only prefixes in all of these that I found were:

bq ra zq


2. Data from gTLDs

I collected data from the big gTLDs for which I had access to current
zone files:

arpa com edu gov inaddr net org

There were 82 prefixes used in those zones:

aa ac ad at az ba bq bu cc co ct dc dj dq dr dz ea el ex fi ge go he id
in io it jp ku kz la ll lz ma mo my mz na no ny ok on pc ph pu qm qn qo
qp qq qr qs qt qu qv qw qx qy qz ra rc re se so st sz ta tv us uv vr we
wm wz xa xz ya yu yz za zq zz


3. Data acquired by personal requests

I sent messages to various TLD administrators and other people asking
for them to run a simple Perl program that would report the prefixes
used. Notably, I sent this message to CENTR and JET. CENTR and JET
combined responses from the zones that reported to them. The zones for
which I got replies are:

ac at bv bv ch cl cz dk dk es fr fr gr hr hu ie il io ir jp kr kr li lt
lu nl no no pl pt se sh si si sj sj tw uk us

The prefixes used in those zones are:

bq df la my no ok om on pc ra rs to uk yu


4. Prefixes prohibited by draft-ietf-idn-idna-14.txt

The following prefixes are prohibited in section 5 of the IDNA standard:

bl bq dq lq mq ra wq zq


5. Combined lists

The combination of the three lists of used prefixes and the list of
prohibited prefixes, after removing duplicates, is 91 prefixes:

aa ac ad at az ba bl bq bu cc co ct dc df dj dq dr dz ea el ex fi ge go
he id in io it jp ku kz la ll lq lz ma mo mq my mz na no ny ok om on pc
ph pu qm qn qo qp qq qr qs qt qu qv qw qx qy qz ra rc re rs se so st sz
ta to tv uk us uv vr we wm wq wz xa xz ya yu yz za zq zz

That leaves the following 585 as available:

ab ae af ag ah ai aj ak al am an ao ap aq ar as au av aw ax ay bb bc bd
be bf bg bh bi bj bk bm bn bo bp br bs bt bv bw bx by bz ca cb cd ce cf
cg ch ci cj ck cl cm cn cp cq cr cs cu cv cw cx cy cz da db dd de dg dh
di dk dl dm dn do dp ds dt du dv dw dx dy eb ec ed ee ef eg eh ei ej ek
em en eo ep eq er es et eu ev ew ey ez fa fb fc fd fe ff fg fh fj fk fl
fm fn fo fp fq fr fs ft fu fv fw fx fy fz ga gb gc gd gf gg gh gi gj gk
gl gm gn gp gq gr gs gt gu gv gw gx gy gz ha hb hc hd hf hg hh hi hj hk
hl hm hn ho hp hq hr hs ht hu hv hw hx hy hz ia ib ic ie if ig ih ii ij
ik il im ip iq ir is iu iv iw ix iy iz ja jb jc jd je jf jg jh ji jj jk
jl jm jn jo jq jr js jt ju jv jw jx jy jz ka kb kc kd ke kf kg kh ki kj
kk kl km kn ko kp kq kr ks kt kv kw kx ky lb lc ld le lf lg lh li lj lk
lm ln lo lp lr ls lt lu lv lw lx ly mb mc md me mf mg mh mi mj mk ml mm
mn mp mr ms mt mu mv mw mx nb nc nd ne nf ng nh ni nj nk nl nm nn np nq
nr ns nt nu nv nw nx nz oa ob oc od oe of og oh oi oj ol oo op oq or os
ot ou ov ow ox oy oz pa pb pd pe pf pg pi pj pk pl pm pn po pp pq pr ps
pt pv pw px py pz qa qb qc qd qe qf qg qh qi qj qk ql rb rd rf rg rh ri
rj rk rl rm rn ro rp rq rr rt ru rv rw rx ry rz sa sb sc sd sf sg sh si
sj sk sl sm sn sp sq sr ss su sv sw sx sy tb tc td te tf tg th ti tj tk
tl tm tn tp tq tr ts tt tu tw tx ty tz ua ub uc ud ue uf ug uh ui uj ul
um un uo up uq ur ut uu uw ux uy uz va vb vc vd ve vf vg vh vi vj vk vl
vm vn vo vp vq vs vt vu vv vw vx vy vz wa wb wc wd wf wg wh wi wj wk wl
wn wo wp wr ws wt wu wv ww wx wy xb xc xd xe xf xg xh xi xj xk xl xm xn
xo xp xq xr xs xt xu xv xw xx xy yb yc yd ye yf yg yh yi yj yk yl ym yn
yo yp yq yr ys yt yv yw yx yy zb zc zd ze zf zg zh zi zj zk zl zm zn zo
zp zr zs zt zu zv zw zx zy
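
The arithmetic behind that count can be checked with a short sketch: 26 x
26 = 676 possible two-letter prefixes, minus the 91 combined prefixes
above. The names USED, ALL_PREFIXES and available_prefixes are
hypothetical; the also_excluded parameter is there only to show where a
further exclusion list (for example, ccTLD codes) would plug in.

```python
from itertools import product
from string import ascii_lowercase

# The 91 combined prefixes, transcribed from the list above.
USED = """
aa ac ad at az ba bl bq bu cc co ct dc df dj dq dr dz ea el ex fi ge go
he id in io it jp ku kz la ll lq lz ma mo mq my mz na no ny ok om on pc
ph pu qm qn qo qp qq qr qs qt qu qv qw qx qy qz ra rc re rs se so st sz
ta to tv uk us uv vr we wm wq wz xa xz ya yu yz za zq zz
""".split()

# Every <letter,letter> combination: 26 * 26 = 676.
ALL_PREFIXES = {a + b for a, b in product(ascii_lowercase, repeat=2)}

def available_prefixes(used, also_excluded=()):
    """Return, in alphabetical order, every prefix not used or excluded."""
    return sorted(ALL_PREFIXES - set(used) - set(also_excluded))

available = available_prefixes(USED)
print(len(ALL_PREFIXES), len(set(USED)), len(available))
print(available[0], available[-1])
```

If the list is transcribed correctly, the counts printed are 676, 91 and
585, and the first and last entries are ab and zy.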

However, Kim Davies of CENTR made an additional suggestion that seems
reasonable (but not required). She suggested that we disallow any of the
current two-letter ccTLD abbreviations from our list. If we agree with
that, the resulting list still has 380 choices:

ab ah aj ak ap av ax ay bc bk bp bx cb ce cj cp cq cs cw da db dd dg dh
di dl dn dp ds dt du dv dw dx dy eb ed ef eh ei ej ek em en eo ep eq eu
ev ew ey ez fa fb fc fd fe ff fg fh fl fn fp fq fs ft fu fv fw fx fy fz
gc gj gk gv gx gz ha hb hc hd hf hg hh hi hj hl ho hp hq hs hv hw hx hy
hz ia ib ic if ig ih ii ij ik ip iu iv iw ix iy iz ja jb jc jd jf jg jh
ji jj jk jl jn jq jr js jt ju jv jw jx jy jz ka kb kc kd kf kj kk kl ko
kp kq ks kt kv kx ld le lf lg lh lj lm ln lo lp lw lx mb me mf mi mj nb
nd nh nj nk nm nn nq ns nt nv nw nx oa ob oc od oe of og oh oi oj ol oo
op oq or os ot ou ov ow ox oy oz pb pd pi pj po pp pq pv px pz qb qc qd
qe qf qg qh qi qj qk ql rb rd rf rg rh ri rj rk rl rm rn rp rq rr rt rv
rx ry rz sf sp sq ss sw sx tb te ti tl tq ts tu tx ty ub uc ud ue uf uh
ui uj ul un uo up uq ur ut uu uw ux vb vd vf vh vj vk vl vm vo vp vq vs
vt vv vw vx vy vz wa wb wc wd wg wh wi wj wk wl wn wo wp wr wt wu wv ww
wx wy xb xc xd xe xf xg xh xi xj xk xl xm xn xo xp xq xr xs xt xu xv xw
xx xy yb yc yd yf yg yh yi yj yk yl ym yn yo yp yq yr ys yv yw yx yy zb
zc zd ze zf zg zh zi zj zk zl zn zo zp zr zs zt zu zv zx zy

I believe that we can safely use this list of 585 or the list of 380
equally well.

Let me know what y'all think.

--Paul Hoffman, Director
--Internet Mail Consortium


>----- End Included Message -----<