
[idn] overall compression efficiency matters



Please don't mind my strongly worded subject line, "...derailed".

In one case, DUDE supports up to 56 Han/Hangeul characters
in a label (a single character repeated 56 times). If anyone ever
registers such a label, the median becomes 28. Let's ignore it.

I think neither the (weighted) mean nor the mode (most frequent
value) is relevant to our work.

What I mean by "average domains" is that we should consider
overall compression efficiency over samples of domains of
various lengths that are likely to be registered, not only
over *rare* marginal cases of very long domains, which can
mislead us and make us lose our focus. That's the point.
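To make "overall compression efficiency over a sample" concrete,
here is a minimal Python sketch. It is illustrative only: it
assumes Punycode (Python's built-in "punycode" codec, a later
standardized ACE) as a stand-in for DUDE or any other candidate,
and the "xn--" prefix, the helper names ace_length and summarize,
and the sample labels are all hypothetical. It compares averages
of per-label expansion ratios with one aggregate ratio over the
whole sample.

```python
import statistics

# Assumption: Punycode stands in for DUDE or any other ACE candidate;
# replace ace_length() with another encoder to compare proposals.
ACE_PREFIX = "xn--"    # hypothetical prefix (the one later chosen for IDNA)
MAX_LABEL_OCTETS = 63  # DNS label length limit (RFC 1035)


def ace_length(label: str) -> int:
    """Length in ASCII characters of the ACE form of one non-ASCII label."""
    return len(ACE_PREFIX) + len(label.encode("punycode"))


def summarize(labels: list[str]) -> dict:
    """Compare per-label averages with the overall (aggregate) ratio."""
    ace_lens = [ace_length(label) for label in labels]
    src_lens = [len(label) for label in labels]
    ratios = [a / s for a, s in zip(ace_lens, src_lens)]
    return {
        # Per-label ratios: every label counts equally.
        "mean_ratio": statistics.mean(ratios),
        "median_ratio": statistics.median(ratios),
        # Overall ratio: total ACE characters per total source
        # character, so each label is weighted by its length.
        "overall_ratio": sum(ace_lens) / sum(src_lens),
        # Whether every encoded label stays within the DNS limit.
        "all_fit": all(n <= MAX_LABEL_OCTETS for n in ace_lens),
    }


if __name__ == "__main__":
    # Hypothetical sample; a real study would use a corpus of
    # names that are likely to be registered.
    sample = ["한국", "서울", "대한민국", "삼성전자", "서울대학교"]
    for key, value in summarize(sample).items():
        print(key, value)
```

A few very long labels can pull the overall ratio toward their
own ratio, while the median of per-label ratios is less sensitive
to them; which figure to optimize is exactly the choice under
discussion.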
 
And many useful DUDE alternatives are not expensive.
Lengthy tables do not add complexity; they only require
time for careful evaluation and preparation, as was the
case for nameprep and NFKC normalization.

Soobok


----- Original Message ----- 
From: "Paul Hoffman / IMC" <phoffman@imc.org>
To: <idn@ops.ietf.org>
Sent: Tuesday, July 10, 2001 12:51 PM
Subject: Re: [idn] THis WG derailed ?


> At 9:35 AM +0900 7/10/01, Soobok Lee wrote:
> >This WG should focus on issues of AVERAGE length of ACE labels.
> 
> That is an opinion. There has been a different opinion of many other 
> people that the WG should focus on getting an algorithm good enough 
> for the vast majority of likely names. If you think the discussion 
> needs to be opened again, you should speak with the WG chairs. 
> However, you need to be much more specific than saying "AVERAGE".
> 
> There are three kinds of averages (mean, median, and mode); of those, 
> clearly we do not want to consider mode. How do you intend to measure 
> either mean or median? All of the ACEs proposed so far have different 
> means and medians for different scripts. They also have different 
> means and medians for names that contain characters from only one 
> script versus names that contain characters from two (such as names 
> that are based on a non-Latin script but use Latin hyphens or digits).
>