
Re: [idn] call for comments for REORDERING




----- Original Message ----- 
From: "Martin Duerst" <duerst@w3.org>
To: "Soobok Lee" <lsb@postel.co.kr>; "James Seng/Personal" <jseng@pobox.org.sg>; <idn@ops.ietf.org>
Sent: Friday, October 19, 2001 6:15 PM
Subject: Re: [idn] call for comments for REORDERING


> At 14:21 01/10/19 +0900, Soobok Lee wrote:
> 
> >----- Original Message -----
> >From: "James Seng/Personal" <jseng@pobox.org.sg>
> >To: "Soobok Lee" <lsb@postel.co.kr>; <idn@ops.ietf.org>
> >Sent: Friday, October 19, 2001 1:17 PM
> >Subject: Re: [idn] call for comments for REORDERING
> >
> >
> > > You don't get my point.
> > >
> > > Reordering achieves shorter labels by putting the more often used
> > > characters in one block and the others at the back. BUT this also means
> > > less often used characters would result in a *LONGER* label than usual.
> >
> >No. Even for GROUPS of those rare cases, we always get SHORTER labels
> >than usual.
> 
> Sorry, but this is nonsense. You cannot compress all labels
> by reordering. It just doesn't work. The argument below is
> about as foolproof as a perpetuum mobile design.

It is not nonsense; you have misunderstood me.

Note the word "GROUPS" in the statement above.
I did not claim this for every individual label.

> 
> In reordering, you will always bring some codepoints closer
> together (and names from these will then compress better)
> at the expense of moving some farther apart (and thus
> reducing compression).

That is true for rare individual cases, but not for the overall results, because
the expense in the latter part is much smaller than the gain in the former part.
The net gain is always positive, even for groups of rare characters.


> 
> You just have to think about a mixed label, with some
> frequent and some not so frequent characters.

Sure. REORDERING also works for these kinds of labels.

Frequent chars: f1 f2 f3 f4 ...
Rare chars    : r1 r2 r3 r4 ...
We have a mixed reordered label:  f1 r1 f2 r2 f3 r3 f4 r4 . com
AMC-Z sorts it into :  f1 f2 f3 f4 r1 r2 r3 r4 . com
AMC-Z encodes it as :  label <f1 f2 f3 f4> + junction overhead <f4 r1> + label <r1 r2 r3 r4>

You know that reordering works well for label <f1 f2 f3 f4> and label <r1 r2 r3 r4>.
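
The decomposition above can be tried out with a small sketch. This is not the AMC-Z encoder and it makes no claim about encoded length: it only computes the raw gaps between sorted code points for a mixed label with four frequent and four rare characters. The 32-letter toy block, the chosen code points, and the helper names (build_reordering, gaps, decompose) are all assumptions made up for illustration.

```python
# Toy illustration only: the block size, code points, and helper names
# below are hypothetical and are not taken from any REORDERING or AMC-Z draft.

def build_reordering(block_start, block_size, frequent):
    """Map frequent code points to the front of the block, in the order given.

    All other code points of the block follow, keeping their relative order.
    """
    frequent_set = set(frequent)
    rest = [cp for cp in range(block_start, block_start + block_size)
            if cp not in frequent_set]
    new_order = list(frequent) + rest
    return {old: block_start + i for i, old in enumerate(new_order)}


def gaps(code_points):
    """Differences between successive code points after sorting."""
    ordered = sorted(code_points)
    return [b - a for a, b in zip(ordered, ordered[1:])]


def decompose(label, mapping, boundary):
    """Split a reordered label into frequent gaps, junction gap, rare gaps."""
    reordered = sorted(mapping[cp] for cp in label)
    freq_part = [cp for cp in reordered if cp < boundary]
    rare_part = [cp for cp in reordered if cp >= boundary]
    junction = rare_part[0] - freq_part[-1] if freq_part and rare_part else None
    return gaps(freq_part), junction, gaps(rare_part)


BLOCK_START, BLOCK_SIZE = 0x100, 32
FREQUENT = [0x103, 0x10B, 0x115, 0x11E]   # f1 f2 f3 f4 (assumed values)
RARE = [0x101, 0x109, 0x112, 0x11C]       # r1 r2 r3 r4 (assumed values)

# Mixed label: f1 r1 f2 r2 f3 r3 f4 r4
LABEL = [cp for pair in zip(FREQUENT, RARE) for cp in pair]

MAPPING = build_reordering(BLOCK_START, BLOCK_SIZE, FREQUENT)
BOUNDARY = BLOCK_START + len(FREQUENT)    # first slot after the frequent sub-block

print("gaps without reordering:", gaps(LABEL))
print("gaps with reordering   :", decompose(LABEL, MAPPING, BOUNDARY))
```

The second line of output lists the gaps inside <f1 f2 f3 f4>, the junction gap <f4 r1>, and the gaps inside <r1 r2 r3 r4>. How those gaps translate into ACE label length depends on how the encoder prices each gap, which this sketch does not model.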

Soobok Lee


> 
> Regards,   Martin.
> 
> 
> 
> >For example,
> >if REORDERING uses a frequent-character set of 4096 letters within the entire
> >20992-letter Han ideographic script block, rarely used Han letters will be put
> >into the sub-block of length 20992-4096 = 16896, which is about 20% shorter
> >than 20992.
> >
> >Let's assume we have a label consisting only of characters from the
> >16896-letter sub-block. Without reordering, they are randomly distributed
> >over the 20992-letter block.
> >With reordering, they are in the shorter 16896-letter block.
> >In the latter case, we get shorter successive code distances and shorter
> >ACE labels than in the former one.
> >
> >  Even without REORDERING, ACE has favored some of them, and disfavored
> >others. But, with REORDERING, BOTH groups get shorter ACE labels.
> >That is the virtue of REORDERING.
> >
> >Soobok
> >
> > > And who are we
> > > to say these less often used characters are less important or, worse,
> > > become invalid (too long to fit) because of this reordering?
> > >
> > > -James Seng
> > >
> >
> >
> >
> >
>