
Re: [idn] call for comments for REORDERING



Ken is right; collation is *quite* complex. Anyone wanting to see some of
what is involved can look at the ICU implementation (which is UCA and ISO
14651 compliant, and open-source):

User Guide:
http://www-124.ibm.com/icu/userguide/Collate_Intro.html

Internal Design:
http://oss.software.ibm.com/cvs/icu/~checkout~/icuhtml/design/collation/ICU_collation_design.htm
(source files are linked from there)

ICU home:
http://oss.software.ibm.com/icu/
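
To make "weighted sorting" a little more concrete, here is a toy Python sketch of the multi-level key idea behind ISO 14651 and the UCA. Base letters are compared first, then accents, then case. This is a hypothetical simplification, and the name toy_sort_key is made up for illustration. A real implementation such as ICU adds the DUCET weight table, contractions, expansions, variable weighting, tailorings, and a great deal of performance work, and that is where the complexity Ken describes comes from.

```python
import unicodedata


def toy_sort_key(s: str) -> tuple:
    """Three-level sort key loosely in the spirit of UCA / ISO 14651.

    A deliberately simplified illustration, not UCA: there is no DUCET
    weight table, no contractions or expansions, no variable weighting,
    and no tailoring.
    """
    primary, secondary, tertiary = [], [], []
    for ch in unicodedata.normalize("NFD", s):
        if unicodedata.combining(ch):
            # Level 2 (accents): attach the mark to the preceding base character.
            if secondary:
                secondary[-1] += ch
            continue
        # Level 1 (base letters): compare case-insensitively.
        primary.append(ch.casefold())
        # Level 2 slot for this base character; stays "" if it has no accent.
        secondary.append("")
        # Level 3 (case): lowercase/uncased (0) sorts before uppercase (1).
        tertiary.append(1 if ch.isupper() else 0)
    # Tuple comparison checks level 1 over the whole string first, then 2, then 3.
    return (primary, secondary, tertiary)


words = ["rôle", "roles", "Role", "role"]
print(sorted(words, key=toy_sort_key))  # ['role', 'Role', 'rôle', 'roles']
print(sorted(words))                    # ['Role', 'role', 'roles', 'rôle']
```

Even this toy disagrees with a plain code-point sort. Getting independent implementations of the full algorithm to agree for all data, rather than for one sample, is the hard part.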

While ICU's performance is good, its code is many, many orders of magnitude
more complicated than the current nameprep. It is not appropriate for IDN.
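
For contrast, nameprep amounts to a short pipeline of table lookups plus one normalization. The sketch below is modeled on the stringprep profile later published as RFC 3491 (March 2003), not on the October 2001 draft under discussion. It relies on Python's standard stringprep and unicodedata modules, and the function name nameprep_sketch is hypothetical.

```python
import stringprep
import unicodedata

# RFC 3491 specifies Unicode 3.2, so use that version of the database.
ucd = unicodedata.ucd_3_2_0


def nameprep_sketch(label: str) -> str:
    # 1. Map: drop "commonly mapped to nothing" characters (table B.1),
    #    then case-fold with table B.2.
    mapped = "".join(
        stringprep.map_table_b2(c) for c in label if not stringprep.in_table_b1(c)
    )
    # 2. Normalize with NFKC.
    label = ucd.normalize("NFKC", mapped)
    # 3. Prohibit: reject characters from the tables RFC 3491 lists.
    #    Unassigned code points (table A.1) are not checked, as for queries.
    prohibited = (
        stringprep.in_table_c12, stringprep.in_table_c22, stringprep.in_table_c3,
        stringprep.in_table_c4, stringprep.in_table_c5, stringprep.in_table_c6,
        stringprep.in_table_c7, stringprep.in_table_c8, stringprep.in_table_c9,
    )
    for c in label:
        if any(check(c) for check in prohibited):
            raise ValueError(f"prohibited character {c!r}")
    # 4. Bidi: a label containing RandALCat characters (D.1) must not contain
    #    LCat characters (D.2) and must start and end with RandALCat.
    rand_al = [stringprep.in_table_d1(c) for c in label]
    if any(rand_al):
        if any(stringprep.in_table_d2(c) for c in label):
            raise ValueError("mixes right-to-left and left-to-right characters")
        if not (rand_al[0] and rand_al[-1]):
            raise ValueError("right-to-left label must start and end with RandALCat")
    return label


print(nameprep_sketch("ExAmple"))  # example
print(nameprep_sketch("Straße"))   # strasse
```

Each step is a table lookup or a standard normalization call. That is the gap in complexity described above.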

Mark

—————

Δός μοι ποῦ στῶ, καὶ κινῶ τὴν γῆν — Ἀρχιμήδης
(Give me a place to stand, and I will move the earth. Archimedes)
[http://www.macchiato.com]

----- Original Message -----
From: "Kenneth Whistler" <kenw@sybase.com>
To: <jseng@pobox.org.sg>
Cc: <idn@ops.ietf.org>
Sent: Thursday, October 18, 2001 8:28 PM
Subject: Re: [idn] call for comments for REORDERING


> James Seng said:
>
> > Third, I would really prefer to reference a work from an established expert
> > group if possible. For example, ISO/IEC JTC1/SC22/WG20 publishes ISO
> > 14651 on weighted sorting. I am not sure how ISO 14651 would perform for
> > the IDN purpose but I thought it might be worthwhile to examine.
>
> As one of the principal authors of ISO 14651, who has also implemented
> the synchronized Unicode Technical Standard #10, the Unicode Collation
> Algorithm, I can attest that this is a very tricky and complicated
> area, and the algorithms to do all this correctly are not the kind
> you can write on the back of a cocktail napkin. It is very complex
> to get all the details right and to get good-performing algorithms
> (in speed and in resource usage). It is also very difficult for
> independent implementations to get themselves all exactly lined
> up, and even more difficult for independent implementations to *prove*
> that they are getting the same results for all data (as opposed to
> a particular result for one set of data -- which is pretty easy).
>
> IDN doesn't need to add this kind of headache to the already
> complex enough issues of nameprep.
>
> >
> > Whatever the case, we should make a decision quickly on this. Let's not
> > drag this out any further if possible.
> >
> > -James Seng
> >
> > ----- Original Message -----
> > From: "Martin Duerst" <duerst@w3.org>
>
> > > So this is a solution in search of a real problem,
> > > not worth bothering the whole world with additional
> > > complexity.
>
> I heartily concur with Martin's assessment.
>
> --Ken Whistler