Invariance frees psamp from specifying the order and allows different
vendors to implement the selectors in different ways without affecting
the results.
That's exactly the point: it seems that "the results" for you means
"the resulting selected sample". With my comment above, I was saying
that "the results" should on the contrary be
"1) the size of the selected sample" and
"2) the results you derive from the analysis of the selected sample".
I wish the results as you define them were all that mattered. But in
a world of multi-vendor interoperability, it is so much easier to
verify that the report streams coming from two different vendors'
boxes are identical than to verify that the final analyses are
similar. Indeed, vendor differentiation may well be in the report
analysis.
How realistic is it that two separate devices see exactly the same
traffic and have their psamp modules initialized at exactly the same
time? It seems to me that this can happen only if they are on the same
link, or if the link is lossy, on the same end of the same link (and at
least one of them has to be passive then).
Note that it's not only a terminology issue. If we require that varying
the selector ordering (which is something that we may desire to ease
implementations) yields the same selected sample, then we have to
exclude the whole "third group" of samplers you outlined in your
previous e-mail, i.e. random samplers and samplers based on packet
position. The latter category includes e.g. the simple 1-out-of-N
sampler implemented by decrementing a counter, which is the simplest
we can think of. Do we want to exclude it?
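To make the ordering concern concrete, here is a minimal sketch (not
taken from any psamp draft) of a counter-based 1-out-of-N sampler
composed with a simple filter. The packet dictionaries, the "dport"
field, and the function names one_in_n and port_filter are all
hypothetical, chosen only for illustration.

```python
def one_in_n(packets, n):
    """Count-based sampler: decrement a counter per packet and keep the
    packet on which the counter reaches zero, then reload the counter."""
    counter = n
    selected = []
    for pkt in packets:
        counter -= 1
        if counter == 0:
            selected.append(pkt)
            counter = n
    return selected


def port_filter(packets, port):
    """Content-based filter: keep packets with the given destination port."""
    return [p for p in packets if p["dport"] == port]


# Hypothetical traffic: every third packet goes to port 80.
packets = [{"id": i, "dport": 80 if i % 3 == 0 else 53} for i in range(12)]

filter_then_sample = one_in_n(port_filter(packets, 80), 3)
sample_then_filter = port_filter(one_in_n(packets, 3), 80)

print([p["id"] for p in filter_then_sample])
print([p["id"] for p in sample_then_filter])
```

In this toy trace the two orderings select different packets, because
the counter's state depends on which packets reached the sampler
before the current one.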
I realize it's a radical approach. But, *if* the functionality (by your
definition of results) of the "third group" of random selectors can be
provided by deterministic hash functions on the packet header...
then yes, I'm suggesting the psamp standard exclude the "third group".
Trading generality for debuggability seems like a good tradeoff to me.
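
As a rough illustration of the hash-based alternative, the sketch
below selects a packet when a hash of some header fields falls into
one of N buckets. Everything here is an assumption made for the
example: the field names ("src", "dst", "ip_id", "dport"), the use of
CRC32 as the hash, and the function names. It is not a hash or field
set specified by psamp.

```python
import zlib


def hash_select(packets, n):
    """Keep a packet when a hash of its header fields lands in bucket 0
    of n. The decision depends only on the packet itself, not on what
    the selector has seen before."""
    selected = []
    for p in packets:
        key = f'{p["src"]}|{p["dst"]}|{p["ip_id"]}'.encode()
        if zlib.crc32(key) % n == 0:
            selected.append(p)
    return selected


def port_filter(packets, port):
    """Content-based filter: keep packets with the given destination port."""
    return [p for p in packets if p["dport"] == port]


# Hypothetical traffic between two hosts.
packets = [
    {"src": "10.0.0.1", "dst": "10.0.0.2", "ip_id": i,
     "dport": 80 if i % 3 == 0 else 53}
    for i in range(1000)
]

filter_then_hash = hash_select(port_filter(packets, 80), 4)
hash_then_filter = port_filter(hash_select(packets, 4), 80)

# Both orderings pick the same packets.
print(filter_then_hash == hash_then_filter)
```

Because each selection decision is a pure function of the packet, the
selectors commute, and two boxes fed the same packets would emit the
same selected sample regardless of when each one was started.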