
Re: psamp vocabulary



At 05:50 PM 9/10/2002 -0400, Nick Duffield wrote:
>Rae,

I agree with Nick about keeping PSAMP simple.
At a minimum, PSAMP devices (via the MIB?) should advertise
their capabilities in detail, so a collector application can
decide if result comparisons between 2 samplers are meaningful.

Beyond that, PSAMP can define specific packet selection mechanisms
that (hopefully) vendors will adopt and implement.  There needs
to be some leeway here to allow for vendor differentiation,
variation in platform capabilities and intended use, and future extensions.

The same leeway is not desirable for other parts of the standard,
such as congestion control and report format.

Andy



>Rae McLellan wrote:
>> 
>> >> Invariance frees psamp from specifying the order and allows different
>> >> vendors to implement the selectors in different ways without affecting
>> >> the results.
>> >
>> > that's exactly the point: it seems that "the results" for you means
>> > "the resulting selected sample".  With my comment above, I was saying
>> > that "the results" should, on the contrary, be
>> > 1) "the size of the selected sample" and
>> > 2) "the results you derive from the analysis of the selected sample".
>> 
>> I wish the results as you define them were all that mattered.  But in
>> a world of multi-vendor interoperability, it is so much easier to verify
>> that the report streams coming from two different vendors' boxes are
>> identical than to verify that the final analyses are similar.  Indeed,
>> vendor differentiation may well be in the report analysis.
>> 
>> > Note that it's not only a terminology issue. If we require that varying
>> > the selector ordering (which is something that we may desire to ease
>> > implementations) yields the same selected sample, then we have to
>> > exclude the whole "third group" of samplers you outlined in your
>> > previous e-mail, i.e. random samplers and samplers based on packet
>> > position. To this last category belongs e.g. the simple 1 out of N
>> > sampler implemented by decrementing a counter, which is the simplest
>> > we can think of. Do we want to exclude it?
>> 
>> I realize it's a radical approach.  But, *if* the functionality (by your
>> definition of results) of the "third group" of random selectors can be
>> provided by deterministic hash functions on the packet header...
>> then yes, I'm suggesting the psamp standard exclude the "third group"
>> of random selectors.  I'm transferring effort in the standardization
>> process from specifying the order of selectors and worrying about
>> their grouping syntax/semantics, to a few short paragraphs explaining
>> how the deterministic hash functions can provide similar results.
>> 
>
>having only deterministic selectors is an intriguing idea, but I have 
>two misgivings:
>
>1) we need to allow the simplest implementations in order for PSAMP
>to be ubiquitous. Decrementing a counter is very simple: if we exclude
>it then some devices might find it difficult to do PSAMP sampling.
>
>2) if all selection operations are deterministic on packet content, it
>would be easier to construct packets to evade selection (although having
>a strong hash function with an obscure selection criterion makes this
>more difficult). Or even without malice, with a weak hash function you 
>might have an unlucky traffic mix where you entirely miss a large 
>bunch of traffic. Having the option of random selection guards against 
>this.
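A minimal sketch of the counter-based 1 out of N sampler discussed above, assuming nothing beyond "decrement a counter and select when it reaches zero". The class and method names are hypothetical and are not taken from any PSAMP draft. It illustrates that the decision depends only on packet position, so packet contents cannot influence it.

```python
class CountdownSampler:
    """Select one packet out of every n by decrementing a counter.

    Hypothetical illustration only; not a PSAMP-defined interface.
    """

    def __init__(self, n):
        if n < 1:
            raise ValueError("n must be >= 1")
        self.n = n
        self.remaining = n

    def select(self, packet=None):
        # The packet argument is deliberately ignored: the decision is
        # a function of packet position, not of packet content.
        self.remaining -= 1
        if self.remaining == 0:
            self.remaining = self.n
            return True
        return False


sampler = CountdownSampler(4)
decisions = [sampler.select() for _ in range(12)]
# Zero-based positions of the selected packets in the stream.
selected_positions = [i for i, chosen in enumerate(decisions) if chosen]
```

Two devices running this sampler on the same stream select the same packets only if their counters start in the same state, which is why position-based selectors do not give invariance under reordering of selectors.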
>
>I have some comments on the hash functions that you mentioned in a
>previous message:
>
>>    - IP ID & <mask> == <value>
>>    - IP Checksum & <mask> == <value>
>>    - Checksum(IP header w/o TTL) & <mask> == <value>
>>    (note that these last 3 can be used to generate an almost uniform
>>     sample of the IP packets, yet they're still based on IP header)
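The three mask/value selectors listed above could be sketched as follows. This is an illustration under stated assumptions: "Checksum(IP header w/o TTL)" is read here as the standard Internet checksum recomputed with the TTL byte and the checksum field zeroed, which is one possible interpretation; all function names are hypothetical; and the 20-byte header is an arbitrary example, not data from the thread.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement Internet checksum of data."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return ~total & 0xFFFF


def ip_id(header: bytes) -> int:
    """Identification field (bytes 4-5 of the IPv4 header)."""
    return struct.unpack_from("!H", header, 4)[0]


def ip_checksum_field(header: bytes) -> int:
    """Header checksum as carried in the packet (bytes 10-11)."""
    return struct.unpack_from("!H", header, 10)[0]


def checksum_without_ttl(header: bytes) -> int:
    """Assumed reading: checksum with TTL and checksum field zeroed."""
    h = bytearray(header)
    h[8] = 0                 # TTL
    h[10:12] = b"\x00\x00"   # checksum field
    return internet_checksum(bytes(h))


def selected(field_value: int, mask: int, value: int) -> bool:
    """The '<field> & <mask> == <value>' test from the list above."""
    return (field_value & mask) == value


# Arbitrary example IPv4 header (no options), used only for illustration.
HEADER = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")

MASK, VALUE = 0x0003, 0x0000   # nominal 1-in-4 selection
by_id = selected(ip_id(HEADER), MASK, VALUE)
by_checksum = selected(ip_checksum_field(HEADER), MASK, VALUE)
by_checksum_no_ttl = selected(checksum_without_ttl(HEADER), MASK, VALUE)
```

Whether a given mask really selects close to the nominal fraction depends on how the chosen field is distributed in the traffic.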
>
>Have you done any experiments on the statistical quality of these as
>hash functions for packet selection? As hash functions go these are very
>weak. Having good statistical properties of selection would rely on
>having a tame distribution of the field contents of the packets. (This
>can't be relied upon: we looked at traces, and there are gotchas there
>for the ID field in particular due, it seems, to bad implementations.)
>I'm concerned that they would be easy to evade.
>
>And even with a tame distribution of packet fields, having a uniform 
>selection distribution is not the only desirable property. We also
>want small correlations between selection decisions of successive 
>packets, including selection of packets from the same IP level flow
>(i.e. packets with the same IP src/dst address). The input of these
>hashes doesn't change much from packet to packet of the flow, and the
>hash function is weak, so there will be a lot of correlation.
>
>A strong hash function should have the property, roughly speaking, that 
>flipping a bit of the input gives a big change in the hash function.
>This gives the statistical properties of selection some robustness
>against correlations in the packet contents. The IP checksum does not
>have this property. IP ID increments, so there's not much variation in 
>it.
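The bit-flipping property described above can be checked with a small experiment. The sketch below flips each input bit in turn and counts how many output bits change, comparing the Internet checksum with a 16-bit truncation of SHA-256. SHA-256 is used here only as a stand-in for "a strong hash function"; it is an assumption of this sketch, not something proposed in the thread. Function names and the sample input are likewise hypothetical.

```python
import hashlib
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement Internet checksum of data."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return ~total & 0xFFFF


def strong_hash16(data: bytes) -> int:
    """Stand-in strong hash: first 16 bits of SHA-256 (an assumption)."""
    return int.from_bytes(hashlib.sha256(data).digest()[:2], "big")


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit flipped (bit 0 = MSB of byte 0)."""
    out = bytearray(data)
    out[bit_index // 8] ^= 0x80 >> (bit_index % 8)
    return bytes(out)


def mean_output_bits_changed(hash_fn, data: bytes) -> float:
    """Average number of output bits that change per single-bit input flip."""
    base = hash_fn(data)
    n_bits = len(data) * 8
    changed = sum(
        bin(base ^ hash_fn(flip_bit(data, i))).count("1")
        for i in range(n_bits)
    )
    return changed / n_bits


# Arbitrary 20-byte IPv4 header with the checksum field zeroed.
HEADER = bytes.fromhex("45000073000040004011" "0000" "c0a80001c0a800c7")

weak_avg = mean_output_bits_changed(internet_checksum, HEADER)
strong_avg = mean_output_bits_changed(strong_hash16, HEADER)
```

For a 16-bit output, a hash with good mixing would be expected to change about half the output bits per input flip, whereas a checksum change is limited to one bit plus whatever carry propagation occurs.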
>
>Nick
>
>> Is this possible?  I dunno.  I was just pointing out that this might
>> be a path for psamp to pursue.  Is there some type of sampling
>> results (your definition) that this approach is precluding?
>> Or perhaps those few short paragraphs I mentioned aren't possible?
>> 
>>                                 Rae McLellan
>> 
>> --
>> to unsubscribe send a message to psamp-request@ops.ietf.org with
>> the word 'unsubscribe' in a single line as the message text body.
>> archive: <http://ops.ietf.org/lists/psamp/>
>

