
Re: draft on sampling techniques



Cristian,

comments inline below:

Cristian Estan wrote:
> 
> Hi Tanja,
> 
> A couple of observations about the draft:
> 
> 1) For content based sampling, you say that we can base the decision on
> field values or hash functions. Do we really want all devices in the
> network to support functionality as "sample all packets coming from IP
> address X"? Is this useful? Is this dangerous? Is this prone to
> misconfigurations that would result in too many packets being selected?
> We *must* specify what the hash functions are that we base the sampling
> decision on and what fields they take as input and also in section 4.2
> how the network operator can "seed" them so that the system cannot be
> manipulated/evaded by "the bad guys". This is because devices from
> different vendors have to be consistent. I would incline towards not
> making the fields the hash is computed on configurable (i.e. hard code
> in the standard what (invariant) fields we hash on). Other opinions?

There are three parts to hash-based packet sampling:
1) choice of input fields
2) choice of hash function
3) choice of selection range (as in: packet is selected if its hash
falls in a given range)
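
To make the three parts concrete, here is a minimal sketch. CRC32 is only a
stand-in hash chosen for illustration (the draft does not name one), and the
function name, the "|" separator, and the half-open range convention are my
own assumptions.

```python
import zlib

HASH_SPACE = 2**32  # zlib.crc32 returns an unsigned 32-bit value


def select_packet(fields, range_start, range_end):
    """Return True if the hash of the chosen fields lies in [range_start, range_end)."""
    key = b"|".join(fields)              # 1) choice of input fields
    h = zlib.crc32(key)                  # 2) choice of hash function (stand-in)
    return range_start <= h < range_end  # 3) choice of selection range


# A range covering about 1% of the hash space targets a 1% selection probability.
one_percent = HASH_SPACE // 100
picked = select_packet([b"10.0.0.1", b"10.0.0.2", b"4711"], 0, one_percent)
```

The same packet gives the same decision at every device that uses the same
fields, hash and range, which is the consistency property being asked for.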

The question of whether bad guys could evade the hash function came up
for discussion at IETF 53. My suggestion there was that operators could
keep the range private, even if the choice of input fields and hash is
public. Since the size of the range determines the selection probability,
one could, e.g., make the selection range an interval of the required
size whose position the operator sets privately within the possible
range of the hash function.
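
A rough sketch of the private-range idea, assuming a 32-bit hash output. The
function name and the use of a random offset are illustrative assumptions,
not something agreed at the meeting.

```python
import secrets

HASH_SPACE = 2**32  # assumed size of the hash output space


def private_range(probability):
    """Pick a secret interval [start, end) whose width fixes the selection probability."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    width = int(probability * HASH_SPACE)  # the width sets the sampling rate
    # The position is the private part: any offset that keeps the
    # interval inside the hash space gives the same target probability.
    start = secrets.randbelow(HASH_SPACE - width + 1)
    return start, start + width


start, end = private_range(0.01)
```

Only the width is visible from the sampling rate. Someone crafting packets
would have to guess where in the hash space the interval sits.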

Reconfiguring the choice of input fields could be a bad idea for the
following reason. For hash-based sampling to have good statistical
properties, the input has to have enough "entropy" that packet
selection appears close to independent of any particular subportion
of the input. An arbitrary change to an input whose entropy is not
well understood could lead to bad statistical properties.
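
As a toy illustration of the entropy point (synthetic inputs, not measured
traffic), compare hashing a field that is the same in every packet with
hashing one that differs per packet. Truncated SHA-256 is just a convenient
stand-in hash here, and all names are assumptions.

```python
import hashlib

HASH_SPACE = 2**32
TARGET_END = HASH_SPACE // 10  # select hashes in [0, TARGET_END), about 10%


def hash32(data):
    """Stand-in 32-bit hash: the first four bytes of SHA-256."""
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")


def selected_fraction(inputs):
    """Fraction of inputs whose hash falls inside the selection range."""
    inputs = list(inputs)
    hits = sum(1 for x in inputs if hash32(x) < TARGET_END)
    return hits / len(inputs)


# Low entropy: every packet presents the same byte (e.g. one protocol number),
# so selection is all-or-nothing instead of roughly 10%.
low = selected_fraction(b"\x06" for _ in range(10000))

# High entropy: a 4-byte value that differs in every packet.
high = selected_fraction(i.to_bytes(4, "big") for i in range(100000))
```

With the constant input the hash is the same for every packet, so either all
of them are selected or none are. The varying input should land close to the
target rate.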

Another possibility is tuning the hash function itself with a private
knob. Some hash functions lend themselves to this (e.g. those based on
prime division, although these may not be the easiest to implement); but
others may have no such knobs.
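
One possible reading of a prime-division hash with a private knob, as a
sketch only: the input bytes are treated as an integer and reduced modulo a
prime that the operator keeps secret. The function names, the threshold
convention and the example primes are assumptions for illustration.

```python
def modular_hash(data, prime):
    """Interpret the input bytes as a big-endian integer and reduce it modulo a prime."""
    return int.from_bytes(data, "big") % prime


def select_packet(data, prime, threshold):
    """Select if the remainder is below threshold; probability is about threshold / prime."""
    return modular_hash(data, prime) < threshold


# Two operators with different private primes hash the same input differently.
data = (3000000000).to_bytes(4, "big")
h_a = modular_hash(data, 2147483647)  # 2**31 - 1, a Mersenne prime
h_b = modular_hash(data, 1000000007)  # another well-known prime
```

Changing the prime changes which packets are selected without touching the
input fields, so the entropy concern above does not arise. The cost is a
division per packet, which may be awkward in hardware.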

> 
> 2) Stratified sampling looks like it can get complex (although I never
> designed a switch or router), but it's useful. 

What applications do you envisage?

>[...]

Nick

--
to unsubscribe send a message to psamp-request@ops.ietf.org with
the word 'unsubscribe' in a single line as the message text body.
archive: <http://ops.ietf.org/lists/psamp/>