
psamp vocabulary

To: psamp@ops.ietf.org
Subject: psamp vocabulary
From: Rae McLellan <rae@research.bell-labs.com>
Date: Thu, 12 Sep 2002 16:05:39 -0400 (EDT)

> having only deterministic selectors is an intriguing idea, but I have 
> two misgivings:
>
> 1) we need to allow the simplest implementations in order for PSAMP
> to be ubiquitous. Decrementing a counter is very simple: if we exclude
> it then some devices might find it difficult to do PSAMP sampling.

Are you suggesting psamp define levels of conformance?
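
For reference, the counter-decrementing selector mentioned in the quoted text can be sketched as follows. This is a minimal illustration, not a PSAMP-defined interface: the class name `CountdownSampler` and the `period` parameter are hypothetical. Note that the decision depends only on packet arrival order, never on packet content.

```python
class CountdownSampler:
    """Select one packet out of every `period` by decrementing a counter.

    Hypothetical helper for illustration; not defined by PSAMP.
    """

    def __init__(self, period):
        if period < 1:
            raise ValueError("period must be >= 1")
        self.period = period
        self.remaining = period

    def select(self):
        """Return True for the packet on which the counter reaches zero."""
        self.remaining -= 1
        if self.remaining == 0:
            self.remaining = self.period  # reload for the next cycle
            return True
        return False


sampler = CountdownSampler(period=4)
decisions = [sampler.select() for _ in range(12)]
# Zero-based positions of the packets that were selected.
selected_positions = [i for i, hit in enumerate(decisions) if hit]
```

The per-packet cost is one decrement and one comparison, which is the simplicity argument made in the quoted text.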

> 2) if all selection operations are deterministic on packet content, it
> would be easier to construct packets to evade selection (although having
> a strong hash function with an obscure selection criterion makes this
> more difficult). Or even without malice, with a weak hash function you 
> might have an unlucky traffic mix where you entirely miss a large 
> bunch of traffic. Having the option of random selection guards against 
> this.

OK, that's an example where hashing doesn't provide the functionality
of true random sampling.  Thanks.

> I have some comments on the hash functions that you mentioned in a
> previous message: 
>>    - IP ID & <mask> == <value>
>>    - IP Checksum & <mask> == <value>
>>    - Checksum(IP header w/o TTL) & <mask> == <value>
>>    (note that these last 3 can be used to generate an almost uniform
>>     sample of the IP packets, yet they're still based on IP header)
>
> Have you done any experiments on the statistical quality of these as
> hash functions for packet selection? As hash functions go these are very
> weak. Having good statistical properties of selection would rely on
> having a tame distribution of the field contents of the packets.
> (This can't be relied upon: we looked at traces, and there are gotchas
> there for the ID field in particular due, it seems, to bad
> implementations). I'm concerned that they would be easy to evade.

Yes, I've looked at the distribution of IP ID field values.  Except
for ID values equal to 0 or 1, they are very evenly distributed.
The anomalous behavior of ID=0 and ID=1 appears to come from ICMP
router chat.  Apparently some routers are clearing the ID field
for each ICMP message instead of maintaining and using an IP packet
counter for each interface.
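
A small sketch of the `IP ID & <mask> == <value>` selector shows why an excess of ID=0 packets matters. The traffic below is invented for illustration (every 16-bit ID once, plus an equal number of extra ID=0 packets); it is not taken from any trace, and the function names are hypothetical.

```python
def select_by_ip_id(ip_id, mask, value):
    """Selector of the form: IP ID & <mask> == <value>."""
    return (ip_id & mask) == value


def sampling_fraction(ip_ids, mask, value):
    """Fraction of the given packets (represented by their IP IDs) selected."""
    hits = sum(1 for ip_id in ip_ids if select_by_ip_id(ip_id, mask, value))
    return hits / len(ip_ids)


# Assumption: perfectly even IDs, one packet per possible 16-bit value.
uniform_ids = list(range(65536))
# Assumption: the same traffic plus 65536 extra packets with ID 0
# (an exaggerated, made-up skew standing in for the ID=0 anomaly).
skewed_ids = uniform_ids + [0] * 65536

mask = 0x00FF  # nominal sampling rate of 1 in 256
frac_uniform = sampling_fraction(uniform_ids, mask, 0x00)
frac_skewed_value0 = sampling_fraction(skewed_ids, mask, 0x00)
frac_skewed_value5 = sampling_fraction(skewed_ids, mask, 0x05)
```

With even IDs the selector hits its nominal 1-in-256 rate. With the skewed mix, a `<value>` whose masked bits match ID 0 oversamples heavily, while any other `<value>` undersamples, because the extra ID=0 packets are never selected.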

And yes, a malicious packet generator could manage to avoid selection
by any hashing algorithm based on IP header contents alone.
Perhaps obscurity of the hashing function isn't enough to avoid this.
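
To make the evasion concern concrete, here is a sketch of a sender that knows (or has guessed) the mask and value of an IP ID selector and simply never emits a matching ID. The function `evasive_ip_ids` and the chosen mask and value are hypothetical, for illustration only.

```python
def select_by_ip_id(ip_id, mask, value):
    """Selector of the form: IP ID & <mask> == <value>."""
    return (ip_id & mask) == value


def evasive_ip_ids(mask, value, count):
    """Return `count` 16-bit IP IDs, none of which the selector would pick."""
    if (mask & 0xFFFF) == 0 and value == 0:
        # With an all-zero mask and value, every ID is selected: no evasion.
        raise ValueError("selector matches every packet")
    ids = []
    candidate = 0
    while len(ids) < count:
        ip_id = candidate & 0xFFFF  # wrap like a 16-bit counter
        if not select_by_ip_id(ip_id, mask, value):
            ids.append(ip_id)
        candidate += 1
    return ids


crafted = evasive_ip_ids(mask=0x00FF, value=0x2A, count=1000)
# Number of crafted packets the selector would have sampled.
sampled = sum(1 for ip_id in crafted if select_by_ip_id(ip_id, 0x00FF, 0x2A))
```

The sender only has to skip one ID in every 256, so the crafted stream looks almost like an ordinary incrementing counter.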

> And even with a tame distribution of packet fields, having a uniform 
> selection distribution is not the only desirable property. We also
> want small correlations between selection decisions of successive 
> packets, including selection of packets from the same IP level flow
> (i.e. packets with same IP src/dst address). The input of these hashes
> doesn't change much from packet to packet of the flow, and the hash
> function is weak, so there will be a lot of correlation.
>
> A strong hash function should have the property, roughly speaking, that 
> flipping a bit of the input gives a big change in the hash function.
> This gives the statistical properties of selection some robustness
> against correlations in the packet contents. The IP checksum does not
> have this property. IP ID increments, so there's not much variation in 
> it.

I suggested the IP checksum as a simple hash function because it is
already computed (I'm lazy).  I'd appreciate examples of strong hash
functions on the IP header contents.
			Rae McLellan
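
The "flip one input bit" criterion from the quoted text can be measured directly. The sketch below compares the 16-bit ones' complement header checksum (in the style of RFC 1071) with SHA-256 truncated to 16 bits. SHA-256 is used purely as a stand-in for a strong hash; it is an assumption for illustration, not a proposal from this thread, and it is probably too costly for line-rate selection. The sample header and helper names are likewise illustrative.

```python
import hashlib


def internet_checksum(data):
    """16-bit ones' complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return ~total & 0xFFFF


def truncated_sha256(data):
    """First 16 bits of SHA-256; a stand-in for a strong hash (assumption)."""
    return int.from_bytes(hashlib.sha256(data).digest()[:2], "big")


def mean_bits_changed(hash_fn, data):
    """Average number of output bits that change per single-bit input flip."""
    base = hash_fn(data)
    total = 0
    for bit in range(len(data) * 8):
        flipped = bytearray(data)
        flipped[bit // 8] ^= 0x80 >> (bit % 8)
        total += bin(base ^ hash_fn(bytes(flipped))).count("1")
    return total / (len(data) * 8)


# Sample 20-byte IPv4 header with the checksum field zeroed (illustrative).
header = bytes.fromhex("45000073000040004011" "0000" "c0a80001c0a800c7")
# Same header with source and destination addresses exchanged.
swapped = header[:12] + header[16:20] + header[12:16]

checksum_value = internet_checksum(header)
checksum_avalanche = mean_bits_changed(internet_checksum, header)
sha_avalanche = mean_bits_changed(truncated_sha256, header)
```

Because the checksum is a plain sum of 16-bit words, exchanging two words (here, the source and destination addresses) leaves it unchanged, and a single-bit flip typically disturbs only a few output bits. An ideal 16-bit hash would change about half of its output bits, roughly 8, per flip.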


