
psamp vocabulary



> If the terminology isn't clear here, do we need to come up with
> something better? With the words currently at my disposal,
> my usage is:
>
>   1. sampling = 1 in N (periodic or statistical) or hash-based
>   2. filtering = filtering
>   3. (primitive) selectors = either 1 or 2, and further methods TBD
>   4. (composite) selectors = composites of methods from 3
>
> So the work of item 1: 
>
>    "1. Selectors for packet sampling. Define the set of primitive 
>    packet selection operations for network elements, the parameters
>    by which they may be configured, and the ways in which they can
>    be combined."
>
> is precisely to lay out what these selectors are. 

I believe the WG charter's use of "Selectors" and Andy's subsequent
use of the term (in defining his suggested subdivision of tasks 1&2)
is strictly generic.  And from the following psamp posts, it looks
like we can classify the types of selectors into 3 basic groups.

1) selectors that operate directly on the packet header. i.e. some
   function applied to the bits of an IP header that returns a
   boolean select/no-select value.  e.g.:
   - IP source address == <value>
   - IP destination address & <mask> == <value>
   - IP protocol == TCP/UDP/ICMP/etc
   - TCP source port in_range {3000,4000}
   - TTL > N, TTL == N, TTL < N
   - IP_TOS & <mask> == <value>
   - TCP SYN flag == 1
   - IP ID & <mask> == <value>
   - IP Checksum & <mask> == <value>
   - Checksum(IP header w/o TTL) & <mask> == <value>
   (note that these last 3 can be used to generate an almost uniform
    sample of the IP packets, yet they're still based on IP header)

2) selectors that pertain to a router's reaction to a particular packet.
   - egress/ingress interface this packet is routed to/from == <value> 
   - acl violations
   - failed rpf
   - failed RSVP
   - no route

3) and finally selectors that bear no relation to either the packet
   or the router's functionality, such as:
   - the next K sequential packets after a wait of N packets.
   - random sampling

All of these selectors operate in the logical space.  They do not
refer to physical bytes.  i.e. there is no facility for "the Nth byte
of the IP header == <value>".  Selectors only refer to logical fields.
Eventually the hardware/software will have to examine and compare bits,
but the selector specification is defined in the logical space and some
sort of compiler will translate the filter description into an executing
rule set in hardware or software.  This process is implementation
dependent and out of bounds of the specification. (IMHO... :)

The most general form of a type 1 selector is:
   ( <packet header field> & <mask> ) == <value>
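
The compare-under-mask form can be sketched as below, assuming the
packet has already been parsed into named logical fields.  The dict
representation and the field names "ip_dst" and "ip_id" are
hypothetical, as are the mask and value constants.

```python
import ipaddress


def mask_selector(field, mask, value):
    """Build a predicate that tests (packet[field] & mask) == value."""
    def select(packet):
        return (packet[field] & mask) == value
    return select


# A packet modelled as a dict of logical fields (an assumption of this sketch).
packet = {
    "ip_dst": int(ipaddress.IPv4Address("63.240.15.146")),
    "ip_id": 0x1234,
}

# Destination address under a /24 mask.
in_prefix = mask_selector(
    "ip_dst", 0xFFFFFF00, int(ipaddress.IPv4Address("63.240.15.0"))
)

# Low 4 bits of the IP ID equal to 4: roughly 1 in 16 packets if those
# bits are close to uniformly distributed.
one_in_16 = mask_selector("ip_id", 0x000F, 0x0004)

print(in_prefix(packet), one_in_16(packet))
```

The selector names only a logical field; how that field is located in
the physical header is left to whatever translates the rule.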

I'm sure there are many more that could be included in each list.
But, all three types should be described in the PSAMP document.

> For the discussion on pre-filtering, the phrase "and the ways in
> which they can be combined" is key. In the framework, filtering is
> one of several packet selection mechanisms, which may be combined
> to form composite packet selectors. For example, a composite
> selector whose first component is a filter and whose second is
> 1 in N sampling.

If you think of the selectors as building blocks for the eventual
filter, then it's just a matter of combining selectors in conjunctions
and unions to get a composite rule.

For example, maybe I wanted to select packets which were destined for
the whitehouse in a DDOS attack.  The first two selectors would be:
  S1 := IP destination address == 63.240.15.146
  S2 := IP destination address == 63.240.15.154
So, to cover both addresses requires the union of S1 and S2.
[I know those two selectors could be a single compare under mask, but
 humor me for the sake of introducing a union in the example.]
But then you only need to look at the SYN packets, so the conjunction
of two more selectors is required.
  S3 := IP protocol == TCP
  S4 := TCP_SYNFLAG == 1
Then the rule for finding DDOS attacks on the whitehouse becomes:
  (S1 || S2) && S3 && S4.
But that turns out to be too much data to analyze and some sort
of sampling is required to reduce the sample traffic to an acceptable
rate.  A 5th selector based on the IP header checksum provides a
reasonably uniform sampling.
  S5 := IP checksum & <mask> == <value>
And the final rule becomes:
  R1 := (S1 || S2) && S3 && S4 && S5
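
As a rough sketch of how R1 could be assembled from building blocks,
here it is in Python.  The helper names (field_eq, field_mask, any_of,
all_of), the dict-of-fields packet model and the field names are all
assumptions for illustration.  The mask and value used for S5 are
arbitrary stand-ins for <mask> and <value> above.

```python
import ipaddress


def addr(text):
    """Convert dotted-quad text to an integer for comparison."""
    return int(ipaddress.IPv4Address(text))


def field_eq(field, value):
    # A missing field (e.g. tcp_syn on a UDP packet) simply fails to match.
    return lambda p: p.get(field) == value


def field_mask(field, mask, value):
    return lambda p: (p.get(field, 0) & mask) == value


def any_of(*selectors):
    """Union: true if at least one selector matches."""
    return lambda p: any(s(p) for s in selectors)


def all_of(*selectors):
    """Conjunction: true only if every selector matches."""
    return lambda p: all(s(p) for s in selectors)


S1 = field_eq("ip_dst", addr("63.240.15.146"))
S2 = field_eq("ip_dst", addr("63.240.15.154"))
S3 = field_eq("ip_proto", 6)          # 6 is the IP protocol number for TCP
S4 = field_eq("tcp_syn", 1)
S5 = field_mask("ip_checksum", 0x000F, 0x0000)  # placeholder mask and value

R1 = all_of(any_of(S1, S2), S3, S4, S5)
```

Python's any() and all() stop at the first deciding selector, which is
the sort of early out an implementation might use; for these stateless
selectors it does not change which packets are selected.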

No doubt there would be some limit in any particular implementation
on the number of selectors and rules that can operate simultaneously.
But that's an implementation difference I'm sure the marketing
types will enjoy hyping.

Except for one case, I don't believe applying the selectors in any
particular order produces different results.  Though early out for
performance might be something a compiler could achieve, the end
result of sampled traffic remains the same.  I don't really care
what syntax is used to specify the selectors or how they are combined.
But the functionality of union and conjunctions is important.

The only place where the order of the selector rules matters is
when performing the "sample K packets every N" type selector.
This is because there is a difference in the range over which
the N is sampled.  If this type of sampler is applied first, N
ranges over the entire input stream.  But if it is applied last,
then N only applies to those packets that have managed to pass
through any previous selector functions.  Each will produce a
different sample stream.
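
A small sketch of the ordering effect, using K = 1 and N = 4.  The
synthetic stream, the "seq" and "tcp_syn" field names and the helper
functions are hypothetical.

```python
def one_in_n(n):
    """Stateful periodic sampler: selects every n-th packet it is shown."""
    count = 0

    def select(_packet):
        nonlocal count
        count += 1
        return count % n == 0

    return select


def is_syn(packet):
    return packet["tcp_syn"] == 1


# Twelve packets; every third one (seq 0, 3, 6, 9) has the SYN flag set.
stream = [{"seq": i, "tcp_syn": 1 if i % 3 == 0 else 0} for i in range(12)]

# Sampler first: N counts over the entire input stream.
sampler = one_in_n(4)
sample_first = [p["seq"] for p in stream if sampler(p) and is_syn(p)]

# Sampler last: N counts only the packets that passed the SYN selector.
sampler = one_in_n(4)
filter_first = [p["seq"] for p in stream if is_syn(p) and sampler(p)]

print(sample_first)  # [3]
print(filter_first)  # [9]
```

Same selectors, same input, different samples: the counter sees 12
packets in the first case and only the 4 SYN packets in the second.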


On the matter of report contents, I agree with Derek: the simpler
the better, with the agent just forwarding the first N bytes.
A report should consist of a header followed by a number of sample
entries.  The sample report header would contain:
	1) identity of reporting agent
	2) report sequence number (to detect lost reports)
	3) agent status flags (total # of samples, alarms, etc)

A fixed-size report sample entry would consist of:
	1) rule specifier (preferably a rule id, not the full rule)
	2) timestamp
	3) first N bytes of IP packet
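
One possible fixed-size layout for such a report, sketched with
Python's struct module.  The field widths, the byte order, the 64-byte
value of N and the choice to carry the sample count in the header are
all assumptions made for this example, not a proposed wire format.

```python
import struct

SNAP_LEN = 64  # N: bytes of each packet carried per entry (arbitrary here)

# Header: agent id, report sequence number, status flags, sample count.
REPORT_HEADER = struct.Struct("!IIHH")

# Entry: rule id, timestamp in nanoseconds, first SNAP_LEN bytes of packet.
SAMPLE_ENTRY = struct.Struct(f"!IQ{SNAP_LEN}s")


def pack_report(agent_id, seq, flags, samples):
    """Pack a header followed by one fixed-size entry per sample.

    samples is a list of (rule_id, timestamp_ns, packet_bytes) tuples.
    """
    parts = [REPORT_HEADER.pack(agent_id, seq, flags, len(samples))]
    for rule_id, timestamp_ns, packet_bytes in samples:
        # Long packets are truncated; the "s" format zero-pads short ones.
        parts.append(
            SAMPLE_ENTRY.pack(rule_id, timestamp_ns, packet_bytes[:SNAP_LEN])
        )
    return b"".join(parts)


report = pack_report(1, 7, 0, [(1, 123, b"\x45" * 20)])
print(len(report))  # 12-byte header + 76-byte entry = 88
```

Fixed-size entries keep the generator trivial, at the cost of padding
short packets; a real format would probably also need to carry the
original packet length.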

Whatever PSAMP comes up with, I believe it should be simple enough
that we can expect hardware implementations at the higher line rates.
That means both the selection and the report generation processes
should have minimal overhead.

I beg your pardon for being so pedantic.  But I'm trying to
get past the "six blind men describing an elephant" stage.
			my 2 cents,
			Rae McLellan



--
to unsubscribe send a message to psamp-request@ops.ietf.org with
the word 'unsubscribe' in a single line as the message text body.
archive: <http://ops.ietf.org/lists/psamp/>