[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
new sampling draft
- To: psamp <psamp@ops.ietf.org>
- Subject: new sampling draft
- From: Tanja Zseby <zseby@fokus.fraunhofer.de>
- Date: Mon, 03 Mar 2003 14:05:08 +0100
- User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.0.2) Gecko/20021120 Netscape/7.01
Hi all,
I submitted a new version of the psamp sampling draft (attached).
Main changes are:
- some text on hash-based sampling from Nick included
- introduction of complexity levels
There are still some inconsistencies in terminology and categorization.
Nick and I plan to discuss this at the IETF.
Regards
Tanja
--
Dipl.-Ing. Tanja Zseby
Fraunhofer Institute FOKUS Email: zseby@fokus.fraunhofer.de
Kaiserin-Augusta-Allee 31 Phone: +49-30-3463-7153
D-10589 Berlin, Germany Fax: +49-30-3463-8153
--------------------------------------------------------------------------------------
"Living on earth is expensive but it includes a free trip around the
sun." (Anonymous)
--------------------------------------------------------------------------------------
Internet Draft
Document: <draft-ietf-psamp-sample-tech-01.txt> T. Zseby
Expires: September 2003 Fraunhofer FOKUS
M. Molina
NEC Europe Ltd.
F. Raspall
NEC Europe Ltd.
March 2003
Sampling and Filtering Techniques for IP Packet Selection
Status of this Memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts. Internet-Drafts are draft documents valid for a maximum of
six months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet-Drafts as
reference material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
Abstract
This document describes sampling and filtering techniques for IP
packet selection. It introduces information models for packet
sampling, for packet filtering and for combinations of methods. The
information models describe what information has to be specified in
order to describe the method. This information is used for
configuring the selection technique in measurement processes and for
reporting the technique in use to the measurement data collection
process.
The document first suggests some terminology, then it describes in
detail packet sampling and packet filtering techniques and their
parameters. It also describes how these two techniques can be
combined to build more elaborate packet selectors. Finally, it
introduces information models for the most common sampling and
filtering techniques.
Zseby, Molina, Raspall Expires April 2003 [Page 1]
Internet Draft Techniques for IP Packet Selection March 2003
Table of Contents
1. Introduction.................................................2
2. Terminology..................................................3
3. Scope and Deployment of Packet Selection Techniques..........4
3.1 Sampling.....................................................5
3.2 Filtering....................................................6
3.3 Hash-based Sampling..........................................6
3.3.1 Statistical Properties......................................6
3.3.2 Example: Trajectory Sampling................................7
3.3.3 Guarding Against Pitfalls and Vulnerabilities...............7
4. Sampling Methods.............................................8
4.1 Sampling Algorithm...........................................8
4.1.1 Systematic Sampling.........................................8
4.1.2 Random Sampling.............................................8
5. Sampling Parameters..........................................9
5.1 Parameters for systematic sampling...........................9
5.2 Parameters for random sampling...............................9
6. Complexity Levels...........................................10
7. Information Model Sampling Techniques.......................11
8. Filtering...................................................12
8.1 Filtering operating directly on some of the packet’s bits...13
8.2 Filtering considering router reaction or router state.......13
9. Information Model for Filtering Techniques..................13
10. Composite Techniques........................................16
10.1 Cascaded filtering->sampling or sampling->filtering.........16
10.2 Stratified Sampling.........................................16
11. Security Considerations.....................................17
12. Acknowledgements............................................17
13. References..................................................18
14. Author's Addresses..........................................18
15. Full Copyright Statement....................................19
1. Introduction
Increasing data rates and growing measurement demands increase the
requirements for data collection resources. For measurement
scenarios in backbone networks it is often required to measure whole
traffic aggregates instead of single flows. Furthermore, some
measurement methods require the capturing of packet headers or even
parts of the payload. All this can lead to an overwhelming amount of
measurement data, resulting in high demands regarding resources for
metering, storage, transport and post processing.
In some cases specialized hardware helps to fulfill these demands
but on the other hand increases the costs for providing the
measurement. Since measurements are mainly a supporting
functionality for the service provisioning, measurement costs
usually should be limited to a small fraction of the costs of the
network service provisioning itself. Therefore a reduction of the
measurement result data is crucial to prevent the depletion of the
available (i.e. the affordable) resources. Such a reduction can be
achieved by a reasonable deployment of packet selection techniques,
Zseby, Molina, Raspall Expires September 2003 [Page 2]
Internet Draft Techniques for IP Packet Selection March 2003
that sample a subset of the packets while still allowing an
appropriate accuracy, or filter out all packets that are not of
interest for the measurement at all. Packet selection helps to
prevent an exhaustion of resources and to limit the measurement
costs. Examples for applications that benefit from packet selection
are given in [DuGG02].
2. Terminology
IP Packet Selection Process
An IP packet selection process takes IP packets or parts of IP
packets (e.g. header) as input and extracts a subset of these
packets by applying a selection function.
Filtering
Filtering selects a subset of packets by applying deterministic
functions on parts of the packet content like header fields or
parts of the payload. A filtering process needs to process the
packet (look at packet header and/or payload) in order to make
the selection decision.
Sampling
Sampling selects a subset of packets by applying deterministic
or random functions on the (temporal or spatial) packet
position or by performing (pseudo) random calculations per
packet. This can be for example selecting every nth packet
(deterministic function on packet position) or selecting a
packet that arrives at the metering process in accordance to
the output of a random function (like flipping a coin per
packet). Sampling does not work on packet content. That means,
in contrast to filtering, a sampling process does not need to
process the packet in order to make the selection decision.
Hash function
The computation of an M bit string starting from an N bit
string. In this context, the N starting bits are some of the
bits of a packet header and/or payload.
Hash selection range
A subset of the M bit computed with a hash function for which
an Indicator Function has a value of 1.
Stream
A packet stream is the sequence of packets used as input for a
packet selection process. If multiple packet selectors are
applied subsequently, the output stream of one selector forms
the input stream for the succeeding selector. If the first
selector was a sampling process, the packets in the stream
usually do not have common properties by which they can be
distinguished from packets that not have been selected.
Therefore we define here the term stream instead of flow, which
is defined as set of packets with common properties [QuZC02].
Zseby, Molina, Raspall Expires September 2003 [Page 3]
Internet Draft Techniques for IP Packet Selection March 2003
If the term flow is used throughout the text, the flow
definition in [QuZC02] applies.
Metering process
see definition in [QuZC02]
Sample size
The sample size denotes the number of packets in the sample.
Selection function
Function that determines whether an IP packet is selected or
not.
Sampling probability
The probability with which one element is selected as part of
the sample.
Sampling ratio
The ratio between the sample size and the number of packets
composing the input stream of a packet sampling process.
3. Scope and Deployment of Packet Selection Techniques
The selection technique used to select a subset of packets out of
all those crossing an observation point depends on the purpose
(application) for which measurement is performed. If the main
purpose of an application is to infer some characteristic of the
whole set of crossing packets without processing them all (thus
reducing the computation load) then we call the used selection
technique “sampling”. In principle, with sampling the content of the
packet is not relevant for the packet selection: what matters is
only that the selected sample has a distribution of the
characteristic to infer similar to the one of the parent population,
so that it can be estimated reliably. The sampling decision may be
based on the temporal or spatial position of the packet in the
packet stream, or may depend on a (pseudo) random number extraction
or calculation.
On the contrary, if the application needs to consider all the
packets having some common property, then we call the selection
technique “filtering”. The property can be directly derived by some
computation on the packet content, or depend on the treatment given
by the router to the packet. We conclude that sampling does not
consider packet content, and can depend on packet position or on
(pseudo) random decisions, while filtering depends on packet
content, but never depends on packet position or on (pseudo) random
decisions.
Note that a common technique to select packets is to compute a hash
function on some bits of the packet header and/or content and to
select it if the result falls in a certain selection range. Since
hashing is a deterministic operation, it is a powerful mean to
ensure that the same packets are selected at multiple measurement
Zseby, Molina, Raspall Expires September 2003 [Page 4]
Internet Draft Techniques for IP Packet Selection March 2003
points. Depending on the chosen input bits, on the hash function and
on the selection range, this technique could also be used to emulate
the random selection of packets with a given probability p. Hashing
could be viewed as a particular type of filtering, but due to its
peculiarities we prefer to describe it as a separate packet
selection technique.
The introduced classification is mainly useful for the definition of
an information model describing “primitive” selection techniques.
More complex selection techniques may then be described through the
composition of cascaded sampling and filtering operations.
For example, a packet selection that weights the selection
probability on the basis of the packet length can be described as a
set of filter/sampling cascades. However, this descriptive approach
is not intended to be rigid: if a common and consolidated selection
practice turns out to be too complex to be described as a
composition of the mentioned building blocks, an ad hoc description
can be specified instead.
We consider packet selectors as part of an IPFIX metering process
which also can use the IPFIX exporting process. This is expressed as
association to one or more IPFIX processes.
3.1 Sampling
The deployment of sampling techniques aims at the provisioning of
information about a specific characteristic of the parent population
at a lower cost than a full census would demand. In order to plan a
suitable sampling strategy it is therefore crucial to determine the
needed type of information and the desired degree of accuracy in
advance.
First of all it is important to know the type of metric that should
be estimated. The metric of interest can range from simple packet
counts [JePP92] up to the estimation of whole distributions of flow
characteristics (e.g. packet sizes)[ClPB93].
Secondly, the required accuracy of the information and with this,
the confidence that is aimed at, should be known in advance. For
instance for usage-based accounting the required confidence for the
estimation of packet counters can depend on the monetary value that
corresponds to the transfer of one packet. That means that a higher
confidence could be required for expensive packet flows (e.g.
premium IP service) than for cheaper flows (e.g. best effort). The
accuracy requirements for validating a previously agreed quality can
also vary extremely with the customer demands. These requirements
are usually determined by the service level agreement (SLA).
Sampling is considered as part of the metering process. A metering
process consists of multiple functions (capturing, time stamping,
etc.). Sampling can be applied at different functions of the
metering process. In the following we consider a measured IP packet
Zseby, Molina, Raspall Expires September 2003 [Page 5]
Internet Draft Techniques for IP Packet Selection March 2003
with its observation point and timestamp as basis elements of the
parent population.
The sampling method and the parameters in use must be clearly
communicated to all applications that use the measurement data. Only
with this knowledge a correct interpretation of the measurement
results can be ensured.
3.2 Filtering
Packet filtering can be done for a wide variety of purposes e.g. for
security, SLA enforcing, accounting. Depending on the type of
filtering, it can be applied in different parts of the metering
process. The role of filtering, as the word itself suggest, is to
separate all the packets having a certain property from those not
having it. A distinguishing characteristic from sampling is that the
property never depends on the packet position in time or in the
space, or on a random process.
3.3 Hash-based Sampling
Hash-based sampling offers both a way to emulate random sampling by
using packet content to generate pseudorandom variates and a way to
consistently select subsets of packets that share a common property.
A hash function h that maps the packet content c, or some portion of
it, onto a range R. The packet is selected if h(c) is element of the
S which is a subset of R called the selection range. Thus hash-based
sampling is indeed a particular case of filtering: the object is
selected if c is in inv(h(S)). For desirable hash functions the
inverse image inv(h(S)) will be extremely complex, and hence h would
not be expressible as, say, a match/mask filter or a simple
combination of these.
3.3.1 Statistical Properties
For good pseudorandom sampling two properties are required. First,
the hash function h must have good mixing properties, in the sense
that small changes in the input (e.g. the flipping of a single bit)
cause large changes in the output (many bits change). Then any local
clump of values of c is spread widely over R by h, and so the
distribution of h(c) is fairly uniform even if the distribution of c
is not. Then the sampling rate is #S/#R, which can be tuned by
choice of S. If S and R are sets contiguous integers, h(c), suitably
shifted and normalized, can be interpreted as a pseudorandom
variate.
The second desirable property depends more closely on the statistics
of the content c. In applications, the content c comprises a number
of distinct fields, c1 ... cm, e.g. source and destination IP
Address, IP identification, and TCP/UDP port numbers (if present)
for a packet. With a hash function satisfying the first properties
Zseby, Molina, Raspall Expires September 2003 [Page 6]
Internet Draft Techniques for IP Packet Selection March 2003
above, selection decisions will appear uncorrelated with the
contents of any individual field, if the complementary fields are
(i) sufficiently variable themselves, and (ii) sufficiently
uncorrelated with cj.
3.3.2 Example: Trajectory Sampling
In trajectory sampling, all routers in a network hash-sample packets
using identical hash function and selection range. The domain of the
hash is restricted to those fields that are invariant from hop to
hop. Fields such as Time-to-Live, which is decremented per hop, and
header CRC, which is recalculated per hop, are thus excluded from
the hash domain. Thus a given packet is selected at all either all
points on its path through the network, or at none. The domain of
the hash function needs to be wider than just a flow key, if packets
are to be selected quasirandomly within flows (and e.g. include
portions of the payload); see [DuGr00]. A report on each selected
packet is exported to a collector. The collector can reconstruct
trajectories of the selected packets provided it can match different
reports on the same packet, and distinguish these from reports on
different packets. For this purpose, reports may also contain a
second distinct hash of the selected packets and/or timing
information.
Applications of trajectory sampling include (i) estimation of the
network path matrix, i.e., the traffic intensities accordng to
network path, broken down by flow; (ii) detection of routing loops,
as indicated by self-intersecting trajectories; (iii) passive
performance measurement: prematurely terminating trajectories
indicate packet loss, and packet latencies can be determined if
reports include (synchronized) timestamps; (iv) network attack
tracing, of the actual paths taken by attack packets with spoofed
source addresses.
3.3.3 Guarding Against Pitfalls and Vulnerabilities
A concern is whether some large set of related packets could be
sampled at a rate that significantly differs from the expected
sampling rate, either (i) through unanticipated behavior in the hash
function, or (ii) because the packets had been deliberately crafted
to have this property.
The first point underlines the importance of using a hash function
with good mixing properties. Examples of such are CRC32 and hash
functions based on modular arithmetic, see 6.4 in [Knuth98]. The
statistical properties of candidate hash functions need to be
evaluated, preferably on packet before adoption for hash-based
sampling.
Can hash sampling be overloaded (or evaded) if the hash function is
known? Assume an attacker, knowing h and the selection range S can
construct packets that will be sampled (or not sampled). If a
service provider keeps S private, the attacker cannot determine
whether a crafted packet will be selected. However, an attacker that
Zseby, Molina, Raspall Expires September 2003 [Page 7]
Internet Draft Techniques for IP Packet Selection March 2003
crafted a set of packets all with the same hash would know that the
packets would be either all selected or all not selected. A stronger
defense is to employ a parametrizable hash function and keep the
parameter private: in this case the set of hash values of the
packets could not be determined. Examples of parameters are the
initial vector in CRC32, and moduli in hashes based on modular
arithmetic. Another defense would be to keep the selection range
private. However, when applications (like multi domain trajectory
sampling, or One way delay estimation across multiple domains) may
require multiple administrative entities to agree on a common hash
function and selection range, mutual trust between the entities
cannot be avoided.
4. Sampling Methods
Sampling Methods can be characterized by the sampling algorithm, the
trigger type used for starting a sampling interval and the length of
the sampling interval. These parameters are described here in
detail.
4.1 Sampling Algorithm
The sampling algorithm describes the basic process for selection of
samples. In accordance to [AmCa89] and [ClPB93] we define the
following basic sampling processes:
4.1.1 Systematic Sampling
Systematic sampling describes the process of selecting the starting
points and the duration of the selection intervals according to a
deterministic function. This can be for instance the periodic
selection of every n-th element of a trace but also the selection of
all packets that arrive at pre-defined points in time. Even if the
selection process does not follow a periodic function (e.g. if the
time between the sampling intervals varies over time) we consider
this as systematic sampling as long as the selection is
deterministic. The use of systematic sampling always involves the
risk of biasing the results. If the systematics in the sampling
process resembles systematics in the observed stochastic process
(occurrence of the characteristic of interest in the network), there
is a high probability that the estimation will be biased.
Systematics (e.g. periodic repetition of an event) in the observed
process might not be known of in advance.
4.1.2 Random Sampling
Random sampling selects the starting points of the sampling
intervals in accordance to a random process. The selection of
elements are independent experiments. With this, unbiased
estimations can be achieved. In contrast to systematic sampling,
random sampling requires the generation of random numbers. One can
differentiate two methods of random sampling:
Zseby, Molina, Raspall Expires September 2003 [Page 8]
Internet Draft Techniques for IP Packet Selection March 2003
n-out-of-N sampling
In n-out-of-N sampling n elements are selected out of the parent
population that consists of N elements. One example would be to
generate random numbers and select all packets which have a packet
position equal to one of the random numbers. For this kind of
sampling the sample size is fixed.
Probabilistic sampling (see also [DuGG02])
In probabilistic sampling the decision whether an element is
selected or not is made in accordance to a pre-defined selection
probability. An example would be to flip a coin for each packet and
select all packets for which the coin showed the head. For this kind
of sampling the sample size can vary for different trials. The
selection probability is not necessarily the same for each packet.
5. Sampling Parameters
The decision whether to select a packet or not is based on a
function which is performed when the packet arrives at the sampling
process. The sampling function can consist of a (pseudo) random
calculation or of a function that take the packet position (temporal
or spatial) into account. The parameters of these functions that are
not derived from the packet are called sampling parameters.
5.1 Parameters for systematic sampling
For systematic sampling the deterministic function which is used for
the packet selection needs to be given. For periodic sampling the
start of the first selection interval, the length of the selection
interval (given in number of packets or as time duration) and the
spacing between selection intervals needs to be specified.
<-- interval length = 7 --> <-- spacing = 5 _->
Packet position: 1 2 3 4 5 6 7 8 9 10 11 12 13..
The packets in the sample will be: 1,2,3,4,5,6,7, 13,...
Selecting every x-th packet would be a special case with selection
interval=1 and spacing=x-1.
5.2 Parameters for random sampling
For random n-out-of-N sampling only the sample size n needs to be
specified. This can be done either as an absolute number or as
fraction of the parent population n/N.
For probabilistic sampling the selection probability p needs to be
specified. If the selection probability depends on other parameters
(e.g. packet content), the function that expresses this dependency
has to be specified.
Zseby, Molina, Raspall Expires September 2003 [Page 9]
Internet Draft Techniques for IP Packet Selection March 2003
6. Complexity Levels
Packet selection schemes differ in the input parameters for the
selection process and the functions they require to do the packet
selection. The following table gives an overview.
Scheme | input parameters | functions
---------------+------------------------+---------------------
simple | sampling | random function
probabilistic | probability |
---------------+------------------------+---------------------
systematic | packet position | packet counter
count-based | sampling pattern |
---------------+------------------------+---------------------
systematic | arrival time | clock or timer
time-based | sampling pattern |
---------------+------------------------+---------------------
random | packet position | packet counter,
n-out-of-N | sampling pattern | random numbers
| (random number list) |
---------------+------------------------+---------------------
hash-based | packet content(parts) | hash function
---------------+------------------------+---------------------
filtering | packet content(parts) | filter function
---------------+------------------------+---------------------
content-based | packet content(parts) | selection function,
probabilistic | | probability calc.
---------------+------------------------+---------------------
router state | router state | router state
| | discovery
---------------+------------------------+---------------------
The sampling pattern determines which packets have to be selected in
schemes that are not based on probabilistic sampling. For systematic
count-based sampling this is the length of the sampling interval and
the spacing between sampling intervals expressed in number of
packets. For systematic time-based sampling this is the length of
the sampling interval and the spacing between sampling intervals
expressed as time intervals. For random n-out-of-N sampling this
pattern is based e.g. on a list of random numbers. The parameters
and function needed for combined schemes depend on the combination.
In content-based probabilistic sampling, the sampling probability
depends on the content. This can be used to achieve a biased
selection of packets.
In order to allow different types of devices to implement schemes in
accordance to their capabilities and available resources we group
the schemes into the following complexity levels.
Zseby, Molina, Raspall Expires September 2003 [Page 10]
Internet Draft Techniques for IP Packet Selection March 2003
Complexity level 1: Devices that comply to PSAMP must at least
support the following simple packets selection functions:
- Simple probabilistic
- systematic count-based
Complexity level 2: Devices that comply to PSAMP should support the
following packets selection functions:
- n-out-of-N
- hash-based
- filtering
- content-based probabilistic
- systematic time-based
Complexity level 3: Devices that comply to PSAMP may support the
following packets selection functions:
- router-state-based
- combined schemes
7. Information Model Sampling Techniques
In this section we define the information models for most common
sampling techniques. Here the selection function is pre-defined and
given by the selector ID.
Sampler Description:
SELECTOR_ID
SELECTOR_TYPE
SELECTOR_PARAMETERS
OPERATING_TIME
ASSOCIATIONS
Where:
SELECTOR_ID:
Unique ID for the packet sampler. The ID can be calculated under
consideration of the ASSOCIATIONS and a local ID.
SELECTOR_TYPE
Description: For sampling processes the SELECTOR TYPE defines what
sampling algorithm is used.
Values: n out of N | Systematic Time Based (equally spaced)|
Systematic Position Based (equally spaced)| Probabilistic
[Remark: further sampling schemes will be added here]
SELECTOR_PARAMETERS
Description: For sampling processes the SELECTOR PARAMETERS define
the input parameters for the process. Interval length in systematic
sampling means, that all packets that arrive in this interval are
selected. The spacing parameter defines the spacing in time or
Zseby, Molina, Raspall Expires September 2003 [Page 11]
Internet Draft Techniques for IP Packet Selection March 2003
number of packets between the end of one sampling interval and the
start of the next succeeding interval.
Case n out of N:
- List of n sampling positions in an array of N positions
Case Systematic Time Based:
- Interval length (in usec), Spacing (in usec)
Case Systematic Position Based:
- Interval length(in packets), Spacing (in packets)
Case Probabilistic(with equal probability per packet):
- Sampling probability p
OPERATING_TIME
Description: The OPERATING_TIME parameter describes the start/stop
time of sampling process. List elements must not overlap. The start
time of the first element can be omitted, meaning “from now”. The
end time of the last element can be omitted, meaning “until sampler
is removed”.
Values: List of (Start time, End time)
ASSOCIATIONS
Description: The ASSOCIATIONS field describes the observation point
and the IPFIX processes to which the packet selector is associated.
The STREAM ID denotes the origin of the data stream that is input to
the selection function. It can be the observation point directly or
the ID of another selector. With this it is possible to define
combined schemes. If the STREAM ID contains IDs from other
selectors, one can derive the original observation point from the
selector definitions of these specified selectors.
Values: <STREAM ID, Metering process ID, Exporting process ID>
With STREAM ID: Observation point ID | List of SELECTOR_IDs
8. Filtering
As pointed out in section 3, the main difference between sampling
and filtering is that filtering never depends on the temporal or
spatial position of packets. We introduce two classes of filters. In
the first one, the property can be directly derived by applying a
function on some bits of the packet, while in the second one the
property depends on router state or on the router’s reaction to a
particular packet.
The filters of the first class should be able to operate at full
line rate, while some of the ones of the second may need to be
preceded by a sampling function (e.g. because they involve access to
router state).
[Discussion needed on router-state based filtering]
Zseby, Molina, Raspall Expires September 2003 [Page 12]
Internet Draft Techniques for IP Packet Selection March 2003
8.1 Filtering operating directly on some of the packet’s bits
These filters functionally operate as follow:
- They select some bits of the packet (not or not only
necessarily those of the header).
- They apply a function on the selected bits. The function can be
as simple as the identity function (i.e. this step is logically
skipped), or as complex as a hash function.
- They feed the result of the function into an indicator
function, that returns a “select/do not select” result.
Examples of filters of this class are filters that select packets on
the basis of the matching of some of the header fields with a
(possibly masked) pre defined value, filters that select the packets
that have some header field value falling within a predefined range,
or filters that select some header fields and/or a portion of the
payload, apply a hash function and then select the packet if the
results is in the hash selection range. Note that in the latter
case, the selected bits may not be the only one forming the input of
the hash function. For example, a “secret” bit sequence could be
appended to the selected bits in order to make it harder for an
attacker to forge packets being either always or never selected.
An implementation isn’t constrained to apply exactly all these steps
or in this sequence, provided that the result is equivalent to a
logical function doing it.
8.2 Filtering considering router reaction or router state
This class of filters select a packet on the basis of the following
conditions), possibly combined with the AND, OR or NOT operators.
- Ingress interface at which packet arrives equals a specified
value
- Egress interface to which packet is routed to equals a
specified value
- Packet violated acl on the router
- Failed rpf
- Failed rsvp
- No route found for the packet
- Origin AS equals a specified value or lies within a given
range
- Destination AS equals a specified value or lies within a given
range
9. Information Model for Filtering Techniques
In this section we define the information models for most common
filtering techniques. The information model structure closely
parallels the one presented for the sampling techniques.
Filter Description:
Zseby, Molina, Raspall Expires September 2003 [Page 13]
Internet Draft Techniques for IP Packet Selection March 2003
SELECTOR_ID
SELECTOR_TYPE
SELECTOR_PARAMETERS
OPERATING_TIME
ASSOCIATIONS
Where:
SELECTOR_ID:
Unique ID for the packet filter. The ID can be calculated under
consideration of the ASSOCIATIONS and a local ID.
SELECTOR_TYPE
Description: For filtering processes the SELECTOR TYPE defines what
filtering type is used.
Values: Matching | Hashing | Router_state
SELECTOR_PARAMETERS
Description: For filtering processes the SELECTOR PARAMETERS define
formally the common property of the packet being filtered. For the
filters of type Matching and Hashing the definitions have a lot of
points in common.
Values:
Case Matching
- <Header type = ip v4 >
- <bit specification, header part>
- <Selection interval specification, header part>
- <Header type = ipv6>
- <bit specification, header part>
- <Selection interval specification, header part>
- <payload byte number N>
- <bit specification, payload part>
- <Selection interval specification, payload part>
Notes to Case Matching:
- The filter can be defined for the header part only, for the
payload part only or for both. In the latter case the matching
must be an AND of the two.
- The bit specification, for the header part, can be specified
for ipv4 or ipv6 only, or both
- In case of ipv4, the bit specification is a sequence of 20
Hexadecimal numbers [00,FF] specifying a 20 bytes bitmask to be
applied to the header
- In case of ipv6, it is a sequence of 40 Hexadecimal numbers
[00,FF] specifying a 40 bytes bitmask to be applied to the
header
- The bit specification, for the payload part, is a sequence of
Hexadecimal numbers [00,FF] specifying the bitmask to be
applied to the first N bytes of the payload, as specified by
the previous field. In case the Hexadecimal number sequence is
longer then N, only the first N numbers are considered.
Zseby, Molina, Raspall Expires September 2003 [Page 14]
Internet Draft Techniques for IP Packet Selection March 2003
- In case the payload is shorter than N, the packet will not
match the filter Other options, like padding with zeros, may be
considered in the future.
- The selection interval specification is a list of non
overlapping intervals [intv_begin, intv_end] where intv_begin,
intv_end are bit strings of length 20*8 (ipv4 case), 40*8 (ipv6
case), N*8 (payload case).
- A filter cannot be defined on the options field of the ipv4
header, neither on stacked headers of ipv6.
- This specification doesn’t preclude the future definition of a
high level syntax for defining in a concise way bit selection
and matching rules in a more human readable form (e.g. “TCP
port in [2000,3000]”). The requirement is that such a syntax
can be univoquely compiled into the one defined above
Case Hashing:
- <Header type = ipv4>
- <Input bit specification, header part>
- <Header type = ipv6>
- <Input bit specification, header part>
- <payload byte number N>
- <Input bit specification, payload part>
- <additional hash input bits (seed)>
- Hashing function specification (includes length of hash
function output M)
- Selection interval specification, as a list of non overlapping
intervals [start value, end value] where value is in [0,2^M-1]
Notes to Case Hashing:
- On Input bit specifications fields, the same notes on bit
specifications of the Matching case reported above apply
Case Router State:
- Ingress interface at which packet arrives equals a specified
value
- Egress interface to which packet is routed to equals a
specified value
- Packet violated acl on the router
- Failed rpf
- Failed rsvp
- No route found for the packet
- Origin AS equals a specified value or lies within a given
range
- Destination AS equals a specified value or lies within a given
range
Note to Case Router State:
- All Router state entries can be linked by AND, OR, NOT
operators
OPERATING_TIME
Zseby, Molina, Raspall Expires September 2003 [Page 15]
Internet Draft Techniques for IP Packet Selection March 2003
Description: The OPERATING_TIME parameter describes the start/stop
time of filtering process. List elements must not overlap. The start
time of the first element can be omitted, meaning “from now”. The
end time of the last element can be omitted, meaning “until sampler
is removed”.
Values: List of (Start time, End time)
ASSOCIATIONS
Description: The ASSOCIATIONS field describes the observation point
and the IPFIX processes to which the packet selector is associated.
The STREAM ID denotes the origin of the data stream that is input to
the selection function. It can be the observation point directly or
the ID of another selector. With this it is possible to define
combined schemes. If the STREAM ID contains IDs from other
selectors, one can derive the original observation point from the
selector definitions of these specified selectors.
Values: STREAM ID, Metering process ID, Exporting process ID>
With STREAM ID: Observation point ID | List of SELECTOR_IDs
10. Composite Techniques
Composite schemes are realized by using the STREAM ID in the
information models. The STREAM ID denotes from which selectors the
input stream originates. If multiple stream IDs are given, this
means that the selector operates on the packet stream simply
resulting from the time superposition of the output of all the
listed filters and samplers. Note that a sampler/filter could be
intermittently active, as defined in the OPERATING TIME field.
Some examples of composite schemes are reported below.
10.1 Cascaded filtering->sampling or sampling->filtering
If a filter precedes a sampling process the role of filtering is to
create a set of “parent populations” from a single stream that can
then be fed independently to different sampling functions, with
different parameters tuned for the population itself (e.g. if
streams of different intensity result from filtering, it may be good
to have different sampling rates). If filtering follows a sampling
process, the same sampling rate and type is applied to the whole
stream, independently of the relative size of the streams resulting
from the filtering function. Moreover, also packets not destined to
be selected will “load” the sampling function. So, in principle,
filtering before sampling allows a more accurate tuning of the
sampling procedure, but if filters are too complex to work at full
line rate (e.g. because they have to access router state
information), sampling before filtering may be a need.
10.2 Stratified Sampling
Stratified sampling is one example for using a composite technique.
The basic idea behind stratified sampling is to increase the
estimation accuracy by using a-priori information. The a-priori
Zseby, Molina, Raspall Expires September 2003 [Page 16]
Internet Draft Techniques for IP Packet Selection March 2003
information is used to perform an intelligent grouping of the
elements of the parent population. With this a higher estimation
accuracy can be achieved with the same sample size.
Stratified sampling divides the sampling process into multiple
steps. First, the elements of the parent population are grouped into
subsets in accordance to a given characteristic. This grouping can
be done in multiple steps. Then samples are taken from each subset.
The stronger the correlation between the characteristic used to
divide the parent population and the characteristic of interest (for
which an estimate is sought after), the easier is the consecutive
sampling process and the higher is the stratification gain. For
instance if the dividing characteristic were equal to the
investigated characteristic, each element of the sub-group would be
a perfect representative of that characteristic. In this case it
would be sufficient to take one arbitrary element out of each
subgroup to get the actual distribution of the characteristic in the
parent population. Therefore stratified sampling can reduce the
costs for the sampling process (i.e. the number of samples needed to
achieve a given level of confidence).
For stratified sampling one has to specify classification rules for
grouping the elements into subgroups and the sampling scheme that is
used within the subgroups. The classification rules can be expressed
by multiple filters. For the sampling scheme within the subgroups
the parameters have to be specified as described above.
11. Security Considerations
Security threats can occur if the configuration of sampling
parameters or the communication of sampling parameters to the
application is corrupted. This document only describes sampling
schemes that can be used for packet selection. It neither describes
a mechanism how those parameters are configured nor how these
parameters are communicated to the application. Therefore the
security threats that originate from this kind of communication
cannot be assessed with the information given in this document.
In some cases malicious users or attackers may be interested to hide
packets from the service provider. For instance if packet selectors
are used for accounting or intrusion detection applications, users
may want to prevent that packets are selected. If a deterministic
sampling scheme is used or a selection scheme that takes packet
content into account, the user can shape or send packets in a way
that they are less likely to be selected. This has to be taken into
account when choosing an appropriate packet selection technique.
12. Acknowledgements
We would like to thank Nick Duffield for providing some text on
hash-based sampling.
Zseby, Molina, Raspall Expires September 2003 [Page 17]
Internet Draft Techniques for IP Packet Selection March 2003
13. References
[AmCa89] Paul D. Amer, Lillian N. Cassel: Management of Sampled
Real-Time Network Measurements, 14th Conference on Local
Computer Networks, October 1989, Minneapolis, pages 62-
68, IEEE, 1989
[ClPB93] K.C. Claffy, George C Polyzos, Hans-Werner Braun:
Application of Sampling Methodologies to Network Traffic
Characterization, Proceedings of ACM SIGCOMM'93, San
Francisco, CA, USA, September 13 - 17, 1993
[CoGi98] I. Cozzani, S. Giordano: Traffic Sampling Methods for
end-to-end QoS Evaluation in Large Heterogeneous
Networks. Computer Networks and ISDN Systems, 30 (16-
18), September 1998.
[DuGG02] Nick Duffield, Albert Greenberg, Matthias Grossglauser,
Jennifer Rexford: A Framework for Passive Packet
Measurement, Internet Draft draft-duffield-framework-
papame-01, work in progress, February 2002
[DuGr00] Nick Duffield, Matthias Grossglauser: Trajectory
Sampling for Direct Traffic Observation, Proceedings of
ACM SIGCOMM 2000, Stockholm, Sweden, August 28 -
September 1, 2000.
[JePP92] Jonathan Jedwab, Peter Phaal, Bob Pinna: Traffic
Estimation for the Largest Sources on a Network, Using
Packet Sampling with Limited Storage, HP technical
report, Managemenr, Mathematics and Security Department,
HP Laboratories, Bristol, March 1992,
http://www.hpl.hp.com/techreports/92/HPL-92-35.html
[Knuth98] Donald E. Knuth: The Art of Computer Programming, Volume
3: Searching and Sorting, Addison Wesley, 1998
[QuZC02] J. Quittek, T. Zseby, B. Claise, S. Zander, G. Carle,
K.C. Norseth: Requirements for IP Flow Information
Export, Internet Draft <draft-ietf-ipfix-reqs-05.txt>,
work in progress, August 2002
[Zseb02] Tanja Zseby: Deployment of Sampling Methods for SLA
Validation with Non-Intrusive Measurements, Proceedings
of Passive and Active Measurement Workshop (PAM 2002),
Fort Collins, CO, USA, March 25-26, 2002
14. Author's Addresses
Tanja Zseby
Fraunhofer Institute for Open Communication Systems
Kaiserin-Augusta-Allee 31
10589 Berlin
Germany
Zseby, Molina, Raspall Expires September 2003 [Page 18]
Internet Draft Techniques for IP Packet Selection March 2003
Phone: +49-30-34 63 7153
Fax: +49-30-34 53 8153
Email: zseby@fokus.fhg.de
Maurizio Molina
NEC Europe Ltd., Network Laboratories
Adenauerplatz 6
69115 Heidelberg
Germany
Phone: +49 6221 90511-18
Email: molina@ccrle.nec.de
Fredric Raspall
NEC Europe Ltd., Network Laboratories
Adenauerplatz 6
69115 Heidelberg
Germany
Phone: +49 6221 90511-31
EMail: raspall@ccrle.nec.de
15. Full Copyright Statement
Copyright (C) The Internet Society (2002). All Rights Reserved. This
document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph
are included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.
The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Zseby, Molina, Raspall Expires September 2003 [Page 19]