[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
new version of packet selection draft
- To: psamp <psamp@ops.ietf.org>
- Subject: new version of packet selection draft
- From: Tanja Zseby <zseby@fokus.fraunhofer.de>
- Date: Fri, 27 Jun 2003 12:24:28 +0200
- User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.0.2) Gecko/20021120 Netscape/7.01
Hi all,
I just submitted our new version of the packet selection draft (joint
work with Maurizio, Nick and Fredric). The draft is attached. Main
changes are:
- consistency with framework draft (terminology, categorization of schemes)
- reworked sections on description of sampling and filtering schemes
- complexity levels removed
- some restructuring of the document
Regards
Tanja
--
Dipl.-Ing. Tanja Zseby
Fraunhofer Institute FOKUS Email: zseby@fokus.fraunhofer.de
Kaiserin-Augusta-Allee 31 Phone: +49-30-3463-7153
D-10589 Berlin, Germany Fax: +49-30-3463-8153
--------------------------------------------------------------------------------------
"Living on earth is expensive but it includes a free trip around the sun." (Anonymous)
--------------------------------------------------------------------------------------
Internet Draft
Document: <draft-ietf-psamp-sample-tech-02.txt> T. Zseby
Expires: December 2003 Fraunhofer FOKUS
M. Molina
NEC Europe Ltd.
F. Raspall
NEC Europe Ltd.
Nick Duffield
AT&T Labs - Research
June 2003
Sampling and Filtering Techniques for IP Packet Selection
Status of this Memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
Abstract
This document describes sampling and filtering techniques for IP
packet selection. It introduces information models for packet
sampling, for packet filtering and for combinations of methods. The
information models describe what information has to be specified in
order to describe the method. This information is used for
configuring the selection technique in measurement processes and for
reporting the technique in use to the measurement data collection
process.
The document first suggests some terminology, then it describes in
detail packet sampling and packet filtering techniques and their
parameters. It also describes how these two techniques can be
combined to build more elaborate packet selectors. Finally, it
introduces information models for the most common sampling and
filtering techniques.
Table of Contents
1. Introduction.................................................2
2. Terminology..................................................2
3. Scope and Deployment of Packet Selection Techniques..........4
3.1 Sampling.....................................................5
Zseby, Molina, Raspall, Duffield Expires December 2003 [Page 1]
Internet Draft Techniques for IP Packet Selection June 2003
3.1.1 Systematic Sampling.........................................5
3.1.2 Random Sampling.............................................6
3.1.2.1 n-out-of-N sampling.......................................6
3.1.2.2 Probabilistic sampling....................................6
3.1.2.2.1 Uniform Probabilistic Sampling..........................7
3.1.2.2.2 Non-Uniform Probabilistic Sampling......................7
3.1.2.2.3 Non-Uniform flow State dependent sampling...............7
4. Filtering....................................................7
4.1 Mask/Match filtering.........................................8
4.2 Hashing filtering............................................8
4.2.1 Random sampling emulation...................................9
4.2.2 Consistent packet selection and its applications............9
4.2.3 Guarding Against Pitfalls and Vulnerabilities..............10
4.3 Router State filtering......................................10
5. Input Parameters and Information Models.....................11
5.1 Information Model for Sampling Techniques...................11
5.2 Information Model for Filtering Techniques..................13
6. Composite Techniques........................................15
6.1 Cascaded filtering->sampling or sampling->filtering.........15
6.2 Stratified Sampling.........................................15
7. Security Considerations.....................................16
8. References..................................................16
9. Author's Addresses..........................................17
10. Intellectual Property Statement.............................18
11. Full Copyright Statement....................................18
1. Introduction
Increasing data rates and growing measurement demands increase the
requirements for data collection resources. For measurement scenarios
in backbone networks it is often required to measure whole traffic
aggregates instead of single flows. Furthermore, some measurement
methods require the capturing of packet headers or even parts of the
payload. All this can lead to an overwhelming amount of measurement
data, resulting in high demands regarding resources for metering,
storage, transport and post processing.
In some cases specialized hardware helps to fulfill these demands but
on the other hand increases the costs for providing the measurement.
Since measurements are mainly a supporting functionality for the
service provisioning, measurement costs usually should be limited to
a small fraction of the costs of the network service provisioning
itself. Therefore a reduction of the measurement result data is
crucial to prevent the depletion of the available (i.e. the
affordable) resources. Such a reduction can be achieved by a
reasonable deployment of packet selection techniques, that sample a
subset of the packets while still allowing an appropriate accuracy,
or filter out all packets that are not of interest for the
measurement at all. Packet selection helps to prevent an exhaustion
of resources and to limit the measurement costs. Examples for
applications that benefit from packet selection are given in
[DuGG03].
2. Terminology
The terms used throughout this document are defined in [DuGG03]. We
here just repeat the definitions of the specific selection
operations.
Zseby, Molina, Raspall, Duffield Expires December 2003 [Page 2]
Internet Draft Techniques for IP Packet Selection June 2003
Filtering: a filter is a selection operation that selects a packet
deterministically based on the packet content, its treatment, and
functions of these occurring in the selection state. Examples include
match/mask filtering, and hash-based selection.
Sampling: a selection operation that is not a filter is called a
sampling operation.
Content-independent Sampling: a sampling operation that does not use
packet content (or quantities derived from it) as the basis for
selection is called a content-independent sampling operation.
Examples include systematic sampling, and uniform pseudorandom
sampling driven by a pseudorandom number whose generation is
independent of packet content. Note that independent sampling a does
not need to access the packet content in order to make the selection
decision.
Content-dependent Sampling: a sampling operation where selection is
dependent on packet content is called a content-dependent sampling
operation. Examples include pseudorandom selection according to a
probability that depends on the contents of a packet field; note that
this is not a filter.
Emulated Sampling: selection operations in any of the above four
categories may be emulated by operations in the same or another
category for the purposes of implementation.
Hash-based selection: a filter specified by a hash domain, a hash
function, and hash range and a hash selection range.
Hash domain: a subset of the packet content and the packet treatment,
viewed as a N-bit string for some positive integer N.
Hash range: a set of M-bit strings for some positive integer M
Hash function: a deterministic map from the hash domain into the
hash range.
Hash selection range: a subset of the hash range. The packet is
selected if the action of the hash function on the hash domain for
the packet yields a result in the hash selection range.
Metering process: see the definition in [QuZC03]
Pool size: the size of a set of packets in a packet stream.
Sample size: the size of a set of packets selected by a sampling
operation.
Target Sampling Rate: a configurable sampling rate in a sampling
operation.
Attained Sampling Rate: Given a subset of packets in a stream input
to a sampling operation, the attained sampling rate is the ratio of
the sample size to the pool size.
Packet Stream: a sequence of packets, each of which was observed at
the observation point. Note that when packets are sampled from a
stream, the selected packets usually do not have common properties by
which they can be distinguished from packets that have not been
Zseby, Molina, Raspall, Duffield Expires December 2003 [Page 3]
Internet Draft Techniques for IP Packet Selection June 2003
selected. Therefore we define here the term stream instead of flow,
which is defined as set of packets with common properties [QuZC02].
Selection Process: a selection process selects a substream of packets
from the observed packet stream. A selection process entails the
composition of one or more selectors operating in succession. If
selectors are composed, the output packet stream of one selector
forms the input packet stream for the succeeding selector.
3. Scope and Deployment of Packet Selection Techniques
The function of packet selection is to select a subset from the
stream of all packets visible at an observation point. Selection can
be used to select packets based on their content, and/or to reduce
the rate of packet reports regardless of content.
The selection technique used to select a subset of packets out of all
those crossing an observation point depends on the purpose
(application) for which measurement is performed. If the main purpose
of an application is to infer some characteristic of the whole set of
crossing packets without processing them all (thus reducing the
computation load) then we call the used selection technique
“sampling”. In principle, with sampling the content of the packet is
not relevant for the packet selection: what matters is only that the
selected sample has a distribution of the characteristic to infer
similar to the one of the parent population, so that it can be
estimated reliably. The sampling decision may be based on the
temporal or spatial position of the packet in the packet stream, or
may depend on a (pseudo) random number extraction or calculation.
On the contrary, if the application needs to consider all the packets
having some common property, then we call the selection technique
“filtering”. The property can be directly derived by some computation
on the packet content, or depend on the treatment given by the router
to the packet. We conclude that sampling can depend on packet
position or on (pseudo) random decisions, while filtering depends on
packet content, but never depends on packet position or on (pseudo)
random decisions.
Note that a common technique to select packets is to compute a hash
function on some bits of the packet header and/or content and to
select it if the result falls in a certain selection range. Since
hashing is a deterministic operation, it is a powerful mean to ensure
that the same packets are selected at multiple measurement points.
Depending on the chosen input bits, on the hash function and on the
selection range, this technique could also be used to emulate the
random selection of packets with a given probability p. Hashing is
then a particular type of filtering, but can also be used to emulate
random sampling.
The introduced classification is mainly useful for the definition of
an information model describing “primitive” selection techniques.
More complex selection techniques may then be described through the
composition of cascaded sampling and filtering operations.
For example, a packet selection that weights the selection
probability on the basis of the packet length can be described as a
set of filter/sampling cascades. However, this descriptive approach
is not intended to be rigid: if a common and consolidated selection
Zseby, Molina, Raspall, Duffield Expires December 2003 [Page 4]
Internet Draft Techniques for IP Packet Selection June 2003
practice turns out to be too complex to be described as a composition
of the mentioned building blocks, an ad hoc description can be
specified instead.
We consider packet selectors as part of an IPFIX metering process
which also can use the IPFIX exporting process. This is expressed as
association to one or more IPFIX processes.
3.1 Sampling
The deployment of sampling techniques aims at the provisioning of
information about a specific characteristic of the parent population
at a lower cost than a full census would demand. In order to plan a
suitable sampling strategy it is therefore crucial to determine the
needed type of information and the desired degree of accuracy in
advance.
First of all it is important to know the type of metric that should
be estimated. The metric of interest can range from simple packet
counts [JePP92] up to the estimation of whole distributions of flow
characteristics (e.g. packet sizes)[ClPB93].
Secondly, the required accuracy of the information and with this, the
confidence that is aimed at, should be known in advance. For instance
for usage-based accounting the required confidence for the estimation
of packet counters can depend on the monetary value that corresponds
to the transfer of one packet. That means that a higher confidence
could be required for expensive packet flows (e.g. premium IP
service) than for cheaper flows (e.g. best effort). The accuracy
requirements for validating a previously agreed quality can also vary
extremely with the customer demands. These requirements are usually
determined by the service level agreement (SLA).
Sampling is considered as part of the metering process. A metering
process consists of multiple functions (capturing, time stamping,
etc.). Sampling can be applied at different functions of the metering
process. In the following we consider a measured IP packet with its
observation point and timestamp as basis elements of the parent
population.
The sampling method and the parameters in use must be clearly
communicated to all applications that use the measurement data. Only
with this knowledge a correct interpretation of the measurement
results can be ensured.
Sampling Methods can be characterized by the sampling algorithm, the
trigger type used for starting a sampling interval and the length of
the sampling interval. These parameters are described here in detail.
The sampling algorithm describes the basic process for selection of
samples. In accordance to [AmCa89] and [ClPB93] we define the
following basic sampling processes:
3.1.1 Systematic Sampling
Systematic sampling describes the process of selecting the starting
points and the duration of the selection intervals according to a
deterministic function. This can be for instance the periodic
selection of every n-th element of a trace but also the selection of
all packets that arrive at pre-defined points in time. Even if the
selection process does not follow a periodic function (e.g. if the
Zseby, Molina, Raspall, Duffield Expires December 2003 [Page 5]
Internet Draft Techniques for IP Packet Selection June 2003
time between the sampling intervals varies over time) we consider
this as systematic sampling as long as the selection is
deterministic.
The use of systematic sampling always involves the risk of biasing
the results. If the systematics in the sampling process resembles
systematics in the observed stochastic process (occurrence of the
characteristic of interest in the network), there is a high
probability that the estimation will be biased. Systematics (e.g.
periodic repetition of an event) in the observed process might not be
known in advance.
Here only equally spaced schemes are considered, where triggers for
sampling are periodic, either in time or in packet count. All packets
occurring in a selection interval (either in time or packet count)
beyond the trigger are selected. The case that the selection interval
covers only the first available packet for count-based sampling is
often called 1 in N sampling: packets are selected with count period
N. More generally, some number M<N of consecutive packets are
selected.
Systematic count-based
In systematic count-based sampling the start and stop triggers for
the sampling interval are defined in accordance to the special packet
position (packet count).
Systematic time-based
In systematic count-based sampling the start and stop triggers for
the sampling interval are defined in accordance to the temporal
packet position (arrival time).
Both schemes are content–independent selection schemes.
3.1.2 Random Sampling
Random sampling selects the starting points of the sampling intervals
in accordance to a random process. The selection of elements are
independent experiments. With this, unbiased estimations can be
achieved. In contrast to systematic sampling, random sampling
requires the generation of random numbers. One can differentiate two
methods of random sampling:
3.1.2.1 n-out-of-N sampling
In n-out-of-N sampling n elements are selected out of the parent
population that consists of N elements. One example would be to
generate random numbers and select all packets which have a packet
position equal to one of the random numbers. For this kind of
sampling the sample size is fixed.
3.1.2.2 Probabilistic sampling
In probabilistic sampling the decision whether an element is selected
or not is made in accordance to a pre-defined selection probability.
An example would be to flip a coin for each packet and select all
packets for which the coin showed the head. For this kind of sampling
the sample size can vary for different trials. The selection
probability is not necessarily the same for each packet. We
distinguish between uniform probabilistic sampling and non-uniform
probabilistic sampling.
Zseby, Molina, Raspall, Duffield Expires December 2003 [Page 6]
Internet Draft Techniques for IP Packet Selection June 2003
3.1.2.2.1 Uniform Probabilistic Sampling
For Uniform Random Sampling packets are selected independently with
some uniform probability 1/N. This sampling can be count-driven, and
is sometimes referred to as geometric random sampling, since the
difference in count between successive selected packets are
independent random variables with a geometric distribution of mean N.
A time-driven analog, exponential random sampling, has the time
between triggers exponentially distributed.
Both geometric and exponential random sampling are examples of what
is known as additive random sampling, defined as sampling where the
intervals or counts between successive samples are independent
identically distributed random variable.
3.1.2.2.2 Non-Uniform Probabilistic Sampling
Also known as non-uniform probability sampling, this is a variant of
independent random sampling in which the sampling probabilities can
depend on the selection process input. This can be used to weight
sampling probabilities in order e.g. to boost the chance of sampling
packets that are rare but are deemed important. Unbiased estimators
for quantitative statistics are recovered by renormalization of
sample values; see [HT52].
3.1.2.2.3 Non-Uniform flow State dependent sampling
Another type of sampling that can be classified as Non-Uniform (and,
possibly, probabilistic) is closely related to the flow concept as
defined in [QuZC02], and it is only used jointly with a flow
monitoring function (IPFIX monitoring function). Packets are
selected, probabilistically or deterministically, dependent on a
selection state. The point, here, is that the selection state is
determined also by the state of the flow the packet belongs to and/or
by the state of the other flows currently being monitored by the
associated flow monitoring function. This type of sampling is also
content dependent because the identification of the flow the packet
belongs to requires analyzing part of the packet content. If the
packet is selected, then it is passed as an input to the IPFIX
monitoring function (this is called “Local Export” in [DuGG03]
Selecting the packet depending on the state of a flow cache is useful
when memory resources of the flow monitoring function are scarce
(i.e. there’s no room to keep all the flows that have been scheduled
for monitoring). The algorithms leading to the selection decision are
not subject to standardization, but they may take into account
factors like flow volume, flow grow rate, flow monitoring priority.
An example of such an algorithm is described in [EsVa01]
n-out-of-N sampling and uniform probabilistic sampling are content–
independent selection schemes. For non-uniform probabilistic sampling
the sampling probability can be based on packet content.
4. Filtering
Filtering is the selection of packets based only the packet content,
the treatment of the packet at the observation point, and
Zseby, Molina, Raspall, Duffield Expires December 2003 [Page 7]
Internet Draft Techniques for IP Packet Selection June 2003
deterministic functions of these occurring in the selection state.
The packet is selected if these quantities fall into a specified
range.
Packet filtering can be done for a wide variety of purposes e.g. for
security, SLA enforcing, accounting. Depending on the type of
filtering, it can be applied in different parts of the metering
process. The role of filtering, as the word itself suggest, is to
separate all the packets having a certain property from those not
having it. A distinguishing characteristic from sampling is that the
property never depends on the packet position in time or in the
space, or on a random process.
We identify and describe in the following three filtering techniques.
The first two (Mask/Match filtering and Hashing filtering) are
stateless, and therefore can make their decision based on the
analysis of portion of the packet only. The other (router state
filtering) requires to access state information after the analysis of
part of the packet and is therefore more complex: its usage makes
sense only in particular circumstances, as described below.
4.1 Mask/Match filtering
This type of filtering selects a packet operating as follows: first a
combination of packet’s bit positions is selected taking the logical
AND of portion of the packet’s bits and a mask, then it’s checked if
the result, seen as an hexadecimal number, falls within a selection
range. In the simplest case the selection range can be a single value
(e.g. filter selects only the packets with a certain specific header
field, e.g. a specific source IP address) but more in general the
selection range can be a sequence of non-overlapping intervals if
high level interfaces are used to specify mask and matches. The
selected bits of the packet aren’t necessarily only those of the IP
header, but in case both bits of the IP header and the payload are
selected, the masks and the selection intervals MUST be specified
separately for the header and the payload.
An implementation is not constrained to apply exactly all the steps
or in this sequence, provided that the result is equivalent to a
logical function doing it.
Examples of filters of this class are filters that select packets
based on the matching of some of the IP header fields with a
(possibly masked) specific value, filters that select packets having
some IP header fields values falling within a range, filters that do
the same as above on some of the transport header fields (that are
thus considered as part of the payload), or a combination of all of
the above mentioned possibilities.
4.2 Hashing filtering
A hash function h maps the packet content c, or some portion of it,
onto a range R. The packet is selected if h(c) is an element of S,
which is a subset of R called the “selection range”. Thus hash-based
sampling is indeed a particular case of filtering: the object is
selected if c is in inv(h(S)). But for desirable hash functions the
inverse image inv(h(S)) will be extremely complex, and hence h would
not be expressible as, say, a match/mask filter or a simple
combination of these.
Hash-based sampling has mainly two types of usage: it offers a way to
emulate random sampling by using packet content to generate
pseudorandom variates and a way to consistently select subsets of
Zseby, Molina, Raspall, Duffield Expires December 2003 [Page 8]
Internet Draft Techniques for IP Packet Selection June 2003
packets that share a common property. In the following subsections we
give more details about them.
However, both usages require that the hash functions has two
statistical properties. First, the hash function h must have good
mixing properties, in the sense that small changes in the input (e.g.
the flipping of a single bit) cause large changes in the output (many
bits change). Then any local clump of values of c is spread widely
over R by h, and so the distribution of h(c) is fairly uniform even
if the distribution of c is not. Then the sampling rate is #S/#R,
which can be tuned by choice of S. If S and R are sets contiguous
integers, h(c), suitably shifted and normalized, can be interpreted
as a pseudorandom variate. The second desirable property depends more
closely on the statistics of the content c. In applications, the
content c comprises a number of distinct fields, c1 ... cm, e.g.
source and destination IP Address, IP identification, and TCP/UDP
port numbers (if present) for a packet. With a hash function
satisfying the first properties above, selection decisions will
appear uncorrelated with the contents of any individual field, if the
complementary fields are (i) sufficiently variable themselves, and
(ii) sufficiently uncorrelated with cj.
4.2.1 Random sampling emulation
Although pseudorandom number generators with well understood
properties have been developed, they may not be the method of choice
in setting where computational resources are scarce. A convenient
alternative is to use hash functions of packet content as a source of
randomness. The hash (suitably renormalized) is a pseudorandom
variate in the interval [0,1]. Other schemes may use packet fields in
iterators for pseudorandom numbers.
The point here, is that the statistical properties of the idealized
packet selection law (such as independence of sampling decisions for
different packets, or independence on packet content) may not be
exactly shared by an implementation, but only approximately so.
Although the selection decisions for non-uniform independent random
sampling (see Section 3.1.2.2.2 above) also depend on the packet
content, this form of sampling is distinguished from the use of
packet content to generate variates. In the former case, the content
only determines the selection probabilities: selection could then
proceed e.g by use of a variates obtained by an independent
pseudorandom number generator. In the latter case, the content
determines the pseudorandom variates rather than the probabilities.
4.2.2 Consistent packet selection and its applications
Consistent packet selection is also often referred as trajectory
sampling.
In trajectory sampling, all routers in a network hash portion of the
packet content using identical hash function and selection range. The
domain of the hash is restricted to those parts of the packet that
are invariant from hop to hop. Fields such as Time-to-Live, which is
decremented per hop, and header CRC, which is recalculated per hop,
are thus excluded from the hash domain. Thus, a given packet is
selected at all either all points on its path through the network, or
at none. The domain of the hash function needs to be wider than just
a flow key, if packets are to be selected quasi-randomly within flows
(and e.g. include portions of the payload), see [DuGr00]. If a report
on each selected packet is exported to a collector, the latter can
reconstruct trajectories of the selected packets, provided it can
Zseby, Molina, Raspall, Duffield Expires December 2003 [Page 9]
Internet Draft Techniques for IP Packet Selection June 2003
match different reports on the same packet, and distinguish these
from reports on different packets. For this purpose, reports may also
contain a second distinct hash (identification hash) of the selected
packets. The identification hash can be considered as part of the
selection state.
Applications of trajectory sampling include (i) estimation of the
network path matrix, i.e., the traffic intensities according to
network path, broken down by flow; (ii) detection of routing loops,
as indicated by self-intersecting trajectories; (iii) passive
performance measurement: prematurely terminating trajectories
indicate packet loss, and packet latencies can be determined if
reports include (synchronized) timestamps; (iv) network attack
tracing, of the actual paths taken by attack packets with spoofed
source addresses.
4.2.3 Guarding Against Pitfalls and Vulnerabilities
A concern is whether some large set of related packets could be
sampled at a rate that significantly differs from the expected
sampling rate, either (i) through unanticipated behavior in the hash
function, or (ii) because the packets had been deliberately crafted
to have this property.
The first point underlines the importance of using a hash function
with good mixing properties. Examples of such are CRC32 and hash
functions based on modular arithmetic, see 6.4 in [Knuth98]. The
statistical properties of candidate hash functions need to be
evaluated, preferably on packet before adoption for hash-based
sampling
Hash sampling could be overloaded (or evaded) by an attacker if the
hash function and the selection rate are both known. A service
provider could build a first defense keeping S private. Then, an
attacker could not determine whether a crafted packet is select, but
he would still know that a crafted a set of packets all with the same
hash is either all selected or all not selected. Moreover, when
applications (like multi domain trajectory sampling, or One way delay
estimation across multiple domains) may require multiple
administrative entities to agree on a common hash function and
selection range, mutual trust between the entities cannot be avoided
and just keeping S secret may not be feasible. A stronger defense is
to employ a parametrizable hash function and keep the parameter
private: in this case, the set of hash values of the packets could
not be determined. Examples of parameters are the initial vector in
CRC32, and moduli in hashes based on modular arithmetic.
4.3 Router State filtering
This class of filters select a packet on the basis of the following
conditions), possibly combined with the AND, OR or NOT operators.
- Ingress interface at which packet arrives equals a specified
value
- Egress interface to which packet is routed to equals a
specified value
- Packet violated acl on the router
- Failed rpf
- Failed rsvp
- No route found for the packet
Zseby, Molina, Raspall, Duffield Expires December 2003 [Page 10]
Internet Draft Techniques for IP Packet Selection June 2003
- Origin AS equals a specified value or lies within a given
range
- Destination AS equals a specified value or lies within a given
range
Router architectural considerations may preclude some information
concerning the packet treatment, e.g routing state, being available
at line rate for selection of packets. However, if selection not
based on routing state has reduced down from line rate, subselection
based on routing state may be feasible.
5. Input Parameters and Information Models
The decision whether to select a packet or not is based on a function
which is performed when the packet arrives at the selection process.
Packet selection schemes differ in the input parameters for the
selection process and the functions they require to do the packet
selection. The following table gives an overview.
Scheme | input parameters | functions
---------------+------------------------+---------------------
systematic | packet position | packet counter
count-based | sampling pattern |
---------------+------------------------+---------------------
systematic | arrival time | clock or timer
time-based | sampling pattern |
---------------+------------------------+---------------------
random | packet position | packet counter,
n-out-of-N | sampling pattern | random numbers
| (random number list) |
---------------+------------------------+---------------------
uniform | sampling | random function
probabilistic | probability |
---------------+------------------------+---------------------
non-uniform |e.g. packet position | selection function,
probabilistic |or packet content(parts)| probability calc.
---------------+------------------------+---------------------
non-uniform |e.g. flow state | selection function,
flow-state |or packet content(parts)| probability calc.
---------------+------------------------+---------------------
mask/match | packet content(parts) | filter function
---------------+------------------------+---------------------
hash-based | packet content(parts) | hash function
---------------+------------------------+---------------------
router state | router state | router state
| | discovery
---------------+------------------------+---------------------
5.1 Information Model for Sampling Techniques
In this section we define the information models for most common
sampling techniques. Here the selection function is pre-defined and
given by the selector ID.
Sampler Description:
SELECTOR_ID
SELECTOR_TYPE
SELECTOR_PARAMETERS
OPERATING_TIME
Zseby, Molina, Raspall, Duffield Expires December 2003 [Page 11]
Internet Draft Techniques for IP Packet Selection June 2003
ASSOCIATIONS
Where:
SELECTOR_ID:
Unique ID for the packet sampler. The ID can be calculated under
consideration of the ASSOCIATIONS and a local ID.
SELECTOR_TYPE
Description: For sampling processes the SELECTOR TYPE defines what
sampling algorithm is used.
Values: n out of N | Systematic Time Based (equally spaced)|
Systematic Position Based (equally spaced)| Probabilistic | flow
state
[Remark: further sampling schemes will be added here]
SELECTOR_PARAMETERS
Description: For sampling processes the SELECTOR PARAMETERS define
the input parameters for the process. Interval length in systematic
sampling means, that all packets that arrive in this interval are
selected. The spacing parameter defines the spacing in time or number
of packets between the end of one sampling interval and the start of
the next succeeding interval.
Case n out of N:
- List of n sampling positions in an array of N positions
Case Systematic Time Based:
- Interval length (in usec), Spacing (in usec)
Case Systematic Count Based:
- Interval length(in packets), Spacing (in packets)
Case uniform Probabilistic(with equal probability per packet):
- Sampling probability p
Case non-uniform probabilistic:
- Calculation function for sampling probability p
Case flow state:
- Policy for selecting flows (e.g. give priority to large flows)
OPERATING_TIME
Description: The OPERATING_TIME parameter describes the start/stop
time of sampling process. List elements must not overlap. The start
time of the first element can be omitted, meaning “from now”. The end
time of the last element can be omitted, meaning “until sampler is
removed”.
Values: List of (Start time, End time)
ASSOCIATIONS
Description: The ASSOCIATIONS field describes the observation point
and the IPFIX processes to which the packet selector is associated.
The STREAM ID denotes the origin of the data stream that is input to
the selection function. It can be the observation point directly or
the ID of another selector. With this it is possible to define
Zseby, Molina, Raspall, Duffield Expires December 2003 [Page 12]
Internet Draft Techniques for IP Packet Selection June 2003
combined schemes. If the STREAM ID contains IDs from other selectors,
one can derive the original observation point from the selector
definitions of these specified selectors.
Values: <STREAM ID, Metering process ID, Exporting process ID>
With STREAM ID: Observation point ID | List of SELECTOR_IDs
5.2 Information Model for Filtering Techniques
In this section we define the information models for most common
filtering techniques. The information model structure closely
parallels the one presented for the sampling techniques.
Filter Description:
SELECTOR_ID
SELECTOR_TYPE
SELECTOR_PARAMETERS
OPERATING_TIME
ASSOCIATIONS
Where:
SELECTOR_ID:
Unique ID for the packet filter. The ID can be calculated under
consideration of the ASSOCIATIONS and a local ID.
SELECTOR_TYPE
Description: For filtering processes the SELECTOR TYPE defines what
filtering type is used.
Values: Matching | Hashing | Router_state
SELECTOR_PARAMETERS
Description: For filtering processes the SELECTOR PARAMETERS define
formally the common property of the packet being filtered. For the
filters of type Matching and Hashing the definitions have a lot of
points in common.
Values:
Case Matching
- <Header type = ip v4 >
- <bit specification, header part>
- <Selection interval specification, header part>
- <Header type = ipv6>
- <bit specification, header part>
- <Selection interval specification, header part>
- <payload byte number N>
- <bit specification, payload part>
- <Selection interval specification, payload part>
Notes to Case Matching:
- The filter can be defined for the header part only, for the
payload part only or for both. In the latter case the matching
must be an AND of the two.
- The bit specification, for the header part, can be specified
for ipv4 or ipv6 only, or both
- In case of ipv4, the bit specification is a sequence of 20
Hexadecimal numbers [00,FF] specifying a 20 bytes bitmask to be
applied to the header
Zseby, Molina, Raspall, Duffield Expires December 2003 [Page 13]
Internet Draft Techniques for IP Packet Selection June 2003
- In case of ipv6, it is a sequence of 40 Hexadecimal numbers
[00,FF] specifying a 40 bytes bitmask to be applied to the
header
- The bit specification, for the payload part, is a sequence of
Hexadecimal numbers [00,FF] specifying the bitmask to be
applied to the first N bytes of the payload, as specified by
the previous field. In case the Hexadecimal number sequence is
longer then N, only the first N numbers are considered.
- In case the payload is shorter than N, the packet will not
match the filter Other options, like padding with zeros, may be
considered in the future.
- The selection interval specification is a list of non
overlapping intervals [intv_begin, intv_end] where intv_begin,
intv_end are bit strings of length 20*8 (ipv4 case), 40*8 (ipv6
case), N*8 (payload case).
- A filter cannot be defined on the options field of the ipv4
header, neither on stacked headers of ipv6.
- This specification doesn’t preclude the future definition of a
high level syntax for defining in a concise way bit selection
and matching rules in a more human readable form (e.g. “TCP
port in [2000,3000]”). The requirement is that such a syntax
can be univoquely compiled into the one defined above
Case Hashing:
- <Header type = ipv4>
- <Input bit specification, header part>
- <Header type = ipv6>
- <Input bit specification, header part>
- <payload byte number N>
- <Input bit specification, payload part>
- <additional hash input bits (seed)>
- Hashing function specification (includes length of hash
function output M)
- Selection interval specification, as a list of non overlapping
intervals [start value, end value] where value is in [0,2^M-1]
Notes to Case Hashing:
- On Input bit specifications fields, the same notes on bit
specifications of the Matching case reported above apply
Case Router State:
- Ingress interface at which packet arrives equals a specified
value
- Egress interface to which packet is routed to equals a
specified value
- Packet violated acl on the router
- Failed rpf
- Failed rsvp
- No route found for the packet
- Origin AS equals a specified value or lies within a given
range
- Destination AS equals a specified value or lies within a given
range
Note to Case Router State:
- All Router state entries can be linked by AND, OR, NOT
operators
Zseby, Molina, Raspall, Duffield Expires December 2003 [Page 14]
Internet Draft Techniques for IP Packet Selection June 2003
OPERATING_TIME
Description: The OPERATING_TIME parameter describes the start/stop
time of filtering process. List elements must not overlap. The start
time of the first element can be omitted, meaning “from now”. The end
time of the last element can be omitted, meaning “until sampler is
removed”.
Values: List of (Start time, End time)
ASSOCIATIONS
Description: The ASSOCIATIONS field describes the observation point
and the IPFIX processes to which the packet selector is associated.
The STREAM ID denotes the origin of the data stream that is input to
the selection function. It can be the observation point directly or
the ID of another selector. With this it is possible to define
combined schemes. If the STREAM ID contains IDs from other selectors,
one can derive the original observation point from the selector
definitions of these specified selectors.
Values: STREAM ID, Metering process ID, Exporting process ID>
With STREAM ID: Observation point ID | List of SELECTOR_IDs
6. Composite Techniques
Composite schemes are realized by using the STREAM ID in the
information models. The STREAM ID denotes from which selectors the
input stream originates. If multiple stream IDs are given, this means
that the selector operates on the packet stream simply resulting from
the time superposition of the output of all the listed filters and
samplers. Note that a sampler/filter could be intermittently active,
as defined in the OPERATING TIME field.
Some examples of composite schemes are reported below.
6.1 Cascaded filtering->sampling or sampling->filtering
If a filter precedes a sampling process the role of filtering is to
create a set of “parent populations” from a single stream that can
then be fed independently to different sampling functions, with
different parameters tuned for the population itself (e.g. if streams
of different intensity result from filtering, it may be good to have
different sampling rates). If filtering follows a sampling process,
the same sampling rate and type is applied to the whole stream,
independently of the relative size of the streams resulting from the
filtering function. Moreover, also packets not destined to be
selected will “load” the sampling function. So, in principle,
filtering before sampling allows a more accurate tuning of the
sampling procedure, but if filters are too complex to work at full
line rate (e.g. because they have to access router state
information), sampling before filtering may be a need.
6.2 Stratified Sampling
Stratified sampling is one example for using a composite technique.
The basic idea behind stratified sampling is to increase the
estimation accuracy by using a-priori information. The a-priori
information is used to perform an intelligent grouping of the
elements of the parent population. With this a higher estimation
accuracy can be achieved with the same sample size.
Stratified sampling divides the sampling process into multiple steps.
First, the elements of the parent population are grouped into subsets
Zseby, Molina, Raspall, Duffield Expires December 2003 [Page 15]
Internet Draft Techniques for IP Packet Selection June 2003
in accordance to a given characteristic. This grouping can be done in
multiple steps. Then samples are taken from each subset.
The stronger the correlation between the characteristic used to
divide the parent population and the characteristic of interest (for
which an estimate is sought after), the easier is the consecutive
sampling process and the higher is the stratification gain. For
instance if the dividing characteristic were equal to the
investigated characteristic, each element of the sub-group would be a
perfect representative of that characteristic. In this case it would
be sufficient to take one arbitrary element out of each subgroup to
get the actual distribution of the characteristic in the parent
population. Therefore stratified sampling can reduce the costs for
the sampling process (i.e. the number of samples needed to achieve a
given level of confidence).
For stratified sampling one has to specify classification rules for
grouping the elements into subgroups and the sampling scheme that is
used within the subgroups. The classification rules can be expressed
by multiple filters. For the sampling scheme within the subgroups the
parameters have to be specified as described above.
7. Security Considerations
Security threats can occur if the configuration of sampling
parameters or the communication of sampling parameters to the
application is corrupted. This document only describes sampling
schemes that can be used for packet selection. It neither describes a
mechanism how those parameters are configured nor how these
parameters are communicated to the application. Therefore the
security threats that originate from this kind of communication
cannot be assessed with the information given in this document.
In some cases malicious users or attackers may be interested to hide
packets from the service provider. For instance if packet selectors
are used for accounting or intrusion detection applications, users
may want to prevent that packets are selected. If a deterministic
sampling scheme is used or a selection scheme that takes packet
content into account, the user can shape or send packets in a way
that they are less likely to be selected. This has to be taken into
account when choosing an appropriate packet selection technique.
8. References
[AmCa89] Paul D. Amer, Lillian N. Cassel: Management of Sampled
Real-Time Network Measurements, 14th Conference on Local
Computer Networks, October 1989, Minneapolis, pages 62-
68, IEEE, 1989
[ClPB93] K.C. Claffy, George C Polyzos, Hans-Werner Braun:
Application of Sampling Methodologies to Network Traffic
Characterization, Proceedings of ACM SIGCOMM'93, San
Francisco, CA, USA, September 13 - 17, 1993
[CoGi98] I. Cozzani, S. Giordano: Traffic Sampling Methods for
end-to-end QoS Evaluation in Large Heterogeneous
Networks. Computer Networks and ISDN Systems, 30 (16-
18), September 1998.
Zseby, Molina, Raspall, Duffield Expires December 2003 [Page 16]
Internet Draft Techniques for IP Packet Selection June 2003
[DuGG03] Nick Duffield, Albert Greenberg, Matthias Grossglauser,
Jennifer Rexford: A Framework for Passive Packet
Measurement, Internet Draft draft-ietf-psamp-framework-
01, work in progress, June 2003
[DuGr00] Nick Duffield, Matthias Grossglauser: Trajectory
Sampling for Direct Traffic Observation, Proceedings of
ACM SIGCOMM 2000, Stockholm, Sweden, August 28 -
September 1, 2000.
[EsVa01] C. Estan and G. Varghese, “New Directions in Traffic
Measurement and Accounting”, ACM SIGCOMM Internet
Measurement Workshop 2001, San Francisco (CA) Nov. 2001
[HT52] D.G. Horvitz and D.J. Thompson, A Generalization of
Sampling without replacement from a Finite Universe.
J. Amer. Statist. Assoc. Vol. 47, pp. 663-685, 1952.
[JePP92] Jonathan Jedwab, Peter Phaal, Bob Pinna: Traffic
Estimation for the Largest Sources on a Network, Using
Packet Sampling with Limited Storage, HP technical
report, Managemenr, Mathematics and Security Department,
HP Laboratories, Bristol, March 1992,
http://www.hpl.hp.com/techreports/92/HPL-92-35.html
[Knuth98] Donald E. Knuth: The Art of Computer Programming, Volume
3: Searching and Sorting, Addison Wesley, 1998
[QuZC03] J. Quittek, T. Zseby, B. Claise, S. Zander: Requirements
for IP Flow Information Export, Internet Draft <draft-
ietf-ipfix-reqs-10.txt>, work in progress, June 2002
[Zseb02] Tanja Zseby: Deployment of Sampling Methods for SLA
Validation with Non-Intrusive Measurements, Proceedings
of Passive and Active Measurement Workshop (PAM 2002),
Fort Collins, CO, USA, March 25-26, 2002
9. Author's Addresses
Tanja Zseby
Fraunhofer Institute for Open Communication Systems
Kaiserin-Augusta-Allee 31
10589 Berlin
Germany
Phone: +49-30-34 63 7153
Fax: +49-30-34 53 8153
Email: zseby@fokus.fhg.de
Maurizio Molina
NEC Europe Ltd., Network Laboratories
Adenauerplatz 6
69115 Heidelberg
Germany
Phone: +49 6221 90511-18
Email: molina@ccrle.nec.de
Fredric Raspall
NEC Europe Ltd., Network Laboratories
Adenauerplatz 6
69115 Heidelberg
Zseby, Molina, Raspall, Duffield Expires December 2003 [Page 17]
Internet Draft Techniques for IP Packet Selection June 2003
Germany
Phone: +49 6221 90511-31
EMail: raspall@ccrle.nec.de
Nick Duffield
AT&T Labs - Research
Room B-139
180 Park Ave
Florham Park NJ 07932, USA
Phone: +1 973-360-8726
Email: duffield@research.att.com
10.
Intellectual Property Statement
AT&T Corporation may own intellectual property applicable to this
contribution. The IETF has been notified of AT&T's licensing intent
for the specification contained in this document. See
http://www.ietf.org/ietf/IPR/ATT-GENERAL.txt for AT&T's IPR
statement.
11.
Full Copyright Statement
Copyright (C) The Internet Society (2002). All Rights Reserved. This
document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.
The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Zseby, Molina, Raspall, Duffield Expires December 2003 [Page 18]