[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

new sampling draft



[ post by non-subscriber. with the massive amount of spam, it is easy to miss
and therefore delete posts by non-subscribers. if you wish to regularly
post from an address that is not subscribed to this mailing list, send a
message to <listname>-owner@ops.ietf.org and ask to have the alternate
address added to the list of addresses from which submissions are
automatically accepted. ]

Hi all,

I just submitted a new version of the psamp sampling draft (attached). Main changes are:
- some text on hash-based sampling from Nick included
- introduction of complexity levels

There are still some inconsistencies in terminology and categorization. Nick and I plan to discuss this at the IETF.

Regards
Tanja

--
Dipl.-Ing. Tanja Zseby
Fraunhofer Institute FOKUS Email: zseby@fokus.fraunhofer.de
Kaiserin-Augusta-Allee 31 Phone: +49-30-3463-7153
D-10589 Berlin, Germany Fax: +49-30-3463-8153
-------------------------------------------------------------------------------------- "Living on earth is expensive but it includes a free trip around the sun." (Anonymous)
--------------------------------------------------------------------------------------

                
                                                                                       
               Internet Draft                                                          
               Document: <draft-ietf-psamp-sample-tech-01.txt>                T. Zseby 
               Expires: September 2003                                Fraunhofer FOKUS 
                                                                             M. Molina 
                                                                       NEC Europe Ltd. 
                                                                            F. Raspall 
                                                                       NEC Europe Ltd. 
                                                                                       
                                                                            March 2003 
                
                
                  Sampling and Filtering Techniques for IP Packet Selection 
                
                
               Status of this Memo 
                
                  This document is an Internet-Draft and is in full conformance with 
                  all provisions of Section 10 of RFC2026.  
                   
                  Internet-Drafts are working documents of the Internet Engineering 
                  Task Force (IETF), its areas, and its working groups. Note that 
                  other groups may also distribute working documents as Internet-
                  Drafts. Internet-Drafts are draft documents valid for a maximum of 
                  six months and may be updated, replaced, or obsoleted by other 
                  documents at any time. It is inappropriate to use Internet-Drafts as 
                  reference material or to cite them other than as "work in progress."  
                   
                  The list of current Internet-Drafts can be accessed at 
                  http://www.ietf.org/ietf/1id-abstracts.txt  
                   
                  The list of Internet-Draft Shadow Directories can be accessed at 
                  http://www.ietf.org/shadow.html. 
                   
               Abstract 
                   
                  This document describes sampling and filtering techniques for IP 
                  packet selection. It introduces information models for packet 
                  sampling, for packet filtering and for combinations of methods. The 
                  information models describe what information has to be specified in 
                  order to describe the method. This information is used for 
                  configuring the selection technique in measurement processes and for 
                  reporting the technique in use to the measurement data collection 
                  process.   
                  The document first suggests some terminology, then it describes in 
                  detail packet sampling and packet filtering techniques and their 
                  parameters. It also describes how these two techniques can be 
                  combined to build more elaborate packet selectors. Finally, it 
                  introduces information models for the most common sampling and 
                  filtering techniques. 
                   
                   
                   
                
               Zseby, Molina, Raspall        Expires April 2003              [Page 1] 

               Internet Draft  Techniques for IP Packet Selection    March 2003 
                
                
               Table of Contents 
                
                  1.   Introduction.................................................2 
                  2.   Terminology..................................................3 
                  3.   Scope and Deployment of Packet Selection Techniques..........4 
                  3.1  Sampling.....................................................5 
                  3.2  Filtering....................................................6 
                  3.3  Hash-based Sampling..........................................6 
                  3.3.1 Statistical Properties......................................6 
                  3.3.2 Example: Trajectory Sampling................................7 
                  3.3.3 Guarding Against Pitfalls and Vulnerabilities...............7 
                  4.   Sampling Methods.............................................8 
                  4.1  Sampling Algorithm...........................................8 
                  4.1.1 Systematic Sampling.........................................8 
                  4.1.2 Random Sampling.............................................8 
                  5.   Sampling Parameters..........................................9 
                  5.1  Parameters for systematic sampling...........................9 
                  5.2  Parameters for random sampling...............................9 
                  6.   Complexity Levels...........................................10 
                  7.   Information Model Sampling Techniques.......................11 
                  8.   Filtering...................................................12 
                  8.1  Filtering operating directly on some of the packet’s bits...13 
                  8.2  Filtering considering router reaction or router state.......13 
                  9.   Information Model for Filtering Techniques..................13 
                  10.  Composite Techniques........................................16 
                  10.1 Cascaded filtering->sampling or sampling->filtering.........16 
                  10.2 Stratified Sampling.........................................16 
                  11.  Security Considerations.....................................17 
                  12.  Acknowledgements............................................17 
                  13.  References..................................................18 
                  14.  Author's Addresses..........................................18 
                  15.  Full Copyright Statement....................................19 
                
               1. Introduction 
                   
                  Increasing data rates and growing measurement demands increase the 
                  requirements for data collection resources. For measurement 
                  scenarios in backbone networks it is often required to measure whole 
                  traffic aggregates instead of single flows. Furthermore, some 
                  measurement methods require the capturing of packet headers or even 
                  parts of the payload. All this can lead to an overwhelming amount of 
                  measurement data, resulting in high demands regarding resources for 
                  metering, storage, transport and post processing.  
                
                  In some cases specialized hardware helps to fulfill these demands 
                  but on the other hand increases the costs for providing the 
                  measurement. Since measurements are mainly a supporting 
                  functionality for the service provisioning, measurement costs 
                  usually should be limited to a small fraction of the costs of the 
                  network service provisioning itself. Therefore a reduction of the 
                  measurement result data is crucial to prevent the depletion of the 
                  available (i.e. the affordable) resources.  Such a reduction can be 
                  achieved by a reasonable deployment of packet selection techniques, 
                 
               Zseby, Molina, Raspall       Expires September 2003          [Page 2] 

               Internet Draft  Techniques for IP Packet Selection    March 2003 
                
                
                  that sample a subset of the packets while still allowing an 
                  appropriate accuracy, or filter out all packets that are not of 
                  interest for the measurement at all. Packet selection helps to 
                  prevent an exhaustion of resources and to limit the measurement 
                  costs. Examples for applications that benefit from packet selection 
                  are given in [DuGG02]. 
                
               2. Terminology 
                
                  IP Packet Selection Process 
                       An IP packet selection process takes IP packets or parts of IP 
                       packets (e.g. header) as input and extracts a subset of these 
                       packets by applying a selection function.  
                        
                  Filtering 
                       Filtering selects a subset of packets by applying deterministic 
                       functions on parts of the packet content like header fields or 
                       parts of the payload. A filtering process needs to process the 
                       packet (look at packet header and/or payload) in order to make 
                       the selection decision. 
                
                  Sampling 
                       Sampling selects a subset of packets by applying deterministic 
                       or random functions on the (temporal or spatial) packet 
                       position or by performing (pseudo) random calculations per 
                       packet. This can be for example selecting every nth packet 
                       (deterministic function on packet position) or selecting a 
                       packet that arrives at the metering process in accordance to 
                       the output of a random function (like flipping a coin per 
                       packet). Sampling does not work on packet content. That means, 
                       in contrast to filtering, a sampling process does not need to 
                       process the packet in order to make the selection decision. 
                   
                  Hash function 
                       The computation of an M bit string starting from an N bit 
                       string. In this context, the N starting bits are some of the 
                       bits of a packet header and/or payload. 
                        
                  Hash selection range 
                       A subset of the M bit computed with a hash function for which 
                       an Indicator Function has a value of 1. 
                
                  Stream 
                       A packet stream is the sequence of packets used as input for a 
                       packet selection process. If multiple packet selectors are 
                       applied subsequently, the output stream of one selector forms 
                       the input stream for the succeeding selector. If the first 
                       selector was a sampling process, the packets in the stream 
                       usually do not have common properties by which they can be 
                       distinguished from packets that not have been selected. 
                       Therefore we define here the term stream instead of flow, which 
                       is defined as set of packets with common properties [QuZC02]. 

                 
               Zseby, Molina, Raspall       Expires September 2003          [Page 3] 

               Internet Draft  Techniques for IP Packet Selection    March 2003 
                
                
                       If the term flow is used throughout the text, the flow 
                       definition in [QuZC02] applies. 
                   
                  Metering process  
                       see definition in [QuZC02] 
                
                  Sample size 
                       The sample size denotes the number of packets in the sample. 
                   
                  Selection function 
                       Function that determines whether an IP packet is selected or 
                       not. 
                        
                  Sampling probability  
                       The probability with which one element is selected as part of 
                       the sample.  
                        
                  Sampling ratio 
                       The ratio between the sample size and the number of packets 
                       composing the input stream of a packet sampling process. 
                
               3. Scope and Deployment of Packet Selection Techniques 
                   
                  The selection technique used to select a subset of packets out of 
                  all those crossing an observation point depends on the purpose 
                  (application) for which measurement is performed. If the main 
                  purpose of an application is to infer some characteristic of the 
                  whole set of crossing packets without processing them all (thus 
                  reducing the computation load) then we call the used selection 
                  technique “sampling”. In principle, with sampling the content of the 
                  packet is not relevant for the packet selection: what matters is 
                  only that the selected sample has a distribution of the 
                  characteristic to infer similar to the one of the parent population, 
                  so that it can be estimated reliably. The sampling decision may be 
                  based on the temporal or spatial position of the packet in the 
                  packet stream, or may depend on a (pseudo) random number extraction 
                  or calculation. 
                   
                  On the contrary, if the application needs to consider all the 
                  packets having some common property, then we call the selection 
                  technique “filtering”. The property can be directly derived by some 
                  computation on the packet content, or depend on the treatment given 
                  by the router to the packet. We conclude that sampling does not 
                  consider packet content, and can depend on packet position or on 
                  (pseudo) random decisions, while filtering depends on packet 
                  content, but never depends on packet position or on (pseudo) random 
                  decisions. 
                   
                  Note that a common technique to select packets is to compute a hash 
                  function on some bits of the packet header and/or content and to 
                  select it if the result falls in a certain selection range. Since 
                  hashing is a deterministic operation, it is a powerful mean to 
                  ensure that the same packets are selected at multiple measurement 
                 
               Zseby, Molina, Raspall       Expires September 2003          [Page 4] 

               Internet Draft  Techniques for IP Packet Selection    March 2003 
                
                
                  points. Depending on the chosen input bits, on the hash function and 
                  on the selection range, this technique could also be used to emulate 
                  the random selection of packets with a given probability p. Hashing 
                  could be viewed as a particular type of filtering, but due to its 
                  peculiarities we prefer to describe it as a separate  packet 
                  selection technique. 
                   
                  The introduced classification is mainly useful for the definition of 
                  an information model describing “primitive” selection techniques. 
                  More complex selection techniques may then be described through the 
                  composition of cascaded sampling and filtering operations. 
                  For example, a packet selection that weights the selection 
                  probability on the basis of the packet length can be described as a 
                  set of filter/sampling cascades. However, this descriptive approach 
                  is not intended to be rigid: if a common and consolidated selection 
                  practice turns out to be too complex to be described as a 
                  composition of the mentioned building blocks, an ad hoc description 
                  can be specified instead. 
                   
                  We consider packet selectors as part of an IPFIX metering process 
                  which also can use the IPFIX exporting process. This is expressed as 
                  association to one or more IPFIX processes. 
                   
               3.1 Sampling 
                
                  The deployment of sampling techniques aims at the provisioning of 
                  information about a specific characteristic of the parent population 
                  at a lower cost than a full census would demand. In order to plan a 
                  suitable sampling strategy it is therefore crucial to determine the 
                  needed type of information and the desired degree of accuracy in 
                  advance. 
                   
                  First of all it is important to know the type of metric that should 
                  be estimated. The metric of interest can range from simple packet 
                  counts [JePP92] up to the estimation of whole distributions of flow 
                  characteristics (e.g. packet sizes)[ClPB93].  
                
                  Secondly, the required accuracy of the information and with this, 
                  the confidence that is aimed at, should be known in advance. For 
                  instance for usage-based accounting the required confidence for the 
                  estimation of packet counters can depend on the monetary value that 
                  corresponds to the transfer of one packet. That means that a higher 
                  confidence could be required for expensive packet flows (e.g. 
                  premium IP service) than for cheaper flows (e.g. best effort). The 
                  accuracy requirements for validating a previously agreed quality can 
                  also vary extremely with the customer demands. These requirements 
                  are usually determined by the service level agreement (SLA). 
                
                  Sampling is considered as part of the metering process. A metering 
                  process consists of multiple functions (capturing, time stamping, 
                  etc.). Sampling can be applied at different functions of the 
                  metering process. In the following we consider a measured IP packet 

                 
               Zseby, Molina, Raspall       Expires September 2003          [Page 5] 

               Internet Draft  Techniques for IP Packet Selection    March 2003 
                
                
                  with its observation point and timestamp as basis elements of the 
                  parent population. 
                   
                  The sampling method and the parameters in use must be clearly 
                  communicated to all applications that use the measurement data. Only 
                  with this knowledge a correct interpretation of the measurement 
                  results can be ensured. 
                   
               3.2 Filtering 
                
                  Packet filtering can be done for a wide variety of purposes e.g. for 
                  security, SLA enforcing, accounting. Depending on the type of 
                  filtering, it can be applied in different parts of the metering 
                  process. The role of filtering, as the word itself suggest, is to 
                  separate all the packets having a certain property from those not 
                  having it. A distinguishing characteristic from sampling is that the 
                  property never depends on the packet position in time or in the 
                  space, or on a random process. 
                   
                   
               3.3 Hash-based Sampling 
                   
                  Hash-based sampling offers both a way to emulate random sampling by 
                  using packet content to generate pseudorandom variates and a way to 
                  consistently select subsets of packets that share a common property. 
                   
                  A hash function h that maps the packet content c, or some portion of 
                  it, onto a range R. The packet is selected if h(c) is element of the 
                  S which is a subset of R called the selection range. Thus hash-based 
                  sampling is indeed a particular case of filtering: the object is 
                  selected if c is in inv(h(S)).  For desirable hash functions the 
                  inverse image inv(h(S)) will be extremely complex, and hence h would 
                  not be expressible as, say, a match/mask filter or a simple 
                  combination of these. 
                   
               3.3.1   Statistical Properties 
                   
                  For good pseudorandom sampling two properties are required. First, 
                  the hash function h must have good mixing properties, in the sense 
                  that small changes in the input (e.g. the flipping of a single bit) 
                  cause large changes in the output (many bits change). Then any local 
                  clump of values of c is spread widely over R by h, and so the 
                  distribution of h(c) is fairly uniform even if the distribution of c 
                  is not. Then the sampling rate is #S/#R, which can be tuned by 
                  choice of S. If S and R are sets contiguous integers, h(c), suitably 
                  shifted and normalized, can be interpreted as a pseudorandom 
                  variate. 
                   
                  The second desirable property depends more closely on the statistics 
                  of the content c. In applications, the content c comprises a number 
                  of distinct fields, c1 ... cm, e.g. source and destination IP 
                  Address, IP identification, and TCP/UDP port numbers (if present) 
                  for a packet. With a hash function satisfying the first properties 
                 
               Zseby, Molina, Raspall       Expires September 2003          [Page 6] 

               Internet Draft  Techniques for IP Packet Selection    March 2003 
                
                
                  above, selection decisions will appear uncorrelated with the 
                  contents of any individual field, if the complementary fields are 
                  (i) sufficiently variable themselves, and (ii) sufficiently 
                  uncorrelated with cj.  
                   
                   
               3.3.2   Example: Trajectory Sampling 
                
                  In trajectory sampling, all routers in a network hash-sample packets 
                  using identical hash function and selection range. The domain of the 
                  hash is restricted to those fields that are invariant from hop to 
                  hop. Fields such as Time-to-Live, which is decremented per hop, and 
                  header CRC, which is recalculated per hop, are thus excluded from 
                  the hash domain. Thus a given packet is selected at all either all 
                  points on its path through the network, or at none. The domain of 
                  the hash function needs to be wider than just a flow key, if packets 
                  are to be selected quasirandomly within flows (and e.g. include 
                  portions of the payload); see [DuGr00]. A report on each selected 
                  packet is exported to a collector. The collector can reconstruct 
                  trajectories of the selected packets provided it can match different 
                  reports on the same packet, and distinguish these from reports on 
                  different packets. For this purpose, reports may also contain a 
                  second distinct hash of the selected packets and/or timing 
                  information.  
                  Applications of trajectory sampling include (i) estimation of the 
                  network path matrix, i.e., the traffic intensities accordng to 
                  network path, broken down by flow; (ii) detection of routing loops, 
                  as indicated by self-intersecting trajectories; (iii) passive 
                  performance measurement: prematurely terminating trajectories 
                  indicate packet loss, and packet latencies can be determined if 
                  reports include (synchronized) timestamps; (iv) network attack 
                  tracing, of the actual paths taken by attack packets with spoofed 
                  source addresses. 
                
               3.3.3   Guarding Against Pitfalls and Vulnerabilities 
                   
                  A concern is whether some large set of related packets could be 
                  sampled at a rate that significantly differs from the expected 
                  sampling rate, either (i) through unanticipated behavior in the hash 
                  function, or (ii) because the packets had been deliberately crafted 
                  to have this property. 
                  The first point underlines the importance of using a hash function 
                  with good mixing properties. Examples of such are CRC32 and hash 
                  functions based on modular arithmetic, see 6.4 in [Knuth98]. The 
                  statistical properties of candidate hash functions need to be 
                  evaluated, preferably on packet before adoption for hash-based 
                  sampling. 
                   
                  Can hash sampling be overloaded (or evaded) if the hash function is 
                  known? Assume an attacker, knowing h and the selection range S can 
                  construct packets that will be sampled (or not sampled). If a 
                  service provider keeps S private, the attacker cannot determine 
                  whether a crafted packet will be selected. However, an attacker that 
                 
               Zseby, Molina, Raspall       Expires September 2003          [Page 7] 

               Internet Draft  Techniques for IP Packet Selection    March 2003 
                
                
                  crafted a set of packets all with the same hash would know that the 
                  packets would be either all selected or all not selected. A stronger 
                  defense is to employ a parametrizable hash function and keep the 
                  parameter private: in this case the set of hash values of the 
                  packets could not be determined. Examples of parameters are the 
                  initial vector in CRC32, and moduli in hashes based on modular 
                  arithmetic. Another defense would be to keep the selection range 
                  private. However, when applications (like multi domain trajectory 
                  sampling, or One way delay estimation across multiple domains) may 
                  require multiple administrative entities to agree on a common hash 
                  function and selection range, mutual trust between the entities 
                  cannot be avoided.  
                
               4. Sampling Methods 
                
                  Sampling Methods can be characterized by the sampling algorithm, the 
                  trigger type used for starting a sampling interval and the length of 
                  the sampling interval. These parameters are described here in 
                  detail. 
                
               4.1 Sampling Algorithm 
                   
                  The sampling algorithm describes the basic process for selection of 
                  samples. In accordance to [AmCa89] and [ClPB93] we define the 
                  following basic sampling processes: 
                   
               4.1.1   Systematic Sampling 
                
                  Systematic sampling describes the process of selecting the starting 
                  points and the duration of the selection intervals according to a 
                  deterministic function. This can be for instance the periodic 
                  selection of every n-th element of a trace but also the selection of 
                  all packets that arrive at pre-defined points in time. Even if the 
                  selection process does not follow a periodic function (e.g. if the 
                  time between the sampling intervals varies over time) we consider 
                  this as systematic sampling as long as the selection is 
                  deterministic. The use of systematic sampling always involves the 
                  risk of biasing the results. If the systematics in the sampling 
                  process resembles systematics in the observed stochastic process 
                  (occurrence of the characteristic of interest in the network), there 
                  is a high probability that the estimation will be biased. 
                  Systematics (e.g. periodic repetition of an event) in the observed 
                  process might not be known of in advance.  
                   
               4.1.2   Random Sampling 
                
                  Random sampling selects the starting points of the sampling 
                  intervals in accordance to a random process. The selection of 
                  elements are independent experiments. With this, unbiased 
                  estimations can be achieved. In contrast to systematic sampling, 
                  random sampling requires the generation of random numbers. One can 
                  differentiate two methods of random sampling: 
                   
                 
               Zseby, Molina, Raspall       Expires September 2003          [Page 8] 

               Internet Draft  Techniques for IP Packet Selection    March 2003 
                
                
                  n-out-of-N sampling 
                
                  In n-out-of-N sampling n elements are selected out of the parent 
                  population that consists of N elements. One example would be to 
                  generate random numbers and select all packets which have a packet 
                  position equal to one of the random numbers. For this kind of 
                  sampling the sample size is fixed.  
                   
                  Probabilistic sampling (see also [DuGG02]) 
                   
                  In probabilistic sampling the decision whether an element is 
                  selected or not is made in accordance to a pre-defined selection 
                  probability. An example would be to flip a coin for each packet and 
                  select all packets for which the coin showed the head. For this kind 
                  of sampling the sample size can vary for different trials. The 
                  selection probability is not necessarily the same for each packet. 
                
               5. Sampling Parameters 
                
                  The decision whether to select a packet or not is based on a 
                  function which is performed when the packet arrives at the sampling 
                  process. The sampling function can consist of a (pseudo) random 
                  calculation or of a function that take the packet position (temporal 
                  or spatial) into account. The parameters of these functions that are 
                  not derived from the packet are called sampling parameters.  
                
               5.1 Parameters for systematic sampling 
                   
                  For systematic sampling the deterministic function which is used for 
                  the packet selection needs to be given. For periodic sampling the 
                  start of the first selection interval, the length of the selection 
                  interval (given in number of packets or as time duration) and the 
                  spacing between selection intervals needs to be specified. 
                   
                                   <-- interval length = 7 --> <-- spacing = 5 _->  
                  Packet position: 1   2   3   4   5   6   7   8   9  10   11  12 13.. 
                   
                  The packets in the sample will be: 1,2,3,4,5,6,7, 13,...  
                   
                  Selecting every x-th packet would be a special case with selection 
                  interval=1 and spacing=x-1. 
                
               5.2 Parameters for random sampling 
                
                  For random n-out-of-N sampling only the sample size n needs to be 
                  specified. This can be done either as an absolute number or as 
                  fraction of the parent population n/N. 
                
                  For probabilistic sampling the selection probability p needs to be 
                  specified. If the selection probability depends on other parameters 
                  (e.g. packet content), the function that expresses this dependency 
                  has to be specified. 
                
                 
               Zseby, Molina, Raspall       Expires September 2003          [Page 9] 

               Internet Draft  Techniques for IP Packet Selection    March 2003 
                
                
               6. Complexity Levels 
                   
                  Packet selection schemes differ in the input parameters for the 
                  selection process and the functions they require to do the packet 
                  selection. The following table gives an overview. 
                
                   
                       Scheme       |   input parameters     |     functions  
                     ---------------+------------------------+--------------------- 
                      simple        |        sampling        |  random function 
                      probabilistic |      probability       |    
                     ---------------+------------------------+--------------------- 
                      systematic    |    packet position     |  packet counter  
                      count-based   |    sampling pattern    |  
                     ---------------+------------------------+--------------------- 
                      systematic    |      arrival time      |  clock or timer 
                      time-based    |     sampling pattern   | 
                     ---------------+------------------------+--------------------- 
                        random      |  packet position       |  packet counter, 
                      n-out-of-N    |  sampling pattern      |  random numbers 
                                    | (random number list)   | 
                     ---------------+------------------------+--------------------- 
                      hash-based    |  packet content(parts) |  hash function 
                     ---------------+------------------------+--------------------- 
                      filtering     |  packet content(parts) |  filter function 
                     ---------------+------------------------+---------------------        
                      content-based |  packet content(parts) |  selection function, 
                      probabilistic |                        |  probability calc. 
                     ---------------+------------------------+--------------------- 
                      router state  |    router state        |   router state 
                                    |                        |   discovery 
                     ---------------+------------------------+---------------------       
                   
                
                  The sampling pattern determines which packets have to be selected in 
                  schemes that are not based on probabilistic sampling. For systematic 
                  count-based sampling this is the length of the sampling interval and 
                  the spacing between sampling intervals expressed in number of 
                  packets. For systematic time-based sampling this is the length of 
                  the sampling interval and the spacing between sampling intervals 
                  expressed as time intervals. For random n-out-of-N sampling this 
                  pattern is based e.g. on a list of random numbers. The parameters 
                  and function needed for combined schemes depend on the combination. 
                   
                  In content-based probabilistic sampling, the sampling probability 
                  depends on the content. This can be used to achieve a biased 
                  selection of packets.  
                
                  In order to allow different types of devices to implement schemes in 
                  accordance to their capabilities and available resources we group  
                  the schemes into the following complexity levels. 
                   

                 
               Zseby, Molina, Raspall       Expires September 2003          [Page 10] 

               Internet Draft  Techniques for IP Packet Selection    March 2003 
                
                
                  Complexity level 1: Devices that comply to PSAMP must at least 
                  support the following simple packets selection functions: 
                       - Simple probabilistic 
                       - systematic count-based 
                   
                  Complexity level 2: Devices that comply to PSAMP should support the 
                  following packets selection functions: 
                       - n-out-of-N 
                       - hash-based 
                       - filtering 
                       - content-based probabilistic  
                       - systematic time-based 
                   
                  Complexity level 3: Devices that comply to PSAMP may support the 
                  following packets selection functions: 
                       - router-state-based 
                       - combined schemes 
                
                   
               7. Information Model Sampling Techniques 
                   
                  In this section we define the information models for most common 
                  sampling techniques. Here the selection function is pre-defined and 
                  given by the selector ID.  
                   
                  Sampler Description: 
                       SELECTOR_ID 
                       SELECTOR_TYPE 
                       SELECTOR_PARAMETERS 
                       OPERATING_TIME 
                       ASSOCIATIONS 
                
                  Where: 
                   
                  SELECTOR_ID: 
                  Unique ID for the packet sampler. The ID can be calculated under 
                  consideration of the ASSOCIATIONS and a local ID. 
                
                   
                  SELECTOR_TYPE 
                  Description: For sampling processes the SELECTOR TYPE defines what 
                  sampling algorithm is used. 
                  Values: n out of N | Systematic Time Based (equally spaced)| 
                  Systematic Position Based (equally spaced)| Probabilistic  
                   
                  [Remark: further sampling schemes will be added here] 
                   
                  SELECTOR_PARAMETERS 
                  Description: For sampling processes the SELECTOR PARAMETERS define 
                  the input parameters for the process. Interval length in systematic 
                  sampling means, that all packets that arrive in this interval are 
                  selected. The spacing parameter defines the spacing in time or 

                 
               Zseby, Molina, Raspall       Expires September 2003          [Page 11] 

               Internet Draft  Techniques for IP Packet Selection    March 2003 
                
                
                  number of packets between the end of one sampling interval and the 
                  start of the next succeeding interval. 
                
                  Case n out of N: 
                     - List of n sampling positions in an array of N positions 
                   
                  Case Systematic Time Based: 
                     - Interval length (in usec), Spacing (in usec) 
                   
                  Case Systematic Position Based: 
                     - Interval length(in packets), Spacing (in packets) 
                   
                  Case Probabilistic(with equal probability per packet): 
                     - Sampling probability p 
                      
                  OPERATING_TIME 
                  Description: The OPERATING_TIME parameter describes the start/stop 
                  time of sampling process. List elements must not overlap. The start 
                  time of the first element can be omitted, meaning “from now”. The 
                  end time of the last element can be omitted, meaning “until sampler 
                  is removed”. 
                  Values: List of (Start time, End time) 
                   
                  ASSOCIATIONS 
                  Description: The ASSOCIATIONS field describes the observation point 
                  and the IPFIX processes to which the packet selector is associated.  
                  The STREAM ID denotes the origin of the data stream that is input to 
                  the selection function. It can be the observation point directly or 
                  the ID of another selector. With this it is possible to define 
                  combined schemes. If the STREAM ID contains IDs from other 
                  selectors, one can derive the original observation point from the 
                  selector definitions of these specified selectors. 
                   
                  Values: <STREAM ID, Metering process ID, Exporting process ID> 
                  With STREAM ID: Observation point ID | List of SELECTOR_IDs 
                   
                   
               8. Filtering  
                   
                  As pointed out in section 3, the main difference between sampling 
                  and filtering is that filtering never depends on the temporal or 
                  spatial position of packets. We introduce two classes of filters. In 
                  the first one, the property can be directly derived by applying a 
                  function on some bits of the packet, while in the second one the 
                  property depends on router state or on the router’s reaction to a 
                  particular packet. 
                  The filters of the first class should be able to operate at full 
                  line rate, while some of the ones of the second may need to be 
                  preceded by a sampling function (e.g. because they involve access to 
                  router state). 
                   
                  [Discussion needed on router-state based filtering] 
                   
                 
               Zseby, Molina, Raspall       Expires September 2003          [Page 12] 

               Internet Draft  Techniques for IP Packet Selection    March 2003 
                
                
               8.1 Filtering operating directly on some of the packet’s bits 
                   
                  These filters functionally operate as follow: 
                   
                     - They select some bits of the packet (not or not only 
                        necessarily those of the header). 
                     - They apply a function on the selected bits. The function can be 
                        as simple as the identity function (i.e. this step is logically 
                        skipped), or as complex as a hash function. 
                     - They feed the result of the function into an indicator 
                        function, that returns a “select/do not select” result. 
                   
                  Examples of filters of this class are filters that select packets on 
                  the basis of the matching of some of the header fields with a 
                  (possibly masked) pre defined value, filters that select the packets 
                  that have some header field value falling within a predefined range, 
                  or filters that select some header fields and/or a portion of the 
                  payload, apply a hash function and then select the packet if the 
                  results is in the hash selection range. Note that in the latter 
                  case, the selected bits may not be the only one forming the input of 
                  the hash function. For example, a “secret” bit sequence could be 
                  appended to the selected bits in order to make it harder for an 
                  attacker to forge packets being either always or never selected. 
                
                  An implementation isn’t constrained to apply exactly all these steps 
                  or in this sequence, provided that the result is equivalent to a 
                  logical function doing it. 
                   
               8.2 Filtering considering router reaction or router state 
                   
                  This class of filters select a packet on the basis of the following 
                  conditions), possibly combined with the AND, OR or NOT operators. 
                
                     - Ingress interface at which packet arrives equals a specified 
                        value 
                     - Egress interface to which packet is routed to equals a 
                        specified value 
                     - Packet violated acl on the router 
                     - Failed rpf  
                     - Failed rsvp 
                     - No route found for the packet 
                     - Origin AS equals a specified value or lies within a given  
                        range 
                     - Destination AS equals a specified value or lies within a given 
                        range 
                      
               9. Information Model for Filtering Techniques 
                   
                  In this section we define the information models for most common 
                  filtering techniques. The information model structure closely 
                  parallels the one presented for the sampling techniques. 
                   
                  Filter Description: 
                 
               Zseby, Molina, Raspall       Expires September 2003          [Page 13] 

               Internet Draft  Techniques for IP Packet Selection    March 2003 
                
                
                       SELECTOR_ID 
                       SELECTOR_TYPE 
                       SELECTOR_PARAMETERS 
                       OPERATING_TIME 
                       ASSOCIATIONS 
                
                  Where: 
                   
                  SELECTOR_ID: 
                  Unique ID for the packet filter. The ID can be calculated under 
                  consideration of the ASSOCIATIONS and a local ID. 
                   
                  SELECTOR_TYPE 
                  Description: For filtering processes the SELECTOR TYPE defines what 
                  filtering type is used. 
                  Values: Matching | Hashing | Router_state 
                   
                  SELECTOR_PARAMETERS 
                  Description: For filtering processes the SELECTOR PARAMETERS define 
                  formally the common property of the packet being filtered. For the 
                  filters of type Matching and Hashing the definitions have a lot of 
                  points in common. 
                  Values: 
                  Case Matching 
                     - <Header type = ip v4 > 
                     - <bit specification, header part> 
                     - <Selection interval specification, header part> 
                     - <Header type = ipv6> 
                     - <bit specification, header part> 
                     - <Selection interval specification, header part> 
                     - <payload byte number N> 
                     - <bit specification, payload part> 
                     - <Selection interval specification, payload part> 
                   
                  Notes to Case Matching: 
                   
                     - The filter can be defined for the header part only, for the 
                        payload part only or for both. In the latter case the matching 
                        must be an AND of the two. 
                     - The bit specification, for the header part, can be specified 
                        for ipv4 or ipv6 only, or both 
                     - In case of ipv4, the bit specification is a sequence of 20 
                        Hexadecimal numbers [00,FF] specifying a 20 bytes bitmask to be 
                        applied to the header 
                     - In case of ipv6, it is a sequence of 40 Hexadecimal numbers 
                        [00,FF] specifying a 40 bytes bitmask to be applied to the 
                        header 
                     - The bit specification, for the payload part, is a sequence of 
                        Hexadecimal numbers [00,FF] specifying the bitmask to be 
                        applied to the first N bytes of the payload, as specified by 
                        the previous field. In case the Hexadecimal number sequence is 
                        longer then N, only the first N numbers are considered. 

                 
               Zseby, Molina, Raspall       Expires September 2003          [Page 14] 

               Internet Draft  Techniques for IP Packet Selection    March 2003 
                
                
                     - In case the payload is shorter than N, the packet will not 
                        match the filter Other options, like padding with zeros, may be 
                        considered in the future. 
                     - The selection interval specification is a list of non 
                        overlapping intervals [intv_begin, intv_end] where intv_begin, 
                        intv_end are bit strings of length 20*8 (ipv4 case), 40*8 (ipv6 
                        case), N*8 (payload case). 
                     - A filter cannot be defined on the options field of the ipv4 
                        header, neither on stacked headers of ipv6. 
                     - This specification doesn’t preclude the future definition of a 
                        high level syntax for defining in a concise way bit selection 
                        and matching rules in a more human readable form (e.g. “TCP 
                        port in [2000,3000]”). The requirement is that such a syntax 
                        can be univoquely compiled into the one defined above 
                   
                  Case Hashing: 
                     - <Header type = ipv4> 
                     - <Input bit specification, header part> 
                     - <Header type =  ipv6> 
                     - <Input bit specification, header part> 
                     - <payload byte number N> 
                     - <Input bit specification, payload part> 
                     - <additional hash input bits (seed)> 
                     - Hashing function specification (includes length of hash 
                        function output M) 
                     - Selection interval specification, as a list of non overlapping 
                        intervals [start value, end value] where value is in [0,2^M-1] 
                   
                  Notes to Case Hashing: 
                   
                     - On Input bit specifications fields, the same notes on bit 
                        specifications of the Matching case reported above apply 
                   
                  Case Router State: 
                
                     - Ingress interface at which packet arrives equals a specified 
                        value 
                     - Egress interface to which packet is routed to equals a 
                        specified value 
                     - Packet violated acl on the router 
                     - Failed rpf  
                     - Failed rsvp 
                     - No route found for the packet 
                     - Origin AS equals a specified value or lies within a given  
                        range 
                     - Destination AS equals a specified value or lies within a given 
                        range 
                
                  Note to Case Router State: 
                     - All Router state entries can be linked by AND, OR, NOT 
                        operators 
                      
                  OPERATING_TIME 
                 
               Zseby, Molina, Raspall       Expires September 2003          [Page 15] 

               Internet Draft  Techniques for IP Packet Selection    March 2003 
                
                
                  Description: The OPERATING_TIME parameter describes the start/stop 
                  time of filtering process. List elements must not overlap. The start 
                  time of the first element can be omitted, meaning “from now”. The 
                  end time of the last element can be omitted, meaning “until sampler 
                  is removed”. 
                  Values: List of (Start time, End time) 
                   
                  ASSOCIATIONS 
                  Description: The ASSOCIATIONS field describes the observation point 
                  and the IPFIX processes to which the packet selector is associated. 
                  The STREAM ID denotes the origin of the data stream that is input to 
                  the selection function. It can be the observation point directly or 
                  the ID of another selector. With this it is possible to define 
                  combined schemes. If the STREAM ID contains IDs from other 
                  selectors, one can derive the original observation point from the 
                  selector definitions of these specified selectors. 
                   
                  Values: STREAM ID, Metering process ID, Exporting process ID> 
                  With STREAM ID: Observation point ID | List of SELECTOR_IDs 
                   
               10. Composite Techniques  
                   
                  Composite schemes are realized by using the STREAM ID in the 
                  information models. The STREAM ID denotes from which selectors the 
                  input stream originates. If multiple stream IDs are given, this 
                  means that the selector operates on the packet stream simply 
                  resulting from the time superposition of the output of all the 
                  listed filters and samplers. Note that a sampler/filter could be 
                  intermittently active, as defined in the OPERATING TIME field. 
                  Some examples of composite schemes are reported below. 
                   
               10.1    Cascaded filtering->sampling or sampling->filtering 
                
                  If a filter precedes a sampling process the role of filtering is to 
                  create a set of “parent populations” from a single stream that can 
                  then be fed independently to different sampling functions, with 
                  different parameters tuned for the population itself (e.g. if 
                  streams of different intensity result from filtering, it may be good 
                  to have different sampling rates). If filtering follows a sampling 
                  process, the same sampling rate and type is applied to the whole 
                  stream, independently of the relative size of the streams resulting 
                  from the filtering function. Moreover, also packets not destined to 
                  be selected will “load” the sampling function. So, in principle, 
                  filtering before sampling allows a more accurate tuning of the 
                  sampling procedure, but if filters are too complex to work at full 
                  line rate (e.g. because they have to access router state 
                  information), sampling before filtering may be a need. 
                   
               10.2    Stratified Sampling  
                   
                  Stratified sampling is one example for using a composite technique. 
                  The basic idea behind stratified sampling is to increase the 
                  estimation accuracy by using a-priori information. The a-priori 
                 
               Zseby, Molina, Raspall       Expires September 2003          [Page 16] 

               Internet Draft  Techniques for IP Packet Selection    March 2003 
                
                
                  information is used to perform an intelligent grouping of the 
                  elements of the parent population. With this a higher estimation 
                  accuracy can be achieved with the same sample size. 
                   
                  Stratified sampling divides the sampling process into multiple 
                  steps. First, the elements of the parent population are grouped into 
                  subsets in accordance to a given characteristic. This grouping can 
                  be done in multiple steps. Then samples are taken from each subset.  
                   
                  The stronger the correlation between the characteristic used to 
                  divide the parent population and the characteristic of interest (for 
                  which an estimate is sought after), the easier is the consecutive 
                  sampling process and the higher is the stratification gain. For 
                  instance if the dividing characteristic were equal to the 
                  investigated characteristic, each element of the sub-group would be 
                  a perfect representative of that characteristic. In this case it 
                  would be sufficient to take one arbitrary element out of each 
                  subgroup to get the actual distribution of the characteristic in the 
                  parent population. Therefore stratified sampling can reduce the 
                  costs for the sampling process (i.e. the number of samples needed to 
                  achieve a given level of confidence). 
                
                  For stratified sampling one has to specify classification rules for 
                  grouping the elements into subgroups and the sampling scheme that is 
                  used within the subgroups. The classification rules can be expressed 
                  by multiple filters. For the sampling scheme within the subgroups 
                  the parameters have to be specified as described above. 
                   
               11. Security Considerations 
                
                  Security threats can occur if the configuration of sampling 
                  parameters or the communication of sampling parameters to the 
                  application is corrupted. This document only describes sampling 
                  schemes that can be used for packet selection. It neither describes 
                  a mechanism how those parameters are configured nor how these 
                  parameters are communicated to the application. Therefore the 
                  security threats that originate from this kind of communication 
                  cannot be assessed with the information given in this document.  
                   
                  In some cases malicious users or attackers may be interested to hide 
                  packets from the service provider. For instance if packet selectors 
                  are used for accounting or intrusion detection applications, users 
                  may want to prevent that packets are selected. If a deterministic 
                  sampling scheme is used or a selection scheme that takes packet 
                  content into account, the user can shape or send packets in a way 
                  that they are less likely to be selected. This has to be taken into 
                  account when choosing an appropriate packet selection technique. 
                
               12. Acknowledgements 
                   
                  We would like to thank Nick Duffield for providing some text on 
                  hash-based sampling.  
                
                 
               Zseby, Molina, Raspall       Expires September 2003          [Page 17] 

               Internet Draft  Techniques for IP Packet Selection    March 2003 
                
                
               13. References 
                   
                  [AmCa89]    Paul D. Amer, Lillian N. Cassel: Management of Sampled 
                               Real-Time Network Measurements, 14th Conference on Local 
                               Computer Networks, October 1989, Minneapolis, pages 62-
                               68, IEEE, 1989 
                   
                  [ClPB93]    K.C. Claffy, George C Polyzos, Hans-Werner Braun: 
                               Application of Sampling Methodologies to Network Traffic 
                               Characterization, Proceedings of ACM SIGCOMM'93, San 
                               Francisco, CA, USA, September 13 - 17, 1993 
                   
                  [CoGi98]    I. Cozzani, S. Giordano: Traffic Sampling Methods for 
                               end-to-end QoS Evaluation in Large Heterogeneous 
                               Networks. Computer Networks and ISDN Systems, 30 (16-
                               18), September 1998. 
                   
                  [DuGG02]    Nick Duffield, Albert Greenberg, Matthias Grossglauser, 
                               Jennifer Rexford: A Framework for Passive Packet 
                               Measurement, Internet Draft draft-duffield-framework-
                               papame-01, work in progress, February 2002                      
                   
                  [DuGr00]    Nick Duffield, Matthias Grossglauser: Trajectory 
                               Sampling for Direct Traffic Observation, Proceedings of 
                               ACM SIGCOMM 2000, Stockholm, Sweden, August 28 - 
                               September 1, 2000. 
                   
                  [JePP92]    Jonathan Jedwab, Peter Phaal, Bob Pinna: Traffic 
                               Estimation for the Largest Sources on a Network, Using 
                               Packet Sampling with Limited Storage, HP technical 
                               report, Managemenr, Mathematics and Security Department, 
                               HP Laboratories, Bristol, March 1992, 
                               http://www.hpl.hp.com/techreports/92/HPL-92-35.html 
                  [Knuth98]   Donald E. Knuth: The Art of Computer Programming, Volume 
                               3: Searching and Sorting, Addison Wesley, 1998  
                                 
                  [QuZC02]    J. Quittek, T. Zseby, B. Claise, S. Zander, G. Carle, 
                               K.C. Norseth: Requirements for IP Flow Information 
                               Export, Internet Draft <draft-ietf-ipfix-reqs-05.txt>, 
                               work in progress, August 2002 
                   
                  [Zseb02]    Tanja Zseby: Deployment of Sampling Methods for SLA 
                               Validation with Non-Intrusive Measurements, Proceedings 
                               of Passive and Active Measurement Workshop (PAM 2002), 
                               Fort Collins, CO, USA, March 25-26, 2002  
                   
               14. Author's Addresses 
                   
                  Tanja Zseby 
                  Fraunhofer Institute for Open Communication Systems 
                  Kaiserin-Augusta-Allee 31 
                  10589 Berlin 
                  Germany 
                 
               Zseby, Molina, Raspall       Expires September 2003          [Page 18] 

               Internet Draft  Techniques for IP Packet Selection    March 2003 
                
                
                  Phone: +49-30-34 63 7153 
                  Fax:   +49-30-34 53 8153 
                  Email: zseby@fokus.fhg.de 
                   
                  Maurizio Molina 
                  NEC Europe Ltd., Network Laboratories 
                  Adenauerplatz 6 
                  69115 Heidelberg 
                  Germany 
                  Phone: +49 6221 90511-18  
                  Email: molina@ccrle.nec.de 
                
                  Fredric Raspall 
                  NEC Europe Ltd., Network Laboratories 
                  Adenauerplatz 6 
                  69115 Heidelberg 
                  Germany 
                  Phone: +49 6221 90511-31  
                  EMail: raspall@ccrle.nec.de 
                
                   
               15. Full Copyright Statement 
                
                  Copyright (C) The Internet Society (2002). All Rights Reserved. This 
                  document and translations of it may be copied and furnished to 
                  others, and derivative works that comment on or otherwise explain it 
                  or assist in its implementation may be prepared, copied, published 
                  and distributed, in whole or in part, without restriction of any 
                  kind, provided that the above copyright notice and this paragraph 
                  are included on all such copies and derivative works. However, this 
                  document itself may not be modified in any way, such as by removing 
                  the copyright notice or references to the Internet Society or other 
                  Internet organizations, except as needed for the purpose of 
                  developing Internet standards in which case the procedures for 
                  copyrights defined in the Internet Standards process must be 
                  followed, or as required to translate it into languages other than 
                  English.  
                   
                  The limited permissions granted above are perpetual and will not be 
                  revoked by the Internet Society or its successors or assigns. 
                   
                  This document and the information contained herein is provided on an 
                  "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING 
                  TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING 
                  BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION 
                  HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF 
                  MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 
                   





                 
               Zseby, Molina, Raspall       Expires September 2003          [Page 19]