Dear all,
Here is a list of comments on the sampling and filtering techniques version 2 draft.
As always, feel free to start a new thread on a specific topic discussed below, with a new email subject.
4.
Section: terminology
I'm wondering why some terms are not copied over from the draft-ietf-psamp-framework-03.txt.
For example, the observation point which is referenced more than once in the draft.
For example, the observed packet stream, which is quite essential but is never referred to in this draft
(see one of my remarks below about it)
etc...
So why not copy over the entire section?
11.
Section: Scope and Deployment of Packet Selection Techniques
Note that a common technique to select packets is to compute a hash function on some bits of the packet header and/or content and to select it if the result falls in a certain selection range. Since hashing is a deterministic operation, it is a powerful mean to ensure that the same packets are selected at multiple measurement points. Depending on the chosen input bits, on the hash function and on the selection range, this technique could also be used to emulate the random selection of packets with a given probability p. Hashing is then a particular type of filtering, but can also be used to emulate random sampling.
I would rewrite this with the terminology section in mind: hash-based selection, hash domain, hash range, hash function, hash selection range
Something like
Note that a common technique to select packets is to compute a Hash Function on the Hash Domain (some bits of the packet header and/or content) and to select it if the resulting value in the Hash Range falls in the Hash Selection Range. Since hashing is a deterministic operation, it is a powerful means to ensure that the same packets are selected at multiple measurement points. Depending on the chosen input bits of the Hash Domain, on the Hash Function and on the Hash Selection Range, the Hash-based Selection could also be used to emulate the random selection of packets with a given probability p. Hashing is then a particular type of filtering, but can also be used to emulate random sampling.
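To make the mapping between the terms concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the names (hash_domain, hash_function, select), the choice of the first 20 bytes as Hash Domain, and CRC32 as a stand-in Hash Function are all hypothetical and are not taken from the draft.

```python
import zlib

# Hash Range: the set of values the Hash Function can produce.
# CRC32 is only a stand-in here (an assumption, not a recommendation).
HASH_RANGE_SIZE = 2 ** 32


def hash_domain(packet: bytes, offset: int = 0, length: int = 20) -> bytes:
    """Hash Domain: the bits of the packet fed to the Hash Function.
    The first 20 bytes are an arbitrary choice for this sketch."""
    return packet[offset:offset + length]


def hash_function(data: bytes) -> int:
    """Hash Function: deterministic mapping from the Hash Domain to the Hash Range."""
    return zlib.crc32(data) & 0xFFFFFFFF


def select(packet: bytes, p: float) -> bool:
    """Hash-based Selection: select the packet if its hash value falls in the
    Hash Selection Range [0, p * |Hash Range|), emulating probability p."""
    threshold = int(p * HASH_RANGE_SIZE)
    return hash_function(hash_domain(packet)) < threshold
```

Because select() depends only on the packet bytes and on p, two observation points configured identically take the same decision for the same packet, which is the consistency property described in the paragraph above.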
13.
Section: Scope and Deployment of Packet Selection Techniques
We consider packet selectors as part of an IPFIX metering process which also can use the IPFIX exporting process. This is expressed as association to one or more IPFIX processes.
I think this notion above is essential but shouldn't it be part of the framework draft draft-ietf-psamp-framework-03.txt instead of this draft?
18.
Section: 3.1.2.2.3 Non-Uniform flow State dependent sampling
Another type of sampling that can be classified as Non-Uniform _(and, possibly, probabilistic)_ is closely related to the flow concept as defined in [QuZC02], ...
I don't understand "(and, possibly, probabilistic)" because we are already under the probabilistic sampling chapter 3.1.2.2
19. We wrote in the terminology section: selection based on packet content = filtering.
But in section 3.1.2.2.3, we also wrote _ This type of sampling is also content dependent because the identification of the flow the packet belongs to requires analyzing part of the packet content_.
And _ n-out-of-N sampling and uniform probabilistic sampling are content-independent selection schemes. For non-uniform probabilistic sampling the sampling probability can be based on packet content. _
I would create a new small section "3.1.3 sampling and packet content", that would explain something like this:
The terminology section defines:
Filtering: a filter is a selection operation that selects a packet deterministically based on the packet content, its treatment, and functions of these occurring in the selection state. Examples include match/mask filtering, and hash-based selection. Sampling: a selection operation that is not a filter is called a sampling operation.
From these definitions we can deduce that no sampling selection can be based on the packet content.
Nevertheless, for the more advanced sampling selections, the distinction between sampling and filtering becomes subtle,
and some selection operations classified as sampling could in reality be based on packet content.
These should in any case be considered exceptions.
The table below summarizes the behavior of the different sampling operations:
                                | content-independent   | content-dependent
Sampling Scheme                 | sampling              | sampling
--------------------------------+-----------------------+--------------------
systematic sampling:            |                       |
  count-based                   |           X           |
--------------------------------+-----------------------+--------------------
systematic sampling:            |                       |
  time-based                    |           X           |
--------------------------------+-----------------------+--------------------
random sampling:                |                       |
  n-out-of-N                    |           X           |
--------------------------------+-----------------------+--------------------
random, probabilistic sampling: |                       |
  uniform probabilistic         |           X           |
--------------------------------+-----------------------+--------------------
random, probabilistic sampling: |                       |
  non-uniform probabilistic     |                       |         X
--------------------------------+-----------------------+--------------------
random, probabilistic sampling: |                       |
  non-uniform flow-state        |                       |         X
--------------------------------+-----------------------+--------------------
Note: I'm almost sure that the table will not be formatted correctly, so I attached a version in Word.
This Word document contains 2 tables. The second one is the table of section 5, where the terminology has been slightly modified.
Also in the Section: Scope and Deployment of Packet Selection Techniques
The selection technique used to select a subset of packets out of all those crossing an observation point depends on the purpose (application) for which measurement is performed. If the main purpose of an application is to infer some characteristic of the whole set of crossing packets without processing them all (thus reducing the computation load) then we call the used selection technique "sampling". _In principle, with sampling the content of the packet is not relevant for the packet selection_: what matters is only that the selected sample has a distribution of the characteristic to infer similar to the one of the parent population, so that it can be estimated reliably. The sampling decision may be based on the temporal or spatial position of the packet in the packet stream, or may depend on a (pseudo) random number extraction or calculation.
I would add a reference to the new section.
In principle, with sampling the content of the packet is not relevant for the packet selection (see section 3.1.3 sampling and packet content): ...
21.
Section: 4.2 Hashing filtering
A hash function h maps the packet content c, or some portion of it, onto a range R. The packet is selected if h(c) is an element of S, which is a subset of R called the "selection range". Thus hash-based sampling is indeed a particular case of filtering: the object is selected if c is in inv(h(S)). But for desirable hash functions the inverse image inv(h(S)) will be extremely complex, and hence h would not be expressible as, say, a match/mask filter or a simple combination of these.
As in my remark 11, it would be better to rewrite it with the terminology in mind: hash-based selection, hash domain, hash range, hash function, hash selection range.
23.
Section: 4.2.2 Consistent packet selection and its applications
Isn't this already covered in section 10.2 of the framework draft?
28.
Section: 5.1 Information Model for Sampling Techniques
SELECTOR_PARAMETERS Description: For sampling processes the SELECTOR PARAMETERS define the input parameters for the process. Interval length in systematic sampling means, that all packets that arrive in this interval are selected. The spacing parameter defines the spacing in time or number of packets between the end of one sampling interval and the start of the next succeeding interval.
Case n out of N:
  - _List of n sampling positions in an array of N positions_
Case Systematic Time Based:
  - Interval length (in usec), Spacing (in usec)
Case Systematic Count Based:
  - Interval length (in packets), Spacing (in packets)
Case uniform Probabilistic (with equal probability per packet):
  - Sampling probability p
Case non-uniform probabilistic:
  - Calculation function for sampling probability p
Case _non-uniform_ flow state:
  - Policy for selecting flows (e.g. give priority to large flows)
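For reference, this is how I read the interval length and spacing parameters of the two systematic cases. It is a minimal Python sketch under my own assumptions: the function names are hypothetical, the first interval is assumed to start at the first packet (count-based) or at start_usec (time-based), and timestamps are assumed to be integers in usec.

```python
def systematic_count_based(packets, interval_length, spacing):
    """Select `interval_length` consecutive packets, skip the next `spacing`
    packets, and repeat. Both parameters are counted in packets."""
    period = interval_length + spacing
    return [pkt for i, pkt in enumerate(packets) if i % period < interval_length]


def systematic_time_based(timestamped_packets, interval_usec, spacing_usec, start_usec=0):
    """Select every packet whose arrival time falls inside a sampling interval.
    `timestamped_packets` is an iterable of (arrival time in usec, packet)."""
    period = interval_usec + spacing_usec
    return [
        pkt
        for ts, pkt in timestamped_packets
        if (ts - start_usec) % period < interval_usec
    ]
```

With 10 packets numbered 0 to 9, an interval length of 2 and a spacing of 3, the count-based version selects packets 0, 1, 5 and 6.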
List of n sampling positions in an array of N positions:
What if we use random numbers? Exporting all the random numbers (or the positions) doesn't make sense!
And with the random numbers or the positions, one could try to reverse-engineer the function...
I think we must just export n and N and assume a good random number generation function!
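To illustrate what I mean, here is a minimal Python sketch of n-out-of-N sampling in which n and N are the only parameters. The function name and the use of Python's random module are my own assumptions for illustration; the drawn positions stay internal to the sampler and are never reported.

```python
import random


def n_out_of_N(packets, n, N, rng=None):
    """From each consecutive population of N packets, select n packets at
    randomly drawn positions. Only n and N describe the selector."""
    if rng is None:
        rng = random.Random()
    selected = []
    for start in range(0, len(packets), N):
        block = packets[start:start + N]
        # The last block may hold fewer than N packets.
        k = min(n, len(block))
        # Positions are drawn fresh for every block and then discarded.
        positions = sorted(rng.sample(range(len(block)), k))
        selected.extend(block[i] for i in positions)
    return selected
```

A collector that knows n and N can still scale its estimates by N/n without ever seeing the positions.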
Minor detail: I would keep the selection operations in the order defined in the table of contents.
29.
Section: 5.1 Information Model for Sampling Techniques
OPERATING_TIME Description: The OPERATING_TIME parameter describes the start/stop time of sampling process. List elements must not overlap. The start time of the first element can be omitted, meaning "from now". The end time of the last element can be omitted, meaning "until sampler is removed".
Values: List of (Start time, End time)
Why are these values interesting to report?
Unless you want those for configuration, i.e. I want to enable this sampling function for 10 minutes starting tomorrow at noon.
I'm not sure this is interesting!
That's it for now regarding my comments.
Regards, Benoit.