
Re: comments on draft-ietf-psamp-sample-tech-04.txt

Maurizio Molina wrote:

Benoit Claise wrote:

Hi Maurizio,

My point is: if systematic time-based sampling is implemented, will you do it like 1 or like 2?
1. Case Systematic Time Based: - Interval length (in usec), Spacing (in usec)
2. Case Systematic Time Based: - Interval length (in usec), Spacing (# packets)

Option 1 has a big drawback: we have no idea how many packets will be inspected and, as a consequence, we don't know the bandwidth requirement for the export link(s). And if we do sampling, it's typically because we have a bottleneck on the export link(s) bandwidth or on the collector side...

I'm not 100% sure I understand which type of sampling you're arguing for or against.
However, in your sentence above you say that with Systematic Count Based sampling (which includes 1 out of N) you don't have firm limits on the exported bandwidth.
That's true, but this type of sampling (*) allows you to estimate the rate of the link. With Systematic Time Based sampling you cannot see any rate variation on the link, because you always export one packet every T sec.
I don't think this is correct, unless I completely misunderstood everything.
Let me re-explain the issue; maybe I took a shortcut.
The draft says:
     For sampling processes the SELECTOR PARAMETERS define the input 
     parameters for the process. Interval length in systematic 
     sampling means, that all packets that arrive in this interval 
     are selected. The spacing parameter defines the spacing in time 
     or number of packets between the end of one sampling interval 
     and the start of the next succeeding interval. 
     Case n out of N: 
        - Population size N, Sample size n 

Example: we randomly select n packets out of N.
No problem with this one.
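To make the n-out-of-N selector parameters concrete, here is a minimal sketch. It assumes one possible reading (not the draft's wording): the packet stream is cut into consecutive blocks of N packets, and n packets are chosen uniformly at random, without replacement, from each block. The function name `n_out_of_N` is hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")

def n_out_of_N(packets: Iterable[T], n: int, N: int,
               rng: Optional[random.Random] = None) -> List[T]:
    """Hypothetical n-out-of-N selector: from each consecutive block of N
    packets, select n of them uniformly at random (without replacement)."""
    if not 0 <= n <= N:
        raise ValueError("need 0 <= n <= N")
    rng = rng or random.Random()
    selected: List[T] = []
    block: List[T] = []
    for pkt in packets:
        block.append(pkt)
        if len(block) == N:
            # Pick n distinct positions, keeping arrival order in the output.
            picks = sorted(rng.sample(range(N), n))
            selected.extend(block[i] for i in picks)
            block = []
    # In this sketch a trailing partial block (< N packets) is not sampled.
    return selected

sel = n_out_of_N(range(1000), n=5, N=100, rng=random.Random(1))
print(len(sel))  # 50
```

Because exactly n packets leave each block of N, the selected volume is fixed at n/N of the packet count, whatever the random choices are.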
     Case Systematic Count Based: 
        - Interval length(in packets), Spacing (in packets) 

Note: I start with "Case Systematic Count Based" to illustrate my point.
Example: if Interval length = 10 packets, Spacing = 100 packets
  This means: I select 10 packets, I don't select the next 90 packets, I select 10 packets, etc...
  Note 2: it is not clear from the draft whether this means the example on the previous line or...
         I select 10 packets, I don't select the next 100 packets, I select 10 packets, etc...
         This must be clarified with an example. 
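The ambiguity in Note 2 can be shown with a small sketch that computes the selected packet indices under both readings of Spacing. The function and the `spacing_is_period` flag are hypothetical. With Interval length = 10 and Spacing = 100, the two readings give different sampling fractions (10/100 versus 10/110).

```python
from typing import List

def systematic_count(num_packets: int, interval: int, spacing: int,
                     spacing_is_period: bool) -> List[int]:
    """Hypothetical selector: indices of packets chosen by systematic
    count-based sampling.

    spacing_is_period=True:  Spacing counted start-to-start, so with
                             10/100 we take 10 and skip 90 (period 100).
    spacing_is_period=False: Spacing is the gap from the end of one
                             interval to the start of the next, so we
                             take 10 and skip 100 (period 110).
    """
    if spacing_is_period and spacing < interval:
        raise ValueError("start-to-start spacing must be >= interval")
    period = spacing if spacing_is_period else interval + spacing
    # A packet is selected if it falls in the first `interval` slots of its period.
    return [i for i in range(num_packets) if i % period < interval]

a = systematic_count(1100, interval=10, spacing=100, spacing_is_period=True)
b = systematic_count(1100, interval=10, spacing=100, spacing_is_period=False)
print(len(a), len(b))  # 110 100
print(a[10], b[10])    # 100 110
```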
     Case Systematic Time Based: 
        - Interval length (in usec), Spacing (in usec) 
Example: if Interval length = 10 usec, Spacing = 100 usec  
  This means: I select X packets during 10 usec, I don't select packets during the next 90 usec, etc...
  BTW, my Note 2 above applies here as well: is it 10, 90, 10, 90, ... or 10, 100, 10, 100, ...?
And this is my entire point: you select X packets during an interval, and you don't know how many.
You might estimate it from the ratio 10/100 times the link bandwidth. BUT you have no clue about the number of flow records and,
as a consequence, we don't know the bandwidth requirement for the export link(s). And if we do
sampling, it's typically because we have a bottleneck on the export
link(s) bandwidth or on the collector side... So this way of sampling is dangerous.
The only application I see for such a sampling scheme is when the bottleneck is the interface or the line card resources, typically the memory.
If you keep this mechanism (anyway, this is a MAY requirement), you must add a remark about it.
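A small sketch of the Interval (usec) / Spacing (usec) case illustrates the concern: identical selector parameters select very different numbers of packets depending only on the offered load. It assumes the start-to-start reading of Note 2 (10 usec on, 90 usec off), integer-microsecond arrival times, and evenly spaced synthetic traffic; the function name is hypothetical.

```python
from typing import List

def systematic_time(arrivals_us: List[int], interval_us: int,
                    spacing_us: int) -> List[int]:
    """Hypothetical selector: keep every packet whose arrival time (integer
    microseconds) falls inside a sampling interval. Assumes the
    start-to-start reading of Note 2: an interval opens every spacing_us."""
    return [t for t in arrivals_us if t % spacing_us < interval_us]

duration_us = 10_000
light = list(range(0, duration_us, 5))  # one packet every 5 usec
heavy = list(range(0, duration_us, 1))  # one packet every 1 usec

for name, traffic in (("light", light), ("heavy", heavy)):
    chosen = systematic_time(traffic, interval_us=10, spacing_us=100)
    # For evenly spaced traffic: selected ~= (interval/spacing) * packets seen.
    expected = 10 / 100 * len(traffic)
    print(name, len(traffic), len(chosen), expected)
# light 2000 200 200.0
# heavy 10000 1000 1000.0
```

The packet count can be estimated after the fact from the 10/100 ratio, but it is not bounded by the parameters, and the number of flow records those packets generate depends further on how many distinct flows they belong to.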

Now, you speak above about "With Systematic Time based you cannot see any rate 
variation on the link because you always export a packet each T sec."
If you want to do that, and I agree it makes sense (actually a lot more sense than the previous scheme), then you will have a sampling scheme like this:
 Case Systematic Time Based:        - Interval length (# packets), Spacing (usec)
Example: if Interval length = 10 packets, Spacing = 100 usec
  This means: I select 10 packets, I don't select packets during 90 usec, I select 10 packets, etc...
  BTW, Note 2 still applies here.
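The Interval (# packets) / Spacing (usec) scheme can be sketched as follows. This is one hypothetical reading: select the next `interval_pkts` packets, then ignore packets until `gap_us` has elapsed since the last selected one (Note 2 still decides whether that gap should be 90 or 100 usec). Names are illustrative only. The useful property is an upper bound on selected packets of roughly `interval_pkts` per `gap_us`, whatever the offered load.

```python
from typing import List, Optional

def count_interval_time_spacing(arrivals_us: List[int], interval_pkts: int,
                                gap_us: int) -> List[int]:
    """Hypothetical selector: take interval_pkts packets, then skip all
    packets arriving within gap_us after the last selected packet."""
    selected: List[int] = []
    in_burst = 0                      # packets taken in the current interval
    resume_at: Optional[int] = None   # time at which selection may restart
    for t in arrivals_us:
        if resume_at is not None:
            if t < resume_at:
                continue              # still inside the time-based spacing
            resume_at = None          # spacing elapsed: open a new interval
        selected.append(t)
        in_burst += 1
        if in_burst == interval_pkts:
            in_burst = 0
            resume_at = t + gap_us
    return selected

duration_us = 10_000
heavy = list(range(0, duration_us, 1))  # one packet every 1 usec
light = list(range(0, duration_us, 5))  # one packet every 5 usec
bound = 10 * (duration_us // 90 + 1)    # at most 10 packets per 90-usec gap

h = count_interval_time_spacing(heavy, interval_pkts=10, gap_us=90)
l = count_interval_time_spacing(light, interval_pkts=10, gap_us=90)
print(len(h), len(l), bound)  # 1011 742 1120
```

Even a tenfold increase in traffic (from 2,000 to 10,000 packets here) keeps the selection under the same bound, while a drop in traffic still shows up as fewer selected packets.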

Regards, Benoit.

(Actually, you can only tell when the rate drops below 1/T.)
So, Systematic Time based can be useful for a lot of applications (e.g. random packet content inspection), but not for understanding the dynamics on a link.

(*) and probabilistic sampling as well
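The rate-estimation argument above can be illustrated with a sketch comparing 1-out-of-N count-based sampling against a time-based selector that takes one packet per slot of T. Both functions, the parameter values (N = 100, T = 1 ms) and the evenly spaced traffic are assumptions for illustration only.

```python
from typing import List

def one_in_n(arrivals_us: List[int], N: int) -> List[int]:
    """Systematic count-based 1-out-of-N: keep every N-th packet."""
    return arrivals_us[::N]

def one_per_slot(arrivals_us: List[int], T_us: int) -> List[int]:
    """Hypothetical time-based selector: keep the first packet that
    arrives in each slot [k*T, (k+1)*T)."""
    selected: List[int] = []
    last_slot = -1
    for t in arrivals_us:
        slot = t // T_us
        if slot != last_slot:
            selected.append(t)
            last_slot = slot
    return selected

WINDOW_S = 1.0          # observe one second of traffic
N, T_US = 100, 1_000    # 1-in-100, or one packet per 1 ms (1/T = 1000 pkt/s)

for rate_pps in (500, 10_000, 50_000):
    step_us = 1_000_000 // rate_pps
    arrivals = list(range(0, 1_000_000, step_us))  # evenly spaced packets
    count_est = len(one_in_n(arrivals, N)) * N / WINDOW_S
    time_sel = len(one_per_slot(arrivals, T_US))
    print(rate_pps, count_est, time_sel)
# 500 500.0 500
# 10000 10000.0 1000
# 50000 50000.0 1000
```

The count-based estimate tracks the link rate, while the time-based selector saturates at 1/T × window = 1000 packets and only reveals the rate once it falls below 1/T.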