
Re: comments on draft-ietf-psamp-sample-tech-04.txt



Benoit,
please see inline. I think I've understood all your concerns, except one.
Maurizio

Benoit Claise wrote:

Maurizio Molina wrote:



Benoit Claise wrote:

Hi Maurizio,




My point is: if systematic time based sampling is implemented, will you do it as in 1. or as in 2.?
1. Case Systematic Time Based: - Interval length (in usec), Spacing (in usec)
2. Case Systematic Time Based: - Interval length (in usec), Spacing (# packets)


Option 1 has the big drawback that we have no idea how many packets will be inspected, and as a consequence we don't know what the bandwidth requirement for the export link(s) is. And if we do sampling, it's typically because we have a bottleneck on the export link(s) bandwidth or on the collector side...

Benoit,
I'm not 100% sure I understand which type of sampling you're arguing for or against.
However, in your sentence above you say that with Systematic Count based sampling (which includes 1 out of N), you don't have firm limits on the exported bandwidth.
That's true, but this type of sampling (*) allows you to estimate the rate of the link. With Systematic Time based you cannot see any rate variation on the link because you always export a packet each T sec.


I don't think this is correct, unless I completely misunderstood everything.

You're right. I provided an example in a hurry. If you use Systematic Time based sampling with parameters t and T, where t is the interval during which you sample and T is the interval during which you don't sample (I clarify this issue below), the bandwidth can be estimated as
[E(X)/(t+T)]*[(t+T)/t], i.e. E(X)/t
where E(X) is the average number of packets that you sample in each cycle of length t+T.
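
As a rough illustration of the estimate above, here is a minimal Python sketch. The function name and the per-cycle counts are made up for the example, and t and T are assumed to be given in seconds, so the result is in packets per second.

```python
def estimate_packet_rate(counts_per_cycle, t, T):
    """Estimate the link packet rate from systematic time based sampling.

    counts_per_cycle: packets selected in each cycle (hypothetical data)
    t: sampling interval in seconds, T: non-sampling interval in seconds
    """
    if t <= 0 or T < 0:
        raise ValueError("t must be positive and T non-negative")
    if not counts_per_cycle:
        raise ValueError("need at least one cycle")
    mean_x = sum(counts_per_cycle) / len(counts_per_cycle)  # E(X)
    sampled_rate = mean_x / (t + T)   # selected packets per second over the full cycle
    scale = (t + T) / t               # undo the t/(t+T) sampling fraction
    return sampled_rate * scale       # algebraically equal to E(X)/t


# Example: t = 10 usec, T = 100 usec, on average 5 packets seen per cycle.
rate = estimate_packet_rate([4, 6, 5, 5], t=10e-6, T=100e-6)
print(rate)  # approximately 500000 packets/s
```

Note that T cancels out of the estimate; it only determines how often a new observation of X becomes available.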



Let me re-explain what the issue is; maybe I took a shortcut. The draft says:

SELECTOR_PARAMETERS
For sampling processes the SELECTOR PARAMETERS define the input parameters for the process. _Interval length in systematic sampling means, that all packets that arrive in this interval are selected._ The spacing parameter defines the spacing in time or number of packets between the end of one sampling interval and the start of the next succeeding interval.
Case n out of N: - Population size N, Sample size n
Example: we select randomly n packets out of N. No problem on this one
Case Systematic Count Based: - Interval length(in packets), Spacing (in packets)
Note: I start with "Case Systematic Count Based" to illustrate my point.
Example: if Interval length = 10 packets, Spacing = 100 packets
This means: I select 10 packets, I don't select the next 90 packets, I select 10 packets, etc...
Note2: it is not clear from the draft whether this means the example on the previous line or...
I select 10 packets, I don't select the next 100 packets, I select 10 packets, etc...
This must be clarified with an example.

Taking your 10/100 example, the intention was to define the second case you mention, that is:
I select 10 packets, I don't select the next 100 packets, I select 10 packets, etc...
Likewise, for Systematic Time Based we meant:
I select all packets during 10 usec, I don't select packets during the next 100 usec, etc...
I agree that the text must be improved so that we don't leave any doubt, and that an example should be added.
I'll provide a proposal to Tanja, OK?
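
To make the intended reading concrete (select for the interval, then skip for the full spacing), here is a minimal Python sketch. It assumes a 0-based packet index, and a clock that starts at 0 at the beginning of the first sampling interval; the function names are hypothetical and do not come from the draft.

```python
def count_based_select(packet_index, interval, spacing):
    """Systematic count based: select `interval` packets, skip `spacing` packets."""
    return packet_index % (interval + spacing) < interval


def time_based_select(arrival_usec, interval_usec, spacing_usec):
    """Systematic time based: select everything arriving during `interval_usec`,
    then select nothing during the following `spacing_usec`."""
    return arrival_usec % (interval_usec + spacing_usec) < interval_usec


# Interval length = 10 packets, Spacing = 100 packets:
# packets 0..9 are selected, 10..109 are skipped, 110..119 are selected, ...
selected = [i for i in range(220) if count_based_select(i, 10, 100)]
print(selected)

# Interval length = 10 usec, Spacing = 100 usec: the cycle is 110 usec long.
print(time_based_select(5, 10, 100))    # inside the first sampling interval
print(time_based_select(109, 10, 100))  # still inside the spacing
print(time_based_select(110, 10, 100))  # start of the next sampling interval
```

Under the other reading (10, 90, 10, 90, ...) the modulus would be `spacing` rather than `interval + spacing`, which is exactly the ambiguity the text needs to remove.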


Case Systematic Time Based: - Interval length (in usec), Spacing (in usec)
Example: if Interval length = 10 usec, Spacing = 100 usec
This means: I select _X_ packets during 10 usec, I don't select packets during the next 90 usec, etc...
BTW, see my note2 above that is equivalent here: is it 10, 90, 10, 90, ... or 10, 100, 10, 100, ...
And this is my entire point, you select X packets during an interval. And you don't know how many.
You might know it from the ratio 10/100 * bandwidth. BUT you have _no clue_ about the number of flow records, and as a consequence we don't know what the bandwidth requirement for the export link(s) is. And if we do sampling, it's typically because we have a bottleneck on the export link(s) bandwidth or on the collector side... So this way of sampling is dangerous. The only application I see for such a sampling scheme is when the bottleneck is the interface or the line card resources, typically the memory.

As you say, the maximum (average) export bandwidth will be bounded by (t/(t+T))*link_bandwidth[pkt/s]*export_size[bytes/exported packet]. So you can bound it.
The fact that you sample all the packets during t doesn't mean that you must export them within t seconds. In general, you'll have t+T seconds to do so (i.e. you can shape the export traffic). Or, even without shaping, you can set t and T small enough that at most 1 packet is sampled in each interval t.
Was that your concern? The burstiness? In that case, I agree that we must describe it in the draft, add some reasoning on how to avoid it, give an example, etc... but I don't think that this hinders the usefulness of this (very simple) type of sampling.
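
For illustration, the bound on the average export bandwidth can be computed as in the Python sketch below. The function name and the numbers (link rate, bytes exported per selected packet) are assumptions chosen for the example, not values from the draft.

```python
def max_avg_export_rate(t, T, link_pps, export_bytes_per_packet):
    """Upper bound on the average export bandwidth in bytes/s.

    t, T: sampling and non-sampling intervals (same unit, e.g. usec)
    link_pps: maximum packet rate of the observed link (packets/s)
    export_bytes_per_packet: bytes exported per selected packet (assumed fixed)
    """
    if t <= 0 or T < 0:
        raise ValueError("t must be positive and T non-negative")
    sampling_fraction = t / (t + T)
    return sampling_fraction * link_pps * export_bytes_per_packet


# Example: t = 10 usec, T = 100 usec, 1,000,000 pkt/s link, 50 bytes per export.
bound = max_avg_export_rate(10, 100, 1_000_000, 50)
print(bound)  # roughly 4.5 million bytes/s
```

This is a bound on the average only: within a single interval t the selected packets can still arrive as a burst, which is why shaping the export over the full t+T cycle matters.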
Regards,
Maurizio



If you keep this mechanism (anyway this is a MAY requirement), you must add a remark about it.


Now, you speak above about "With Systematic Time based you cannot see any rate variation on the link because you always export a packet each T sec."
If you want to do that, and I agree it makes sense (actually a lot more sense than the previous scheme), then you will have a sampling scheme like this:
Case Systematic Time Based: - Interval length (# packets), Spacing (usec)
Example: if Interval length = 10 packets, Spacing = 100 usec
This means: I select 10 packets, I don't select packets during 90 usec, I select 10 packets, etc...
BTW, the Note2 still applies here.


Regards, Benoit.


(actually, you can only tell whether the rate drops below 1/T).
So, Systematic Time based can be useful for a lot of applications (e.g. random packet content inspection), but not for understanding the dynamics on a link.
Maurizio.


(*) and probabilistic sampling as well






-- to unsubscribe send a message to psamp-request@ops.ietf.org with the word 'unsubscribe' in a single line as the message text body. archive: <http://ops.ietf.org/lists/psamp/>