
CUD and FBD



Hi Iljitsch,

I guess that the issue of whether CUD or FBD is more appropriate for the shim protocol is still on the table, so I will try to recap the aspects that I can think of for making such a comparison. Here goes an initial list, inspired by what's in the draft and the following discussions on the list:

- Who detects what?
By this, I mean the capability of each node to detect exactly in which direction of the communication path the failure has occurred.


FBD: Since each node is committed to generating traffic with a given frequency, if a node stops receiving traffic, this means a failure in the incoming direction of the communication path. So, in FBD only failures in the incoming path are detected. If the failure also affects the outgoing direction, this is detected by the other end of the communication. The result is that failures are detected by the receiver, which knows for sure that a failure affecting at least its incoming path has occurred. The reaction to this failure necessarily involves the sender node, so the node that detects the outage needs to inform the other end of the failure so that the recovery procedure can be started.

CUD: The nodes check path availability by performing an explicit reachability test. In this case, the detected failure may affect either direction of the communication, i.e. this mechanism does not provide the precise direction that is affected by the failure. This means that the node that has detected the failure cannot tell whether it should try an alternative locator pair itself and/or should try to inform the other end about the failure so that it is the other end that uses alternative locators.

So, in FBD each node has a clear view of a failure in its incoming path, while in CUD each node can detect a failure but is not certain which direction is affected.

This difference may not be relevant, depending on what the following steps are.
In FBD, the node that has detected the outage needs to inform the other end about the failure in a reliable manner. This may result in starting the full path exploration procedure as described in the draft, so that the other end starts the procedure itself.
In CUD, the behaviour is likely to be the same, since the node that has detected the outage does not know which direction has failed, so both nodes need to perform the full path exploration procedure.
So, I think that the result in both mechanisms is similar.
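
To make the "who detects what" comparison concrete, here is a minimal sketch in Python. It is a simplified model, not something taken from the draft: the function names are hypothetical, FBD is modelled as "a node detects a failure when its incoming direction is down", and CUD is modelled as "a node that probes detects a failure when the probe or its reply is lost", i.e. when either direction is down.

```python
from itertools import product


def fbd_detects(incoming_ok: bool) -> bool:
    """FBD (simplified): a node notices a failure only when incoming traffic stops."""
    return not incoming_ok


def cud_detects(outgoing_ok: bool, incoming_ok: bool) -> bool:
    """CUD (simplified): a probe round trip fails if either direction is down."""
    return not (outgoing_ok and incoming_ok)


def observations(a_to_b_ok: bool, b_to_a_ok: bool) -> dict:
    """What nodes A and B would observe under each mechanism (hypothetical model)."""
    return {
        "fbd_A": fbd_detects(incoming_ok=b_to_a_ok),
        "fbd_B": fbd_detects(incoming_ok=a_to_b_ok),
        "cud_A": cud_detects(outgoing_ok=a_to_b_ok, incoming_ok=b_to_a_ok),
        "cud_B": cud_detects(outgoing_ok=b_to_a_ok, incoming_ok=a_to_b_ok),
    }


if __name__ == "__main__":
    # Enumerate all four combinations of direction states.
    for a_to_b_ok, b_to_a_ok in product([True, False], repeat=2):
        print(f"A->B ok={a_to_b_ok}, B->A ok={b_to_a_ok}:",
              observations(a_to_b_ok, b_to_a_ok))
```

In this model, a failure of only the B->A direction is seen by A alone under FBD, while under CUD both nodes see a failed probe and neither can tell which direction is broken.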


- Reaction time

Another issue that may be relevant when comparing CUD and FBD is the time required by each mechanism for detecting the outage and reacting accordingly.

In FBD, each node expects to receive traffic with a minimum frequency, e.g. at least 1 packet every t seconds. If no packets are received in t seconds, this may mean that a failure has occurred or that the packet was lost for some other reason. So, the node will probably wait for n*t seconds without receiving incoming packets before it determines that a failure has occurred.

In CUD, each node waits for a given time Tu for ULP feedback. If no feedback is received during Tu seconds, then it probes reachability by sending probes. This means that it will send a probe and wait for To for the answer to come back. It will probably send several probes, e.g. m probes. The node will wait for w seconds between each probe.

So the time required in FBD would be: n*t seconds
And the time required in CUD would be Tu + To + (m-1)w
Now I think that for the comparison, and if the expected resiliency is to be similar, we could assume that n = m (the number of probes is the same).
Also, that Tu will be smaller than or similar to t, since Tu is the timeout of the application, and t is the default timeout of the shim.
Probably, To will be smaller than or similar to Tu for the same reasons as before.
I think that w can be set to a much smaller value than To because the purpose of sending multiple packets at intervals of w seconds is to avoid assuming a failure because of an isolated packet loss.
So, according to these considerations, I would say that CUD will probably be a bit faster, but I guess that the reaction time in both cases will be similar.
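
As a quick sanity check of the two expressions above, here is a small Python sketch. The numeric values are purely illustrative assumptions (they do not come from the draft); they are chosen only to respect the relations discussed above: n = m, Tu similar to t, To similar to Tu, and w much smaller than To.

```python
def fbd_detection_time(n: int, t: float) -> float:
    """FBD: n consecutive intervals of t seconds without incoming packets."""
    return n * t


def cud_detection_time(tu: float, to: float, m: int, w: float) -> float:
    """CUD: ULP timeout Tu, then m probes spaced w seconds apart,
    plus the probe answer timeout To."""
    return tu + to + (m - 1) * w


# Illustrative values only (assumptions, not taken from the draft).
t = 10.0   # FBD traffic interval, seconds
n = 3      # missed intervals before FBD declares a failure
tu = 10.0  # ULP feedback timeout, assumed similar to t
to = 10.0  # probe answer timeout, assumed similar to Tu
m = n      # same number of probes for comparable resiliency
w = 1.0    # inter-probe interval, assumed much smaller than To

fbd = fbd_detection_time(n, t)
cud = cud_detection_time(tu, to, m, w)
print(f"FBD: {fbd} s, CUD: {cud} s")
```

With these particular numbers FBD takes 30 seconds and CUD takes 22 seconds, which matches the expectation that CUD is somewhat faster while both remain in the same range.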


- Overhead

In CUD, each time there is an idle period, 2 packets are generated, one in each direction.
In FBD, each time there is an idle period, 1 packet is generated.


FBD imposes half the overhead of CUD.

[Note: As currently defined in the draft, if there is no ULP feedback, CUD will periodically generate probes, which would greatly increase the overhead imposed by CUD when the ULP does not provide feedback (e.g. UDP), resulting in an overhead much greater than double that of FBD. However, I think that not only ULP feedback can be used as an indicator of communication progress; the reception of packets could also be taken as an indication of progress. In this case, the overhead of CUD would be double that of FBD.]
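
The overhead comparison can be sketched in Python as follows. The per-idle-period counts (1 packet for FBD, 2 for CUD) come from the text above; the no-feedback case is my own simplified assumption that CUD sends one probe and gets one reply every probe_interval seconds for the whole duration of the communication. All function and parameter names are hypothetical.

```python
def fbd_idle_packets(idle_periods: int) -> int:
    """FBD: one packet per idle period."""
    return idle_periods * 1


def cud_idle_packets(idle_periods: int) -> int:
    """CUD: one probe plus one reply per idle period."""
    return idle_periods * 2


def cud_no_feedback_packets(duration: float, probe_interval: float) -> int:
    """CUD without ULP feedback (assumed model): a probe/reply pair
    every probe_interval seconds for the whole communication."""
    return 2 * int(duration // probe_interval)


# Illustrative values only (assumptions, not taken from the draft).
idle_periods = 5
print("FBD:", fbd_idle_packets(idle_periods))
print("CUD with progress indication:", cud_idle_packets(idle_periods))
print("CUD without ULP feedback:", cud_no_feedback_packets(600.0, 10.0))
```

Under this model the no-feedback overhead grows with the length of the communication rather than with the number of idle periods, which is why it can end up far above double the FBD figure.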

- ULP interaction

As defined in the draft, CUD requires ULP feedback in order to greatly reduce overhead. However, as I mentioned above, I think that a workaround can be used to deal with this issue.

IMHO, both mechanisms can benefit from negative ULP feedback in a similar way, since CUD wouldn't need to wait for Tu and FBD wouldn't have to wait for n*t. Probably, the benefit is much greater in FBD.

So, as far as I can see, both mechanisms behave quite similarly. Probably, the main difference is w.r.t. the overhead, where FBD is superior.