CUD and FBD
Hi Iljitsch,
I guess that the issue of whether CUD or FBD is more appropriate for
the shim protocol is still on the table, so I will try to recap the
aspects that I can think of for making such a comparison. Here is an
initial list, based on what's in the draft and on the subsequent
discussions on the list:
- Who detects what?
By this, I mean the capability of each node to detect exactly in
which direction of the communication path the failure has occurred.
FBD: Since each node is committed to generate traffic with a given
frequency, if a node stops receiving traffic, this means a failure in
the incoming direction of the communication path. So, in FBD only
failures in the incoming paths are detected. If the failure also
affects the outgoing direction, this is detected by the other end of
the communication. The result is that failures are detected by the
receiver and it knows for sure that a failure affecting at least the
incoming path has occurred. The reaction to this failure necessarily
involves the sender node, so the node that detects the outage needs
to inform the other end of the failure so that the recovery procedure
can be started.
CUD: The nodes check path availability by performing an explicit
reachability test. In this case, the detected failure may affect either
direction of the communication, i.e. this mechanism does not reveal
which direction is affected by the failure. This means that the node
that has detected the failure cannot tell whether it should try an
alternative locator pair itself and/or should inform the other end
about the failure so that it is the other end that uses alternative
locators.
So, in FBD each node has a clear view of a failure in its incoming
path, while in CUD each node can detect a failure but is not
certain which direction is affected.
This difference may not be relevant, depending on what the
following steps are.
In FBD, the node that has detected the outage needs to inform the other
end about the failure, in a reliable manner. This may result in
starting the full path exploration procedure as described in the draft,
so that the other end starts the procedure itself.
In CUD, the behaviour is likely to be the same, since the node that has
detected the outage does not know which direction has failed, so both
nodes need to perform the full path exploration procedure.
So, i think that the result in both mechanisms is similar.
- Reaction time
Another issue that may be relevant when comparing CUD and FBD is the
time required by each mechanism for detecting the outage and reacting
accordingly.
In FBD, each node expects to receive traffic with a minimum frequency,
e.g. at least 1 packet every t seconds. If no packets are received in t
seconds, this may mean that a failure has occurred or that the packet
was lost for some other reason. So the node will probably wait for n*t
seconds without receiving incoming packets before it concludes that a
failure has occurred.
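As a rough illustration of the FBD rule just described, here is a
minimal Python sketch. The class name FbdMonitor and its methods are
hypothetical (they are not taken from the draft), and time is passed in
explicitly rather than read from a clock so the logic is easy to follow.

```python
class FbdMonitor:
    """Hypothetical receiver-side FBD check: declare a failure of the
    incoming path after n*t seconds without any received packet."""

    def __init__(self, t, n, now=0.0):
        self.t = t          # peer is expected to send >= 1 packet every t seconds
        self.n = n          # number of silent intervals tolerated
        self.last_rx = now  # time at which the last packet was received

    def packet_received(self, now):
        # Any incoming packet (data or keepalive) counts as progress.
        self.last_rx = now

    def incoming_path_failed(self, now):
        # True once n*t seconds have elapsed with no incoming packet.
        return now - self.last_rx >= self.n * self.t
```

With t = 10 and n = 3, for instance, such a monitor would report a
failure only once 30 seconds have passed without incoming traffic.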
In CUD, each node waits for a given time Tu for ULP feedback. If no
feedback is received during Tu seconds, then it probes reachability by
sending probes. This means that it will send a probe and wait for To
for the answer to come back. It will probably send several probes, e.g.
m probes. The node will wait for w seconds between each probe.
So the time required in FBD would be: n*t seconds
And the time required in CUD would be Tu + To + (m-1)w
Now I think that for the comparison, and if the expected resiliency is
to be similar, we could assume that n = m (the number of probes is the
same).
Also, that Tu will be smaller than or similar to t, since Tu is the
timeout of the application, and t is the default timeout of the shim.
Probably, To will be smaller than or similar to Tu for the same reasons
as before.
I think that w can be set to a much smaller value than To because the
purpose of sending multiple packets at intervals of w seconds is
to avoid assuming a failure because of an isolated packet loss.
So, according to these considerations, i would say that probably CUD
will be a bit faster, but i guess that the reaction time in both cases
will be similar.
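To make the two expressions concrete, the following Python sketch
evaluates n*t for FBD and Tu + To + (m-1)w for CUD. The function names
and the numeric values are assumptions chosen only for illustration
(n = m, Tu = t, To = Tu, and w much smaller than To); they are not values
from the draft.

```python
def fbd_detection_time(n, t):
    # FBD: n consecutive intervals of t seconds with no incoming packet.
    return n * t


def cud_detection_time(tu, to, m, w):
    # CUD: wait Tu for ULP feedback, send m probes spaced w seconds apart,
    # then wait To for the answer to the last probe.
    return tu + to + (m - 1) * w


# Illustrative values only (assumptions, not from the draft).
fbd = fbd_detection_time(n=3, t=10)              # 3 * 10 = 30 seconds
cud = cud_detection_time(tu=10, to=10, m=3, w=1)  # 10 + 10 + 2 * 1 = 22 seconds
print(f"FBD: {fbd} s, CUD: {cud} s")
```

With other plausible parameter choices the gap can be larger or
smaller, so the numbers above should not be read as a measurement.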
- Overhead
In CUD, each time there is an idle period, 2 packets are generated, one
in each direction.
In FBD, each time there is an idle period, 1 packet is generated.
So FBD imposes half the overhead of CUD.
[Note: As currently defined in the draft, if there is no ULP feedback,
CUD will periodically generate probes, which would greatly increase
the overhead imposed by CUD when the ULP does not provide feedback
(e.g. UDP), resulting in an overhead much greater than double that
of FBD. However, I think that not only ULP feedback can be used as
an indicator of communication progress, but also the reception of
packets could be taken as an indication of progress. In this case,
the overhead of CUD would be double that of FBD.]
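The packet counts above can be sketched in a few lines of Python. All
function names are hypothetical, and the no-feedback case rests on an
assumption of mine: one probe/reply exchange every Tu seconds for the
whole session, which is only one possible reading of "periodically".

```python
def fbd_packets(idle_periods):
    # FBD: 1 packet per idle period.
    return idle_periods


def cud_packets(idle_periods):
    # CUD: 2 packets per idle period (one in each direction).
    return 2 * idle_periods


def cud_packets_without_feedback(session_seconds, tu):
    # ASSUMPTION: with no ULP feedback at all, one probe/reply exchange
    # is triggered every Tu seconds for the whole session.
    return 2 * int(session_seconds // tu)
```

For 5 idle periods this gives 5 packets for FBD and 10 for CUD, while a
600-second session with no ULP feedback and Tu = 10 would, under the
stated assumption, cost CUD 120 packets regardless of idle periods.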
- ULP interaction
As defined in the draft, CUD requires ULP feedback in order to greatly
reduce overhead. However, as I mentioned above, I think that a
workaround can be used to deal with this issue.
IMHO, both mechanisms can benefit from negative ULP feedback in a
similar way, since CUD wouldn't need to wait for Tu and FBD wouldn't
have to wait for n*t. Probably, the benefit is much greater in FBD.
So, as far as i can see, both mechanisms behave quite similarly.
Probably, the main difference is w.r.t. the overhead, where FBD is
superior.