Re: Failure Detection (was Re: soft state (was Re: shim6 and bit errors in data packet headers
Hi Erik,
On 06/06/2005, at 20:12, Erik Nordmark wrote:
Marcelo,
you seemed to have concluded that we need negative advice from the
ULPs (or a combination of negative and positive advice), but I must
have missed how you came to that conclusion.
Well, I haven't reached any conclusion so far; I was just exploring
what the ULP feedback would look like (sorry if my writing suggested
otherwise).
In my mind, given that some ULPs might not provide any advice and we
want to provide failover on that case, we need the default behavior of
the shim to be to probe at some frequency (once every 10 seconds or
so?) when the ULP is sending packets. Perhaps it makes sense to use Tx
and Rx counters to be able to suppress such probes, and I think it
makes sense to use positive ULP advice (when given) to suppress the
probes.
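A minimal sketch of the default behavior described above, assuming a per-context state object that tracks Tx/Rx counters. The names (ShimContext, should_probe, positive_advice, PROBE_INTERVAL) are hypothetical, and the 10-second interval is taken from the "once every 10 seconds or so?" suggestion rather than from any specification. A probe is sent only while the ULP is sending, and only if neither Rx progress nor positive ULP advice has confirmed the path within the interval.

```python
from dataclasses import dataclass

# Assumed default from the discussion: probe roughly once every 10 seconds.
PROBE_INTERVAL = 10.0  # seconds (hypothetical constant)


@dataclass
class ShimContext:
    """Hypothetical per-ULID-pair state; names are illustrative only."""
    tx_count: int = 0            # packets sent by the ULP on this context
    rx_count: int = 0            # packets received on this context
    last_tx_seen: int = 0        # tx_count at the previous check
    last_rx_seen: int = 0        # rx_count at the previous check
    last_evidence: float = 0.0   # time of last Rx progress or positive advice

    def positive_advice(self, now: float) -> None:
        # Positive ULP advice counts as evidence that the current locator
        # pair works, so it suppresses probing for a while.
        self.last_evidence = now

    def should_probe(self, now: float) -> bool:
        sending = self.tx_count > self.last_tx_seen
        receiving = self.rx_count > self.last_rx_seen
        self.last_tx_seen, self.last_rx_seen = self.tx_count, self.rx_count
        if receiving:
            # Rx counter advanced: traffic is arriving, so no probe is needed.
            self.last_evidence = now
            return False
        # Probe only while the ULP is sending and no recent evidence exists.
        return sending and now - self.last_evidence >= PROBE_INTERVAL
```

In this sketch the Rx counter and positive advice play the same role: either one resets the clock, so a healthy bidirectional flow never triggers probes.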
Do we agree so far?
agree
Once we have the above, then I don't see what negative ULP advice
would add.
Well, I guess it would result in faster detection of the problem: if
failure detection relies on the lack of positive feedback, then the
shim would need to time out. Of course, the ULP would also need to time
out, but the ULP's timeout is probably better tuned than the shim's
default value. In any case, as you point out below, this can also be
achieved by letting the ULP inform the shim about its timeout. So I
don't have a strong opinion so far with respect to this positive vs.
negative feedback issue.
regards, marcelo
We clearly can't always send probes on negative advice, since that
could mean that a congestive loss in the network would result in
negative ULP advice, causing the host to send more packets, which
could make the congestion worse. While one could rate limit the probes
triggered by negative advice, the negative advice just seems like
added complexity. (With a default of probing once every 10 seconds, a
rate limit of one probe every 10 seconds means that the negative advice
would be completely ignored.)
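The parenthetical point about rate limiting can be checked with a small simulation. It assumes a single limiter shared by the default periodic probes and the probes triggered by negative advice; the shared-limiter design and all names (ProbeLimiter, count_probes, MIN_PROBE_GAP) are hypothetical.

```python
MIN_PROBE_GAP = 10.0  # hypothetical rate limit: at most one probe per 10 s


class ProbeLimiter:
    """Hypothetical limiter shared by periodic and advice-triggered probes."""

    def __init__(self, min_gap: float = MIN_PROBE_GAP) -> None:
        self.min_gap = min_gap
        self.last_probe = None  # time of the last probe actually sent

    def try_probe(self, now: float) -> bool:
        if self.last_probe is not None and now - self.last_probe < self.min_gap:
            return False  # suppressed by the rate limit
        self.last_probe = now
        return True


def count_probes(duration: float, advice_times: list[float]) -> int:
    """Count probes sent when default probing every 10 s is combined with
    probes triggered by negative ULP advice at the given times."""
    limiter = ProbeLimiter()
    periodic = [float(t) for t in range(0, int(duration) + 1, 10)]
    events = sorted(periodic + advice_times)
    return sum(limiter.try_probe(t) for t in events)
```

Both count_probes(60.0, []) and count_probes(60.0, [3.0, 17.5, 42.0]) return 7: every advice-triggered probe falls within 10 seconds of a periodic one and is suppressed, so the negative advice changes nothing.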
That raises an interesting issue: what if multiple ULPs use the same
session and provide different feedback? A simple case is that some apps
are more sensitive than others, so they will complain sooner. A more
complex case is one app complaining while another provides positive
feedback (for instance, when the failure is at the application level
rather than in the path). How do we deal with this?
If we only use positive advice then I think we can avoid this
complexity. In any case, I don't think we should worry much about it
up front. Later it might make sense to allow the ULPs to express a
"desired failure detection time" to the shim, and the shim can use
that to determine how often to probe. If ULPs with different desires
use the same ULID pair, then presumably the shim would have to operate
on the minimum of the requested times. And the shim presumably needs
to ensure that it doesn't probe more often than once per conservative
RTT estimate, no matter what the ULPs desire.
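The "desired failure detection time" idea above can be sketched as a small helper that takes the minimum of the ULPs' requests and clamps it to a conservative RTT estimate. The function name and parameters are hypothetical; treating the desired detection time directly as the probe interval is a simplification, and the 10-second fallback when no ULP expresses a preference is an assumption based on the default mentioned earlier in the thread.

```python
def probe_interval(desired_times: list[float], conservative_rtt: float,
                   default: float = 10.0) -> float:
    """Hypothetical helper: choose how often the shim probes a ULID pair.

    desired_times: desired failure detection time (seconds) from each ULP
                   sharing the ULID pair; empty if no ULP expressed one.
    conservative_rtt: conservative RTT estimate (seconds); the shim never
                      probes more often than this, whatever the ULPs ask.
    default: interval used when no ULP states a preference (assumed 10 s).
    """
    # The most demanding ULP on the shared ULID pair wins.
    requested = min(desired_times) if desired_times else default
    # Never probe faster than once per conservative RTT estimate.
    return max(requested, conservative_rtt)
```

For example, two ULPs asking for 30 s and 4 s yield a 4 s interval, while a request for 0.1 s with a 0.8 s RTT estimate is clamped to 0.8 s.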
IMHO, ULP feedback should result in an explicit reachability test on
the current locator pair, i.e. ULP feedback does not directly imply
rehoming, but rather triggers verification through a reachability test
exchange on the current locator pair.
Here you are equating "ULP feedback" with "negative ULP feedback".
Such verification can lead to congestion (and as I stated above, I
don't see how handling negative ULP feedback fits into the big
picture.)
Erik
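For comparison, the verification approach proposed in the thread (negative ULP feedback triggers a reachability test on the current locator pair, and only a failed test leads to rehoming) might be sketched as follows. All names are hypothetical, and the rate limit on advice-triggered tests is an added assumption reflecting the concern that such verification could worsen congestion.

```python
from enum import Enum

# Hypothetical rate limit on advice-triggered tests (not from the thread).
MIN_TEST_GAP = 10.0  # seconds


class Action(Enum):
    NONE = "none"                  # keep using the current locator pair
    TEST_CURRENT = "test_current"  # run a reachability test on the current pair
    EXPLORE = "explore"            # search for an alternative locator pair


def on_negative_advice(test_in_progress: bool,
                       seconds_since_last_test: float) -> Action:
    """Negative ULP feedback never rehomes directly; at most it requests a
    reachability test on the current locator pair, subject to a rate limit
    so that congestive loss does not turn into extra probe traffic."""
    if test_in_progress or seconds_since_last_test < MIN_TEST_GAP:
        return Action.NONE
    return Action.TEST_CURRENT


def on_test_result(current_pair_reachable: bool) -> Action:
    """Only a failed test on the current pair leads to exploring alternatives."""
    return Action.NONE if current_pair_reachable else Action.EXPLORE
```

With the rate limit in place, this reduces to the same behavior as periodic probing whenever the limit matches the default probe interval, which is the crux of the disagreement above.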