Re: Failure Detection (was Re: soft state (was Re: shim6 and bit errors in data packet headers
Marcelo,
you seem to have concluded that we need negative advice from the ULPs
(or a combination of negative and positive advice), but I must have
missed how you came to that conclusion.
In my mind, given that some ULPs might not provide any advice and we
want to provide failover in that case, we need the default behavior of
the shim to be to probe at some frequency (once every 10 seconds or so?)
when the ULP is sending packets. Perhaps it makes sense to use Tx and Rx
counters to be able to suppress such probes, and I think it makes sense
to use positive ULP advice (when given) to suppress the probes.
Do we agree so far?
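
A minimal sketch of the default behavior described above, in Python. The names (IntervalStats, should_probe) are hypothetical, and the exact way the Rx counter suppresses a probe is an assumption; the text only says that the counters might be used for that purpose. The function is meant to be called once per probe interval (every 10 seconds or so).

```python
from dataclasses import dataclass


@dataclass
class IntervalStats:
    """What the shim observed during the last probe interval (hypothetical)."""
    tx_packets: int        # packets the ULP sent on this context
    rx_packets: int        # packets received from the peer on this context
    positive_advice: bool  # True if any ULP reported forward progress


def should_probe(stats: IntervalStats) -> bool:
    """Decide whether to send a reachability probe at the end of an interval."""
    if stats.tx_packets == 0:
        # The ULP is idle, so there is nothing to protect with a probe.
        return False
    if stats.rx_packets > 0:
        # Assumption: return traffic counts as evidence that the path works.
        return False
    if stats.positive_advice:
        # Positive ULP advice, when given, suppresses the probe.
        return False
    # The ULP is sending, nothing is coming back, and nobody vouched for the path.
    return True
```

A ULP that never gives advice still gets failover in this sketch, because the probe is the default and advice can only suppress it.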
Once we have the above, then I don't see what negative ULP advice would
add. We clearly can't always send probes on negative advice, since that
could mean that a congestive loss in the network would result in
negative ULP advice, causing the host to send more packets, which could
make the congestion worse. While one could rate limit the probes
triggered by negative advice, the negative advice just seems like added
complexity. (A rate limit of one every 10 seconds means that the
negative advice would be completely ignored, with a default of probing
once every 10 seconds.)
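
The parenthetical above can be checked with a small simulation. This is only a sketch under assumed semantics: a probe goes out at time 0, the default timer fires default_interval seconds after the last probe, and negative advice triggers a probe no sooner than rate_limit seconds after the last probe. The function name and parameters are hypothetical.

```python
def probe_times(duration, default_interval, rate_limit, advice_times):
    """Return the times at which probes are sent up to and including `duration`."""
    advice = sorted(advice_times)
    probes = []
    last = 0.0  # assume a probe was sent at time 0
    while True:
        # The default timer fires this long after the previous probe.
        nxt = last + default_interval
        # The first negative advice after the previous probe may pull the
        # next probe forward, but not past the rate limit.
        for t in advice:
            if t > last:
                nxt = min(nxt, max(t, last + rate_limit))
                break
        if nxt > duration:
            return probes
        probes.append(nxt)
        last = nxt


# With the rate limit equal to the default interval, the advice changes nothing.
with_advice = probe_times(30, 10, 10, [3, 14.5])
without_advice = probe_times(30, 10, 10, [])
```

Only a rate limit shorter than the default interval lets negative advice change the schedule, and that is exactly the case where the extra probes can add load during congestion.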
That brings out an interesting issue: what if we have multiple ULPs
using the same session and they provide different feedback?
For instance, a simple case would be that some apps are more sensitive
than others, so they will complain sooner. More complex cases could be
that one app complains and the other one provides positive feedback
(suppose that the failure is at the app level and not in the path, for
instance). How do we deal with this?
If we only use positive advice then I think we can avoid this
complexity. In any case, I don't think we should worry much about it up
front. Later it might make sense to allow the ULPs to express a "desired
failure detection time" to the shim, and the shim can use that to
determine how often to probe. If ULPs with different desires use the
same ULID pair, then presumably the shim would have to operate on the
minimum of the requested times. And the shim presumably needs to ensure
that it doesn't try to probe more frequently than a conservative RTT
estimate no matter what the ULPs desire.
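
As a rough sketch of that rule in Python: take the minimum of the requested detection times and never go below a conservative RTT estimate. The function name, the 10-second fallback, and the simplification that the probe interval equals the desired detection time are all assumptions made for illustration.

```python
DEFAULT_INTERVAL = 10.0  # seconds; fallback when no ULP states a preference


def effective_probe_interval(requested_times, conservative_rtt):
    """Combine per-ULP desired failure detection times for one ULID pair."""
    # The most demanding ULP wins: operate on the minimum requested time.
    wanted = min(requested_times) if requested_times else DEFAULT_INTERVAL
    # Never probe more often than a conservative RTT estimate allows,
    # no matter what the ULPs desire.
    return max(wanted, conservative_rtt)
```

For example, with requests of 30 s and 5 s and an RTT estimate of 0.4 s the shim would probe every 5 s, while a request of 0.1 s would be clamped up to 0.4 s.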
IMHO, ULP feedback should result in an explicit reachability test on
the current locator pair, i.e. ULP feedback does not directly imply
rehoming, but rather a verification through a reachability test exchange
on the current locator pair.
Here you are equating "ULP feedback" with "negative ULP feedback". Such
verification can lead to congestion (and as I stated above, I don't see
how handling negative ULP feedback fits into the big picture.)
Erik