
Re: Failure Detection (was Re: soft state (was Re: shim6 and bit errors in data packet headers



Marcelo,

You seem to have concluded that we need negative advice from the ULPs (or a combination of negative and positive advice), but I must have missed how you came to that conclusion.

In my mind, given that some ULPs might not provide any advice and we want to provide failover in that case, we need the default behavior of the shim to be to probe at some frequency (once every 10 seconds or so?) when the ULP is sending packets. Perhaps it makes sense to use Tx and Rx counters to be able to suppress such probes, and I think it makes sense to use positive ULP advice (when given) to suppress the probes.
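The default behavior described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not anything from the shim6 drafts: the names `ShimContext`, `should_probe` and `PROBE_INTERVAL` are hypothetical, and timestamps of the last send, receive and positive advice stand in for the Tx and Rx counters.

```python
from dataclasses import dataclass

# Assumed default taken from the "once every 10 seconds or so" suggestion.
PROBE_INTERVAL = 10.0  # seconds

NEVER = float("-inf")


@dataclass
class ShimContext:
    """Hypothetical per-ULID-pair state; all field names are illustrative."""
    last_probe: float = 0.0             # when the shim last sent a probe
    last_tx: float = NEVER              # when the ULP last sent a packet
    last_rx: float = NEVER              # when a packet last arrived from the peer
    last_positive_advice: float = NEVER  # when a ULP last reported progress

    def should_probe(self, now: float) -> bool:
        # Never probe more often than the default interval.
        if now - self.last_probe < PROBE_INTERVAL:
            return False
        window_start = now - PROBE_INTERVAL
        # Only probe while the ULP is actually sending.
        if self.last_tx < window_start:
            return False
        # Recent return traffic suppresses the probe.
        if self.last_rx >= window_start:
            return False
        # Recent positive ULP advice suppresses the probe as well.
        if self.last_positive_advice >= window_start:
            return False
        return True
```

Note that no input in this sketch can cause an extra probe; traffic and positive advice can only suppress one.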

Do we agree so far?

Once we have the above, then I don't see what negative ULP advice would add. We clearly can't always send probes on negative advice, since that could mean that a congestive loss in the network would result in negative ULP advice, causing the host to send more packets, which could make the congestion worse. While one could rate limit the probes triggered by negative advice, the negative advice just seems like added complexity. (A rate limit of one every 10 seconds means that the negative advice would be completely ignored, with a default of probing once every 10 seconds.)
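The parenthetical point about the rate limit can be checked with a toy simulation. This is a minimal sketch with one-second ticks; the function `probe_times` and its parameters are made up for illustration, and it ignores probe suppression entirely.

```python
def probe_times(duration, interval, negative_advice_times, rate_limit):
    """Return the ticks (in seconds) at which a probe would be sent.

    A probe goes out when the periodic timer is due, or when negative
    advice arrives and the rate limit since the last probe has passed.
    """
    probes = []
    last_probe = 0
    advice = set(negative_advice_times)
    for now in range(1, duration + 1):
        timer_due = now - last_probe >= interval
        advice_due = now in advice and now - last_probe >= rate_limit
        if timer_due or advice_due:
            probes.append(now)
            last_probe = now
    return probes


advice = [3, 14, 15, 27, 41]
without_advice = probe_times(60, 10, [], 10)
with_advice = probe_times(60, 10, advice, 10)
# With the rate limit equal to the default interval, the advice changes nothing.
print(without_advice == with_advice)
# A looser rate limit does add probes, which is the congestion concern.
print(len(probe_times(60, 10, advice, 2)) > len(without_advice))
```

With the rate limit equal to the probe interval, negative advice can only trigger a probe at a moment when the periodic timer would have fired anyway.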

That brings up an interesting issue: what if we have multiple ULPs using the same session and they provide different feedback?

For instance, a simple case would be that some apps are more sensitive than others, so they will complain sooner. A more complex case would be that one app complains while another provides positive feedback (suppose, for instance, that the failure is at the application level and not in the path). How do we deal with this?

If we only use positive advice then I think we can avoid this complexity. In any case, I don't think we should worry much about it up front. Later it might make sense to allow the ULPs to express a "desired failure detection time" to the shim, and the shim can use that to determine how often to probe. If ULPs with different desires use the same ULID pair, then presumably the shim would have to operate on the minimum of the requested times. And the shim presumably needs to ensure that it doesn't try to probe more frequently than a conservative RTT estimate no matter what the ULPs desire.
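The combination rule suggested above (take the minimum of the requested detection times, but never go below a conservative RTT estimate) might look like this. It is a sketch only: the function name `probe_interval`, its parameters, and the 10-second fallback for the case where no ULP expresses a preference are assumptions.

```python
def probe_interval(requested_times, rtt_estimate, default=10.0):
    """Pick a probe interval in seconds for one ULID pair.

    requested_times: "desired failure detection times" from the ULPs
                     sharing the pair (may be empty).
    rtt_estimate:    conservative round-trip time estimate; acts as a floor.
    default:         assumed interval when no ULP states a preference.
    """
    wanted = min(requested_times) if requested_times else default
    # Never probe more often than once per conservative RTT.
    return max(wanted, rtt_estimate)


print(probe_interval([30.0, 5.0], rtt_estimate=0.5))  # most demanding ULP wins
print(probe_interval([0.1, 5.0], rtt_estimate=0.5))   # clamped to the RTT floor
print(probe_interval([], rtt_estimate=0.5))           # no preference: default
```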


IMHO, ULP feedback should result in an explicit reachability test on the current locator pair, i.e. ULP feedback does not directly imply rehoming, but rather a verification of the current locator pair through a reachability test exchange.

Here you are equating "ULP feedback" with "negative ULP feedback". Such verification can lead to congestion (and as I stated above, I don't see how handling negative ULP feedback fits into the big picture).


   Erik