failure detection



Hi,

Somewhat to my disappointment, there were no opinions about which of the two failure detection mechanisms is better, either during the sessions or afterward on the list.

Since then, I've been thinking about ways to unify the two approaches. What I've been able to come up with is this:

1. when host A is uncertain about the reachability status of the current address pair, it sends host B a request
2. host B sends back an answer, so if no answer arrives, host A knows that something is wrong
3. host A may optionally request that host B send X packets at Y intervals to allow A to determine that there is reachability in the B -> A direction
4. with this request, A may also request that B suppress the additional packets when there is other traffic in the B -> A direction


Capabilities 1 and 2 are mandatory, capabilities 3 and 4 are optional.
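
To make the four capabilities a bit more concrete, here is a rough sketch of what the exchange could look like. Nothing here is a proposed wire format: the names Probe, Reply and KeepaliveRequest, the nonce field used to match an answer to its request, and the keepalive_accepted flag are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KeepaliveRequest:
    """Optional part of a probe (capabilities 3 and 4). Hypothetical layout."""
    count: int                         # X: number of packets B should send
    interval: float                    # Y: seconds between those packets
    suppress_if_traffic: bool = False  # capability 4: skip when other B -> A traffic exists


@dataclass
class Probe:
    """Request from host A (capability 1). The nonce is an assumption."""
    nonce: int
    keepalive: Optional[KeepaliveRequest] = None


@dataclass
class Reply:
    """Answer from host B (capability 2)."""
    nonce: int
    keepalive_accepted: bool  # assumed way for B to signal whether it honors 3/4


def handle_probe(probe: Probe, supports_keepalive: bool) -> Reply:
    """Host B always answers; it honors the keepalive part only if it implements it."""
    accepted = probe.keepalive is not None and supports_keepalive
    return Reply(nonce=probe.nonce, keepalive_accepted=accepted)
```

A host B that only implements the mandatory part would call handle_probe(probe, supports_keepalive=False) and still produce a valid answer, which is the point of making 3 and 4 optional.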

A brain-dead implementation would simply send a probe and expect an answer every 10 seconds or so. (I'm not sure we should allow this as the entire failure detection mechanism as it adds unnecessary extra packets for all associations.)

An implementation that uses upper layer advice would only initiate a probe/reply sequence when the upper layer is unhappy.

An implementation that knows about bidirectional data flow would only initiate a probe/reply sequence when there is outgoing traffic but no incoming traffic. It would probably also ask the other side to keep sending packets when it has no payload data to transmit. The other side may or may not implement this. If it does, so much the better. If it doesn't, there will be more frequent unidirectional traffic so host A will do a probe/reply more often.
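
The three kinds of implementation described above differ only in when host A decides to start a probe/reply sequence. A minimal sketch of that decision, assuming timestamps in seconds and a hypothetical 10 second window for "recent" traffic (the policy names and parameters are illustrative, not part of any specification):

```python
from enum import Enum


class Policy(Enum):
    PERIODIC = 1       # probe on a fixed timer regardless of traffic
    UPPER_LAYER = 2    # probe only on negative upper layer advice
    BIDIRECTIONAL = 3  # probe when traffic flows out but nothing comes in


def should_probe(policy, now, last_probe, last_sent, last_received,
                 upper_layer_unhappy, period=10.0, window=10.0):
    """Return True if host A should start a probe/reply sequence now.

    period and window are assumed values; the timestamps are the times
    of the last probe, last outgoing packet and last incoming packet.
    """
    if policy is Policy.PERIODIC:
        return now - last_probe >= period
    if policy is Policy.UPPER_LAYER:
        return upper_layer_unhappy
    # BIDIRECTIONAL: outgoing traffic within the window, no incoming traffic
    sent_recently = now - last_sent <= window
    received_recently = now - last_received <= window
    return sent_recently and not received_recently
```

If host B honors a request to keep sending packets when it has no payload, last_received stays fresh and the bidirectional policy rarely fires. If B ignores the request, the same function simply returns True more often.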

Although this increases the number of ways failure detection can play out, and thus makes debugging more difficult, it has the advantage that a basic implementation is simple, while implementers remain free to add capabilities and improve their failure detection algorithms without having to go through an IETF standardization cycle.

Thoughts?