
Re: failure detection



Iljitsch van Beijnum wrote:
Hi,

Somewhat to my disappointment, there were no opinions about which of the two failure detection mechanisms is better, either during the sessions or afterward on the list.

There wasn't any discussion after your presentation :-(
What I would have said is that I think FBD is simple and sufficient for verifying reachability. All we need is to reference the text in the l3shim draft that says the shim isn't applied to multicast.


Below you seem to assume that there is an option to do probing instead of (or in addition to) FBD, which complicates things and introduces the desire to suppress such probing.

I don't think we need that, but I haven't fully digested the issues of which end discovers the lack of reachability yet, so I reserve the right to change my mind :-)

   Erik

Since then, I've been thinking about ways to unify the two approaches. What I've been able to come up with is this:

1. when host A is uncertain about the reachability status of the current address pair, it sends host B a request
2. host B now sends back an answer, so if this answer doesn't arrive, host A knows that something is wrong
3. host A may optionally request that host B send X packets at Y intervals to allow A to determine that there is reachability in the B -> A direction
4. with this request, A may also request that B suppress the additional packets when there is other traffic in the B -> A direction


Capabilities 1 and 2 are mandatory, capabilities 3 and 4 are optional.
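Here is a minimal Python sketch of the four capabilities as seen from host B. It assumes hypothetical message types (`ProbeRequest`, `ProbeReply`) and field names that no draft defines. B always answers (capabilities 1 and 2). If it implements the optional part, it schedules X = `keepalive_count` packets spaced Y = `keepalive_interval` seconds apart (capability 3). When asked, it skips any keepalive whose slot already carried other B -> A traffic (capability 4).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ProbeRequest:
    # Hypothetical message format; field names are illustrative only.
    nonce: int
    keepalive_count: int = 0           # X for capability 3 (0 = not requested)
    keepalive_interval: float = 0.0    # Y, in seconds
    suppress_if_traffic: bool = False  # capability 4


@dataclass
class ProbeReply:
    nonce: int  # echoes the request so host A can match reply to request


class HostB:
    def __init__(self, supports_keepalives: bool = True):
        # Capabilities 3 and 4 are optional, so a B may not support them.
        self.supports_keepalives = supports_keepalives

    def handle(self, req: ProbeRequest, now: float,
               data_send_times: List[float]) -> Tuple[ProbeReply, List[float]]:
        """Answer a probe; also return the times at which keepalives go out.

        data_send_times lists when B sends ordinary payload packets to A.
        """
        reply = ProbeReply(req.nonce)  # capabilities 1 and 2: always answer
        if not self.supports_keepalives or req.keepalive_count == 0:
            return reply, []

        sent = []
        for i in range(1, req.keepalive_count + 1):
            start = now + (i - 1) * req.keepalive_interval
            due = now + i * req.keepalive_interval
            # Capability 4: if payload already went B -> A during this slot,
            # A can use that packet as evidence, so the keepalive is skipped.
            if req.suppress_if_traffic and any(start < t <= due for t in data_send_times):
                continue
            sent.append(due)
        return reply, sent


reply, keepalives = HostB().handle(
    ProbeRequest(nonce=7, keepalive_count=3, keepalive_interval=1.0,
                 suppress_if_traffic=True),
    now=0.0, data_send_times=[1.5])
print(reply.nonce, keepalives)  # 7 [1.0, 3.0]
```

In this example one data packet goes to A at t = 1.5. With suppression requested, only the keepalives at t = 1.0 and t = 3.0 are actually sent.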

A brain-dead implementation would simply send a probe and expect an answer every 10 seconds or so. (I'm not sure we should allow this as the entire failure detection mechanism, since it adds unnecessary extra packets to every association.)
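The brain-dead variant can be sketched as a fixed timer. It sends a probe every interval and declares failure if the previous probe is still unanswered when the next one is due. The 10-second figure comes from the text. The class and method names are hypothetical, and the sketch only detects failure; it does not try other address pairs.

```python
from typing import Optional

PROBE_INTERVAL = 10.0  # "every 10 seconds or so"


class PeriodicProber:
    """Hypothetical brain-dead detector: probes on a timer regardless of traffic."""

    def __init__(self, interval: float = PROBE_INTERVAL):
        self.interval = interval
        self.next_probe = 0.0
        self.outstanding: Optional[int] = None  # nonce still awaiting a reply
        self.last_nonce = 0
        self.failed = False

    def tick(self, now: float) -> Optional[int]:
        """Call periodically; returns the nonce of a probe to send, or None."""
        if now < self.next_probe:
            return None
        if self.outstanding is not None:
            # The previous probe went a full interval without a reply.
            self.failed = True
        self.last_nonce += 1
        self.outstanding = self.last_nonce
        self.next_probe = now + self.interval
        return self.last_nonce

    def on_reply(self, nonce: int) -> None:
        if nonce == self.outstanding:
            self.outstanding = None
            self.failed = False


p = PeriodicProber()
p.tick(0.0)      # sends probe 1
p.on_reply(1)
p.tick(10.0)     # sends probe 2; probe 1 was answered
p.tick(20.0)     # sends probe 3; probe 2 was never answered
print(p.failed)  # True
```

This sketch also shows the cost mentioned above: every association pays for one probe and one reply per interval, even when data is flowing normally in both directions.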

An implementation that uses upper layer advice would only initiate a probe/reply sequence when the upper layer is unhappy.
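Below is a sketch of the upper-layer-advice variant. It assumes the upper layer (for example, TCP after a retransmission timeout) passes down a hypothetical boolean `unhappy` flag. The 3-second reply timeout is an assumed value; the proposal does not specify one.

```python
from typing import Dict, Optional

REPLY_TIMEOUT = 3.0  # assumed value, not part of the proposal


class AdviceDrivenProber:
    """Starts a probe/reply exchange only when the upper layer complains."""

    def __init__(self, reply_timeout: float = REPLY_TIMEOUT):
        self.reply_timeout = reply_timeout
        self.pending: Dict[int, float] = {}  # nonce -> reply deadline
        self.last_nonce = 0

    def on_upper_layer_advice(self, now: float, unhappy: bool) -> Optional[int]:
        # No complaint, or a probe already in flight: send nothing extra.
        if not unhappy or self.pending:
            return None
        self.last_nonce += 1
        self.pending[self.last_nonce] = now + self.reply_timeout
        return self.last_nonce

    def on_reply(self, nonce: int) -> None:
        self.pending.pop(nonce, None)

    def failure_detected(self, now: float) -> bool:
        # A probe whose deadline passed without a reply means the pair is broken.
        return any(now > deadline for deadline in self.pending.values())
```

When the upper layer is content, no extra packets are sent at all. The price is that detection can only be as fast as the upper layer's own notion of "unhappy".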

An implementation that knows about bidirectional data flow would only initiate a probe/reply sequence when there is outgoing traffic but no incoming traffic. It would probably also ask the other side to keep sending packets when that side has no payload data to transmit. The other side may or may not implement this. If it does, so much the better. If it doesn't, periods of unidirectional traffic will occur more often, so host A will initiate a probe/reply sequence more often.
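Here is a sketch of the traffic-aware trigger on host A. It assumes a hypothetical 4-second `quiet_window` and probes only when A has been sending for that long without receiving anything from B. If B honors the keepalive request from capability 3, its keepalives reset the window through `on_receive`, so probes stay rare. If B ignores the request, the window expires more often and A probes more often.

```python
class TrafficAwareProber:
    """Probe only when there is outgoing traffic but no incoming traffic."""

    def __init__(self, quiet_window: float = 4.0):  # window length is assumed
        self.quiet_window = quiet_window
        # Time of the earliest packet sent since the last packet received.
        self.first_unanswered_tx = None

    def on_send(self, now: float) -> None:
        if self.first_unanswered_tx is None:
            self.first_unanswered_tx = now

    def on_receive(self, now: float) -> None:
        # Any packet from B (payload or keepalive) shows B -> A works.
        self.first_unanswered_tx = None

    def should_probe(self, now: float) -> bool:
        # An idle association (nothing sent) never triggers a probe.
        return (self.first_unanswered_tx is not None
                and now - self.first_unanswered_tx >= self.quiet_window)


t = TrafficAwareProber()
t.on_send(1.0)
t.on_send(2.0)
t.on_receive(3.5)           # B answered, so the window resets
t.on_send(11.0)
print(t.should_probe(15.0))  # True: 4 seconds of sending with nothing back
```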

Although this increases the number of ways failure detection can play out and thus makes debugging more difficult, it has the advantage that a basic implementation stays simple, while implementers can still add capabilities and improve their failure detection algorithms without going through an IETF standardization cycle.

Thoughts?