
Re: Failure Detection (was Re: soft state (was Re: shim6 and bit errors in data packet headers



On 2-jun-2005, at 16:39, marcelo bagnulo braun wrote:

However, my understanding is that a mechanism along the lines of what Iljitsch suggests (based on observing the existence of a bidirectional flow of packets) could be an interesting optimization in the bidirectional case.

My understanding of how such a mechanism would work is the following:

The shim layer observes the amount of traffic exchanged during the last T seconds, where Tx is the number of packets transmitted and Rx the number of packets received in that window.
If Tx>0 and Rx>0, then no problem
If Tx=0, then no problem
If Tx>0 and Rx=0, then perform a reachability test to verify the current locator pair, and eventually a path exploration exchange
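
The three rules above can be written down as a small decision function. This is only a sketch: the names `WindowAction` and `check_window` are invented for illustration, and how Tx and Rx are counted over the last T seconds is left to the caller.

```python
from enum import Enum


class WindowAction(Enum):
    # Hypothetical labels for the two outcomes described above.
    NO_PROBLEM = "no problem"
    REACHABILITY_TEST = "test the current locator pair"


def check_window(tx: int, rx: int) -> WindowAction:
    """Apply the Tx/Rx rule to the packet counts of the last T seconds.

    tx: packets transmitted in the window, rx: packets received.
    """
    if tx < 0 or rx < 0:
        raise ValueError("packet counts cannot be negative")
    # Only the "sending but hearing nothing back" case is suspicious.
    if tx > 0 and rx == 0:
        return WindowAction.REACHABILITY_TEST
    # Tx>0 and Rx>0, or Tx=0: nothing to do.
    return WindowAction.NO_PROBLEM
```

Note that with these rules an idle session (Tx=0, Rx=0) and a receive-only session (Tx=0, Rx>0) are treated the same way.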

Upon receiving a path exploration message from the peer, the node must perform a reachability test to verify the current locator pair.

I think if we combine all of this, a good mechanism would be:

- if recent rx and recent tx: be happy, everything works
- if no recent rx and no recent tx: session is idle, do nothing
- if recent rx, but not recent tx: send keepalive
- if recent tx, but not recent rx: failure likely, start full reachability testing procedure


(These tests would have to be performed X seconds after sending/receiving a packet, not at the time of packet transmission/reception.)
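
As a rough illustration of the four cases, here is a sketch driven by timestamps rather than packet counts. Everything named here is an assumption made for the example: `ShimAction`, `decide`, and the idea that "recent" means "within `window_s` seconds of now". The function is meant to be called from a timer some time after the last packet, not from the send/receive path.

```python
from enum import Enum
from typing import Optional


class ShimAction(Enum):
    # Hypothetical labels for the four cases in the list above.
    NOTHING = "everything works"
    IDLE = "session is idle, do nothing"
    SEND_KEEPALIVE = "send keepalive"
    FULL_REACHABILITY_TEST = "start full reachability testing"


def is_recent(last: Optional[float], now: float, window_s: float) -> bool:
    """True if a packet was seen at most window_s seconds before now."""
    return last is not None and (now - last) <= window_s


def decide(now: float,
           last_tx: Optional[float],
           last_rx: Optional[float],
           window_s: float) -> ShimAction:
    """Pick an action from the times of the last sent/received packets.

    last_tx / last_rx are None if nothing was ever sent / received.
    """
    recent_tx = is_recent(last_tx, now, window_s)
    recent_rx = is_recent(last_rx, now, window_s)

    if recent_rx and recent_tx:
        return ShimAction.NOTHING
    if not recent_rx and not recent_tx:
        return ShimAction.IDLE
    if recent_rx:
        # Receiving but not sending: tell the peer we are still here.
        return ShimAction.SEND_KEEPALIVE
    # Sending but not receiving: failure likely.
    return ShimAction.FULL_REACHABILITY_TEST
```

For example, `decide(100.0, last_tx=99.0, last_rx=80.0, window_s=5.0)` falls into the last case, because the peer should have answered with either data or a keepalive within the window.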

The improvement over what you propose and/or NUD is that when there is no outgoing traffic, we just send keepalives, which could theoretically be just IPv6 headers without a payload. Since these are unidirectional they don't take up bandwidth in the other direction, as there is traffic flowing in that direction anyway.

Doing things this way guarantees that traffic can never flow in just one direction. So when a host is sending and not receiving, the only explanation can be a failure, and this is the point where we start a full reachability exploration procedure, with the advantage that we know the current path is almost certainly in trouble.

There wouldn't be any advantage in combining this with positive feedback from ULPs, except maybe some small implementation optimization. However, listening for negative feedback could be useful to start reachability tests faster than the regular shim timeout.

So, imho there are at least two questions to answer in order to see if this is interesting:
1- Is this really an optimization? I.e., does it provide some form of improvement (and is it worth it)?
2- How frequent are we expecting this case to be, so that it is worth optimizing?

My opinions about these two are the following:

About 1: I guess this mechanism would improve the efficiency of the solution because it would reduce the number of reachability tests exchanged to verify the current locator pair in the case of a bidirectional UDP flow (or any other bidirectional flow where the ULP does not provide positive feedback).

Indeed.

About 2: this mechanism optimizes the case of bidirectional flows that belong to ULPs that do not provide positive feedback (e.g. UDP).
However, this mechanism only works in the case where at least one path with bidirectional connectivity is available.

??? Isn't that true of any interactive communication?

This mechanism fails in the case where the only paths available are two different unidirectional paths.

No, not necessarily. The question is what we test for:

locator <-> locator
ULID <-> ULID
host <-> host

I would prefer the last one, but we'll probably have to settle for the middle one because the constraints for the last one would probably be too severe. (I.e., if any packet from A to B means there is reachability from A to B, ALL packets from A to B would have to use the same locators or we could miss some failures. The constraint that all packets from ULID Ax to ULID Bx use the same locator set is probably more reasonable.)
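
To make the middle option (ULID <-> ULID) a bit more concrete, here is a sketch of per-ULID-pair bookkeeping. The class `UlidPairState`, its method names, and the example addresses (taken from the 2001:db8::/32 documentation prefix) are all made up for illustration; it assumes, as in the constraint above, that all packets between one ULID pair use the same locators, so traffic on one pair says nothing about another.

```python
from typing import Dict, Optional, Tuple

UlidPair = Tuple[str, str]  # (local ULID, remote ULID)


class UlidPairState:
    """Track last send/receive times separately for each ULID pair."""

    def __init__(self) -> None:
        self.last_tx: Dict[UlidPair, float] = {}
        self.last_rx: Dict[UlidPair, float] = {}

    def on_send(self, local: str, remote: str, now: float) -> None:
        self.last_tx[(local, remote)] = now

    def on_receive(self, local: str, remote: str, now: float) -> None:
        self.last_rx[(local, remote)] = now

    def failure_suspected(self, local: str, remote: str,
                          now: float, window_s: float) -> bool:
        """Recent tx but no recent rx on this particular ULID pair."""
        key = (local, remote)
        tx: Optional[float] = self.last_tx.get(key)
        rx: Optional[float] = self.last_rx.get(key)
        recent_tx = tx is not None and (now - tx) <= window_s
        recent_rx = rx is not None and (now - rx) <= window_s
        return recent_tx and not recent_rx


state = UlidPairState()
# We send on pair (A1, B1) but only hear back on a different pair (A2, B2).
state.on_send("2001:db8:a::1", "2001:db8:b::1", now=10.0)
state.on_send("2001:db8:a::2", "2001:db8:b::2", now=10.0)
state.on_receive("2001:db8:a::2", "2001:db8:b::2", now=10.5)
```

With host <-> host testing the packet received at 10.5 would count as proof of reachability for everything between the two hosts. With per-ULID-pair state it only covers the second pair, so a failure on the first pair is still noticed.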