
Re: soft state (was Re: shim6 and bit errors in data packet headers)



On 1-jun-2005, at 23:01, Erik Nordmark wrote:

What I propose is a mechanism that purely checks whether traffic is flowing in both directions. This doesn't require parsing any headers except the source and destination addresses, which the shim must look at anyway.

I guess I don't understand what this would add beyond what your proposed negative advice already can do.

Perhaps you can explain how this would work in this example:
A is using locator pair <A1, B1> and B is using <B1, A1>.
TCP is sending data from A to B, with B responding with ACKs.

The path from B->A fails.
The shim layer on A observes this because it stops receiving packets.
(But TCP on A also starts retransmitting.)
The shim layer on B thinks everything is fine because it sees the packets from A, and it sees the ACKs that B is sending back to A.

What would happen here is that A doesn't see any return traffic so it starts sending reachability probes. In this case B knows about the problem very soon because the first probe presumably uses address pair A1, B1 which still works in the A -> B direction.


But how about this: each side tells the other side a timeout value: after not having seen any traffic from A, B starts probing. Now one of three situations can happen:

But doesn't this lead to a choice between having to send probes when there is no ULP traffic, or having a long timer resulting in a long time until a failure would be detected?

Yes. But as long as we switch from a short timeout when there is traffic to a long or infinite timeout when there is no traffic, this shouldn't be problematic.


- regular traffic: the timer is restarted before it expires by regular
traffic, so the timer never expires and there are no probes
- irregular traffic: in order to make sure the timer doesn't expire if
there isn't any traffic for some time, the sender injects keepalives
so there are no probes

The shim on the sender? Or the ULP? In either case, you end up adding packets when things would have otherwise been idle.

The shim.

That's true, but it's hard to avoid unless you want to rely on ULP timeouts or behavior very similar to ULP timeouts.

And the amount of traffic would be fairly low. For instance, when traffic is flowing the timer could be 15 seconds. So when there is no more traffic, after 15 seconds the sender would have to send a keepalive. The keepalive could contain a new timer value, which could be X times the previous value (30, 60, 120 and so on), or immediately go to infinity. When traffic starts to flow again, the timer would have to be reinitialized, or maybe this can happen automatically.
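To make the numbers concrete, here is a rough sketch of the keepalive schedule an idle sender would follow. The function name and parameters are made up for illustration, and it assumes the keepalive goes out exactly when the currently announced timer would expire (a real sender would send it slightly earlier).

```python
import math

def keepalive_schedule(initial=15.0, factor=2.0, count=4, jump_to_infinity=False):
    """Return (send_time, announced_timeout) pairs for a sender that has gone idle.

    Hypothetical helper: 'initial' is the timer used while traffic flows,
    'factor' is the X multiplier, 'count' limits how many keepalives we list.
    """
    schedule = []
    now = 0.0
    timeout = initial
    for _ in range(count):
        now += timeout  # keepalive is due when the current timer would expire
        timeout = math.inf if jump_to_infinity else timeout * factor
        schedule.append((now, timeout))
        if math.isinf(timeout):
            break  # infinite timer announced: no further keepalives needed
    return schedule

print(keepalive_schedule(count=3))
print(keepalive_schedule(jump_to_infinity=True))
```

With the defaults this gives keepalives at 15, 45 and 105 seconds announcing 30, 60 and 120 seconds, so the idle overhead drops off quickly.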

For TCP, your positive advice would probably be cleaner, but for non-TCP ULPs, which typically don't have a similarly advanced retransmission mechanism, and especially for ULPs that don't provide advice, this mechanism or the other one that I mentioned earlier would probably be a good catch-all backup.

There could be slight complexities in determining which failure detection mechanism to use, though. (Ugh: doing failure detection in middlebox shim implementations.)

An interesting difference between this mechanism and the positive advice mechanism is that with positive advice, failures are initially detected by the sender, while with my timer announcement/timeout mechanism, failures are initially detected by the receiver.
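The receiver side of the timer announcement could be sketched as follows. The class and method names are hypothetical, and time is passed in explicitly so the logic is easy to follow; this is only meant to show the idea, not a proposed implementation.

```python
import math

class ReceiverTimer:
    """Tracks the timeout the peer announced and decides when to start probing."""

    def __init__(self, timeout, now=0.0):
        self.timeout = timeout
        self.deadline = now + timeout

    def on_packet(self, now, announced_timeout=None):
        # A keepalive may carry a new timer value; regular traffic does not.
        if announced_timeout is not None:
            self.timeout = announced_timeout
        # Any packet from the peer restarts the timer.
        self.deadline = now + self.timeout

    def should_probe(self, now):
        # Nothing seen from the peer within the announced timeout.
        return now >= self.deadline

timer = ReceiverTimer(timeout=15.0)
timer.on_packet(now=10.0)               # regular traffic restarts the timer
print(timer.should_probe(now=20.0))     # still within the 15 seconds
print(timer.should_probe(now=25.0))     # timer expired, start probing
timer.on_packet(now=26.0, announced_timeout=math.inf)
print(timer.should_probe(now=1e9))      # infinite timer: never probe
```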

- no traffic: the sender sets the timer to a very large value or
  infinity, so there are no probes

If the timer has been set to e.g. 5 minutes, how quickly can the shim detect a failure?

On the receiver: 5 minutes. On the sender: min: 5 mins + some RTTs, max: 5 mins + N times exponential backoff + some RTTs.
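As a back-of-the-envelope check of those bounds, under assumptions of my own: the probe retry interval starts at 1 second and doubles, there are N = 3 retries, and "some RTTs" means 2 round trips. None of these values come from a spec.

```python
def detection_bounds(timeout, rtt, rtts=2, retries=3, first_backoff=1.0, factor=2.0):
    """Return (receiver, sender_min, sender_max) detection times in seconds.

    All parameters other than 'timeout' are illustrative assumptions.
    """
    receiver = timeout
    sender_min = timeout + rtts * rtt
    # Total time spent waiting across exponentially backed-off probe retries.
    backoff_total = sum(first_backoff * factor ** i for i in range(retries))
    sender_max = timeout + backoff_total + rtts * rtt
    return receiver, sender_min, sender_max

# 5 minute timer, 500 ms round-trip time
print(detection_bounds(timeout=300.0, rtt=0.5))
```

So with these assumed values the backoff adds only a few seconds on top of the 5 minutes; the announced timer dominates.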


I'm assuming that when reachability probes are sent, probes with different address pairs are sent until a working pair is found in both directions, or it is determined that there is no bidirectional connectivity anymore. So B would be informed unless there is no longer any reachability possible.
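A simplified sketch of that probing loop, assuming a hypothetical probe(src, dst) callable that returns True when a reply to the probe comes back. It tries the current pair first and then the remaining combinations; it does not model the case where the two directions end up on different pairs.

```python
from itertools import product

def find_working_pair(local, remote, probe, current=None):
    """Return the first (src, dst) locator pair that gets a probe reply, or None."""
    candidates = list(product(local, remote))
    # Try the pair currently in use first, since it may still work one way.
    if current in candidates:
        candidates.remove(current)
        candidates.insert(0, current)
    for src, dst in candidates:
        if probe(src, dst):
            return (src, dst)
    return None  # no bidirectional connectivity left

# Toy reachability: only A2 <-> B1 still works.
working = {("A2", "B1")}
pair = find_working_pair(["A1", "A2"], ["B1", "B2"],
                         lambda s, d: (s, d) in working,
                         current=("A1", "B1"))
print(pair)
```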

Yes, but it takes much longer for B to be informed, because B will not know there is a problem until A manages to get its first probe through to B.
If B has data driven probes (suppressed when there is positive advice from the ULP), then it can find out sooner.

You'd have to wait for a TCP retransmit before the shim at the sender knows that something is wrong. Depending on many things, this may be quick or not so quick. TCP isn't tuned for quick regular timeout retransmissions, as the "fast retransmit" mechanism covers the cases where only a single packet is lost.


But I think probing that is suppressed when the ULP doesn't send anything is a useful approach, because it avoids adding any probes or keepalives when the ULP is silent.

Agree.

Hm, ok. But we have to be careful about mandating continuous communication between layers. In a properly layered implementation, this type of communication can be quite expensive (context switches and so on).

FWIW the implementation of this that I know the best doesn't add any communication events between the layers. When TCP goes to send a packet to the IPv6 layer, it can set a single flag, "positive reachability advice", as part of the packet that goes down to the IPv6 transmit routine.

Ok. Still need to get some input on this before casting it into stone, though: it may not be implementable in this way on some systems.
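
For discussion, the single-flag idea might look something like this in outline. The Packet and Shim types are invented for illustration and don't correspond to any actual stack; the point is only that the advice rides along with a packet that is being handed down anyway, so no extra call between layers is needed.

```python
from dataclasses import dataclass

@dataclass
class Packet:
    payload: bytes
    # Set by the ULP (e.g. TCP) when it has evidence the peer is reachable.
    positive_advice: bool = False

class Shim:
    def __init__(self):
        self.suppress_probes = False

    def transmit(self, packet):
        # The advice arrives with the packet itself; no separate upcall.
        self.suppress_probes = packet.positive_advice
        return packet.payload  # stand-in for handing off to the IPv6 output path

shim = Shim()
shim.transmit(Packet(b"data", positive_advice=True))
print(shim.suppress_probes)
shim.transmit(Packet(b"retransmit"))
print(shim.suppress_probes)
```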