
Re: soft state (was Re: shim6 and bit errors in data packet headers)



Iljitsch van Beijnum wrote:
On 27-mei-2005, at 23:37, Erik Nordmark wrote:

I would think that if one side triggers reachability testing, the other side would also do it. The probes in one direction can also function as replies in the other direction, cutting down on the number of packets exchanged.


I'm not sure we'd end up with both ends triggering reachability testing at about the same time, because that would seem to assume some form of periodic reachability testing even when there is no ULP traffic. Such background chatter seems undesirable.


I'm not sure what you mean here...

I think it was the use of "trigger" in your earlier email that triggered me to go off on what is perhaps a tangent to your point.


What I'm getting at is that one end says "hey, can you still hear me?" and then the other end says "sure, and can you still hear me?". The first party then replies with "yes" and we know that there is reachability in both directions.

Yes, one can verify that bidirectional reachability exists (when it does exist) by using 3 packets instead of the naive approach which would result in 4 packets.
But that doesn't necessarily help when reachability does not exist for some subset of the address pairs.
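
To make the packet count concrete, here is a minimal sketch of the three-packet exchange described above. The function name and the two boolean flags are hypothetical and exist only for this illustration; no real probe format or shim6 message is implied.

```python
def three_packet_exchange(a_to_b_ok=True, b_to_a_ok=True):
    """Simulate the exchange; return (packets_sent, a_confirmed, b_confirmed).

    a_to_b_ok / b_to_a_ok are illustrative flags saying whether packets
    get through in each direction on the locator pair being tested.
    """
    # Packet 1, A -> B: "hey, can you still hear me?"
    sent = 1
    if not a_to_b_ok:
        return sent, False, False

    # Packet 2, B -> A: "sure, and can you still hear me?"
    # It is both a reply to A's probe and a probe of its own.
    sent += 1
    if not b_to_a_ok:
        return sent, False, False

    # A received a reply to its probe, so A knows both directions work.
    a_confirmed = True

    # Packet 3, A -> B: "yes". A -> B is known to work at this point,
    # so B receives the reply to its own probe.
    sent += 1
    b_confirmed = True
    return sent, a_confirmed, b_confirmed
```

When either direction is broken, neither end gets a confirmation, and the exchange by itself does not say which direction failed or which other address pair would work.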


Thus I think we'd want a packet driven trigger of reachability testing (much like NUD). When A sends a ULP packet to B, it checks whether it has current reachability information for B, and if not it triggers reachability testing.


Disagree. I think we should assume reachability until we get a hint that there is none. So if A sends a packet to B, B does nothing and will, with something like 99% likelihood, send a packet back in due course, because transports tend to work in both directions. However, if A doesn't get a reply and the packet it sent earlier isn't one that is known to go replyless (i.e., TCP ACK-only or FIN packets), A triggers reachability testing.

This is the debate about positive vs. negative advice from the ULPs. You are advocating that the ULPs provide negative advice. But that isn't sufficient to trigger testing in all failure cases.


Take the case when the TCP on A is sending data to B, hence B is only sending ACK packets back to A.
The TCP on A can easily generate negative advice when it has retransmitted a few times and doesn't receive a response.
But things are problematic on B, because there isn't an (efficient) strategy for the TCP on B to generate negative advice - it doesn't run a retransmit timer.
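
As a rough illustration of this asymmetry, the sketch below shows how the sending side could hang negative advice on its retransmit timer. The class name, method names, and the threshold of 3 retransmissions are assumptions made for this example, not part of any TCP implementation or shim6 proposal.

```python
class SenderNegativeAdvice:
    """Hypothetical hook on the data sender's retransmit timer."""

    def __init__(self, max_retransmits=3):
        # The threshold is an arbitrary illustrative value.
        self.max_retransmits = max_retransmits
        self.retransmits = 0

    def on_retransmit_timeout(self):
        """Call when the retransmit timer fires.

        Returns True once enough retransmissions have gone unanswered
        that negative advice should be passed down to the shim.
        """
        self.retransmits += 1
        return self.retransmits >= self.max_retransmits

    def on_ack_for_new_data(self):
        """Forward progress was made, so reset the counter."""
        self.retransmits = 0
```

The ACK-only side (B) has no counterpart to `on_retransmit_timeout`, because no retransmit timer is running there, so it has no natural event from which to generate negative advice.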


Thus when something fails it will always be up to A to initiate the exploration of alternate locator pairs. Also, the time at which the exploration of alternates starts is a function of the retransmit behavior of the ULP, which makes it harder to tightly control the failover time.

It is easier to have the ULPs generate positive advice ("the traffic to this destination is making forward progress") at both ends. The fact that an ACK for new data has been recently received, or that a data packet which advances the sequence number has been recently received, are both easy indications of forward progress.
With such a strategy the shim implementation can do a check after sending a ULP packet: "how long ago was the last positive advice?"
If this exceeds some limit, then the shim can trigger a test of the current locator pair, and if that fails, start testing alternative locator pairs.


The positive advice approach has the benefit that it works even if the ULPs don't generate any advice; in this case the shim would, when ULP packets are being sent, revert to periodically sending a test packet.
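
The check described above can be sketched roughly as follows. The class, its method names, and the 10-second limit are all assumptions chosen for illustration; the thread does not specify a timeout or an interface between the ULP and the shim.

```python
class ShimReachability:
    """Hypothetical per-destination state kept by the shim."""

    def __init__(self, advice_timeout=10.0):
        # The limit is an arbitrary illustrative value, in seconds.
        self.advice_timeout = advice_timeout
        self.last_positive = None

    def positive_advice(self, now):
        """ULP reports forward progress, e.g. an ACK for new data arrived."""
        self.last_positive = now

    def probe_succeeded(self, now):
        """A test of the current locator pair succeeded."""
        self.last_positive = now

    def on_ulp_send(self, now):
        """Call after sending a ULP packet.

        Returns True if the current locator pair should be tested
        because positive advice is older than the limit.
        """
        if self.last_positive is None:
            # First packet to this destination: start the clock.
            self.last_positive = now
            return False
        return now - self.last_positive > self.advice_timeout
```

If the ULP never calls `positive_advice`, only `probe_succeeded` refreshes the timestamp, so the shim ends up sending a test packet about once per timeout interval while ULP packets are being sent, and sends nothing when the ULP is idle.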


As per my suggestion, there would normally not be any reachability testing as long as packets flow in both directions. When there is a bidirectional failure AND both ends were sending data at the same time (= not when data is flowing in one direction and just acks in the other), there would probably be reachability testing triggered in both directions at the same time.

Do you consider this problematic?

Yes, because the ULPs will not always be able to detect the problem at both ends. The canonical example where this isn't easy is a TCP which is receiving data packets and only sending ACKs; in that case there is no retransmit timer running on which you can hang a "send negative advice to the IP layer" event.


   Erik