
Re: address pair exploration, flooding and state loss



marcelo bagnulo braun wrote:

In #1 it is key that we can get A to realize that the context has been lost on B, so that we can get to the point where the shim on B can pass up the packets to the ULPs. This will cause the ULPs to generate errors (e.g. TCP RST) and we are back to what we have today without a shim.


Besides, at this point there is no possible recovery, since the ULP state has also been lost.

Yes, but "do no harm" presumably includes "don't slow down current failure detection".


The recovery, once the RST comes back, could be to open a new connection, or something else in the application layer. (Think of an application with the same behavior as wget -c)

I think there is a significant quality difference between
1. The box reboots in 30 seconds. The first TCP retransmission after 30 seconds results in a RST coming back. This triggers application recovery.
2. The box reboots in 30 seconds. TCP retransmissions for the next 10 minutes are silently dropped, because the retransmissions arrive at the peer's TCP with a bad checksum (due to the "missing" shim6 rewrite).
Thus no RST arrives until 10 minutes or so later!
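To make scenario 2 concrete: TCP and UDP checksums cover an IPv6 pseudo-header that includes the source and destination addresses. If the shim context is lost, nothing rewrites the locators back to the ULIDs before the packet reaches the ULP, so the checksum is verified against the wrong addresses. A toy sketch (the 4-byte "addresses" and names are illustrative only, not from any draft):

```python
def cksum16(data: bytes) -> int:
    """RFC 1071-style ones'-complement sum over 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold carries
    return (~total) & 0xFFFF

payload = b"hello"

# The sender's ULP computed the checksum over the ULID pair (A1, B1)...
ulid_pseudo = b"\x20\x01\x0a\x01" + b"\x20\x01\x0b\x01"  # toy A1, B1
sent_cksum = cksum16(ulid_pseudo + payload)

# ...but the packet arrives carrying the locator pair (A2, B2). With the
# shim context lost on B, nothing maps the locators back to the ULIDs, so
# the receiving ULP verifies against the wrong pseudo-header and drops
# the segment silently -- no RST is ever generated.
loc_pseudo = b"\x20\x02\x0a\x02" + b"\x20\x02\x0b\x02"   # toy A2, B2
assert cksum16(loc_pseudo + payload) != sent_cksum
```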



Right. The only point I wonder about w.r.t. this is whether it is wise to base the context loss detection procedure on the heuristics for establishing shim context... I mean, those heuristics may vary greatly, I guess. For instance, it seems possible that some heavily loaded servers would adopt a policy of never initiating the shim session establishment procedure, and would only accept establishment requests from clients. In such a case, they wouldn't detect context loss.

So we can point out in our RFCs why such a behavior would be suboptimal.
Ensuring that one end can quickly detect when the peer has lost the context state, even when the state isn't being used (i.e., when the ULIDs are used as locators), is not without cost.


In any case, if the procedure you described in the previous mail (including both the context identifier and the nonce in a compact way in 20 bits) works, we could stuff all we need into every data packet, I guess. So, if the flow label approach is used, all data packets of established shim sessions could be identified as such.

No, that isn't sufficient, because there is nothing in a received packet which identifies it to the receiver (which has lost context) as a shim6 packet. Any packet can have a non-zero flow label, so that isn't a useful indication.
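The packing itself is easy; the problem is that the result is indistinguishable from an ordinary flow label. A sketch (the 12/8 bit split between tag and nonce is my assumption, not from any draft):

```python
# Illustrative only: squeeze a context tag and a short nonce into the
# 20-bit IPv6 flow label field.
def pack_flow_label(ctx_tag: int, nonce: int) -> int:
    assert 0 <= ctx_tag < (1 << 12) and 0 <= nonce < (1 << 8)
    return (ctx_tag << 8) | nonce

label = pack_flow_label(0xABC, 0x42)
assert label == 0xABC42

# The catch: every value in [0, 2**20) is also a perfectly legal flow
# label on a packet from a non-shim6 host, so the receiver has no way
# to recognize the packet as a shim6 data packet from the label alone.
assert 0 <= label < (1 << 20)
```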


So you'd need to add at least one bit to every data packet to be able to do this.

OK, now I am confused.
AFAIU until now, if we want to detect context loss based on the reception of data packets, we need some flag in the data packet indicating that the packet belongs to an existing shim session. If we eliminate this bit, then we cannot detect context loss from data packets, right? Or have I lost track of our reasoning?

If B receives a packet from A1 to B1 with flow label 12345 and a next header of UDP, and B has no context for <A1, B1, 12345>, how can B tell whether this is due to
- there never having been a shim6 context (A might not be shim6 capable, for instance), or
- B having had such a context but having garbage-collected it too early?
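The two cases above produce exactly the same observable event at B: a lookup miss. A sketch of the receiver-side demultiplexing (the data structures and strings are hypothetical, just to show the ambiguity):

```python
# B's shim6 context table, keyed on <src locator, dst locator, flow label>.
contexts = {}  # (src, dst, flow_label) -> context state

def demux(src: str, dst: str, flow_label: int) -> str:
    if (src, dst, flow_label) in contexts:
        return "context found: rewrite locators to ULIDs, pass up"
    # Miss: a plain IPv6 packet from a non-shim6 host and a shim6 data
    # packet whose context B discarded too early look identical here.
    return "ambiguous: pass to ULP as-is"

# No context: could be either case.
assert demux("A1", "B1", 12345) == "ambiguous: pass to ULP as-is"

# Only while the context exists can B demultiplex correctly.
contexts[("A1", "B1", 12345)] = object()
assert demux("A1", "B1", 12345).startswith("context found")
```

This is why at least one explicit bit in the data packet is needed to distinguish the cases.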


 Erik