
Re: address pair exploration, flooding and state loss




On 27/05/2005, at 22:57, Erik Nordmark wrote:

marcelo bagnulo braun wrote:

In #1 it is key that we can get A to realize that the context has been lost on B, so that we can get to the point where the shim on B can pass up the packets to the ULPs. This will cause the ULPs to generate errors (e.g. TCP RST) and we are back to what we have today without a shim.

besides, at this point there is no possible recovery, since the ULP state has also been lost

Yes, but "do no harm" presumably includes "don't slow down current failure detection".


The recovery, once the RST comes back, could be to open a new connection, or something else in the application layer. (Think of an application with the same behavior as wget -c)

I think there is a significant quality difference between
1. The box reboots in 30 seconds. The first TCP retransmission after 30 seconds results in a RST coming back. This triggers application recovery.
2. The box reboots in 30 seconds. TCP retransmissions for the next 10 minutes are silently dropped because the retransmissions arrive at the peer's TCP with a bad checksum (due to "missing" shim6 rewrite).
Thus any RST doesn't arrive until 10 minutes or so later!
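To make the gap between the two cases concrete, here is a rough sketch of when the first RST could come back. It assumes a simple exponential-backoff retransmission schedule (1 second initial timeout, doubling, capped at 64 seconds) and a sender that keeps retransmitting throughout; those parameters and the function names are illustrative assumptions, not something stated in this thread. Only the 30 second reboot and the 10 minute drop window come from the text above.

```python
def retransmission_times(initial_rto=1.0, max_rto=64.0, limit=900.0):
    """Times (seconds after the original send) of successive retransmissions.

    Assumed schedule: the timeout doubles after each retransmission up to max_rto.
    """
    times = []
    t, rto = 0.0, initial_rto
    while True:
        t += rto
        if t > limit:
            break
        times.append(t)
        rto = min(rto * 2, max_rto)
    return times


def first_rst_time(reboot_done, drop_until):
    """Time of the first retransmission that reaches the peer's TCP.

    reboot_done: the peer is back up at this time.
    drop_until:  until this time, retransmissions are silently dropped
                 (e.g. bad checksum because the shim6 rewrite is missing).
    """
    usable_from = max(reboot_done, drop_until)
    for t in retransmission_times():
        if t >= usable_from:
            return t
    return None  # the assumed schedule ended before anything got through


# Case 1: no shim in the way, the peer answers as soon as it is back up.
print(first_rst_time(reboot_done=30.0, drop_until=0.0))    # 31.0
# Case 2: retransmissions are silently dropped for about 10 minutes.
print(first_rst_time(reboot_done=30.0, drop_until=600.0))  # 639.0
```

Under these assumptions the application learns about the failure after roughly half a minute in case 1 and after more than ten minutes in case 2.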


agree
so the conclusion is that reboots and lost state have to be detected at least as fast as what is available today





right, the only point that i wonder about w.r.t. this is whether it is wise to base the context loss detection procedure on the heuristics used to establish shim context... i mean, the heuristics for establishing shim context may vary greatly, i guess. For instance, i think it is possible that some heavily loaded servers adopt the policy of never initiating the shim session establishment procedure, and only accept establishment requests from clients. In such a case, they wouldn't detect context loss.

So we can point out in our RFCs why such a behavior would be suboptimal.
Ensuring that one end can quickly detect when the peer has lost the context state, even when the state isn't used (when the ULIDs are used as locators), is far from inexpensive.

maybe, but it does constrain the possible heuristics used for establishing shim sessions.
Maybe it would be enough to mention that the heuristics for establishing shim sessions are also used to recover from those situations where the context has been lost. However, i guess that we need to take into account the cases where the heuristics for establishing shim sessions are not useful for recovering from lost state, and provide a worst case recovery for them.



In any case, if the procedure you described in the previous mail for including both the context identifier and the nonce in a compact way in 20 bits works, we could stuff all we need in all data packets, i guess, so if the flow label approach is used, all data packets of established shim sessions can be identified as such

No, that isn't sufficient, because there is nothing in a received packet which identifies it to the receiver (which has lost context) as a shim6 packet. Any packet can have a non-zero flow label, so that isn't a useful indication.


So you'd need to add at least one bit to every data packet to be able to do this.

ok, now i am confused
AFAIU until now, if we want to detect context loss based on the reception of data packets, we need to have some flag in the data packet indicating that this packet belongs to an existing shim session. If we eliminate this bit, then we cannot detect context loss from data packets, right? or have i lost track of our reasoning?

If B receives a packet from A1 to B1 with flow label 12345 and a nexthdr of UDP, then if B has no context for <A1, B1, 12345>, how can B tell whether this is due to
- there never having been a shim6 context - A might not be shim6 capable for instance
- B having had such a context but having garbage collected it too early



right, this is what i have in mind.

so, so far what we have is:

- before a rehoming event, packets may not be identified as belonging to a shim session. If this is the case, the data packets associated with this session are not useful to detect context loss, so alternative mechanisms, like the heuristics for establishing shim sessions, are used to recover from these situations
- after a rehoming event, context loss is detected upon the reception of any packet associated with the shim session, whether signalling packet or data packet. for that, data packets need to carry at least one bit that identifies them as belonging to a shim session.
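The two cases above can be sketched as the decision a receiver makes for each incoming data packet. This is only an illustration: the `shim_bit` field stands for the hypothetical "at least one bit" discussed in this thread, and the names `Packet`, `Verdict` and `classify` are made up for the example rather than taken from any shim6 specification.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    CONTEXT_FOUND = "context found, process through the shim"
    AMBIGUOUS_PASS_UP = "no context and no marker, pass to the ULP unchanged"
    CONTEXT_LOST = "marked as shim traffic but no context, signal the peer"


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    flow_label: int
    shim_bit: bool  # hypothetical marker: "this packet belongs to a shim session"


def classify(packet, contexts):
    """Decide what the receiver can conclude from one data packet.

    contexts is the set of (src, dst, flow_label) tuples the receiver still holds.
    """
    key = (packet.src, packet.dst, packet.flow_label)
    if key in contexts:
        return Verdict.CONTEXT_FOUND
    if packet.shim_bit:
        # Only the explicit marker lets the receiver tell "context lost"
        # apart from "there never was a context".
        return Verdict.CONTEXT_LOST
    # A non-zero flow label alone proves nothing: the sender may not be
    # shim6 capable at all.
    return Verdict.AMBIGUOUS_PASS_UP


# B has lost all state; the same <A1, B1, 12345> packet with and without the marker.
print(classify(Packet("A1", "B1", 12345, shim_bit=False), set()))
print(classify(Packet("A1", "B1", 12345, shim_bit=True), set()))
```

Without the marker the receiver can only pass the packet up, which is why recovery before a rehoming event has to rely on some other mechanism.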


right?

regards, marcelo


 Erik