Re: address pair exploration, flooding and state loss
Marcelo Bagnulo Braun wrote:
In #1 it is key that we can get A to realize that the context has been
lost on B, so that we can get to the point where the shim on B can
pass up the packets to the ULPs. This will cause the ULPs to generate
errors (e.g. TCP RST) and we are back to what we have today without a
shim.
Besides, at this point there is no possible recovery, since the ULP state
has also been lost.
Yes, but "do no harm" presumably includes "don't slow down current
failure detection".
The recovery, once the RST comes back, could be to open a new
connection, or something else in the application layer. (Think of an
application with the same behavior as wget -c)
I think there is a significant quality difference between
1. The box reboots in 30 seconds. The first TCP retransmission after 30
seconds results in a RST coming back. This triggers application recovery.
2. The box reboots in 30 seconds. TCP retransmissions for the next 10
minutes are silently dropped because the retransmissions arrive at the
peer's TCP with a bad checksum (due to "missing" shim6 rewrite).
Thus any RST doesn't arrive until 10 minutes or so later!
Right, the only point I wonder about w.r.t. this is whether it is wise to
base the context-loss detection procedure on the heuristics for
establishing shim context... I mean, the heuristics for establishing shim
context may vary greatly, I guess. For instance, I think it is a
possibility that some heavily loaded servers adopt a policy of never
initiating the shim session establishment procedure, only accepting
establishment requests from clients. In such a case, they wouldn't detect
context loss.
So we can point out in our RFCs why such a behavior would be suboptimal.
Ensuring that one end can quickly detect when the peer has lost the
context state, even while that state isn't being used (i.e., while the
ULIDs are used as locators), is far from free.
In any case, with the procedure you described in the previous mail for
compactly encoding both the context identifier and the nonce in 20 bits,
we could stuff all we need into every data packet, I guess. So if the
flow label approach is used, all data packets of established shim
sessions can be identified as such.
No, that isn't sufficient, because there is nothing in a received packet
which identifies it to the receiver (which has lost context) as a shim6
packet. Any packet can have a non-zero flow label, so that isn't a
useful indication.
So you'd need to add at least one bit to every data packet to be able to
do this.
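To make the objection concrete, here is a minimal sketch of the packing idea under discussion. The field split (12-bit context tag, 8-bit nonce) and all names are assumptions for illustration, not taken from any shim6 draft; the point is that the packing works fine, yet any 20-bit value is also a legal ordinary flow label, so unpacking "succeeds" on every packet.

```python
# Hypothetical sketch: pack a context tag and a nonce into the 20-bit
# IPv6 flow label. Field widths are assumed, not from the shim6 spec.

TAG_BITS = 12           # assumed: 12-bit context tag
NONCE_BITS = 8          # assumed: 8-bit nonce (12 + 8 = 20 bits total)

def pack_flow_label(tag: int, nonce: int) -> int:
    """Pack <tag, nonce> into a single 20-bit flow label value."""
    assert 0 <= tag < (1 << TAG_BITS)
    assert 0 <= nonce < (1 << NONCE_BITS)
    return (tag << NONCE_BITS) | nonce

def unpack_flow_label(label: int) -> tuple[int, int]:
    """Split a 20-bit flow label back into (tag, nonce)."""
    return label >> NONCE_BITS, label & ((1 << NONCE_BITS) - 1)

label = pack_flow_label(tag=0x2A5, nonce=0x5C)
assert unpack_flow_label(label) == (0x2A5, 0x5C)

# Erik's point: a non-shim6 peer may legitimately send ANY flow label,
# so unpacking also "succeeds" on a plain packet -- nothing marks the
# packet as carrying shim6 semantics.
tag, nonce = unpack_flow_label(12345)   # decodes, but meaninglessly
```

This is why at least one extra bit per data packet would be needed: the encoding is reversible, but the receiver has no way to know the encoding was applied at all.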
OK, now I am confused.
As far as I understood until now, if we want to detect context loss based
on the reception of data packets, we need some flag in the data packet
indicating that the packet belongs to an existing shim session. If we
eliminate this bit, then we cannot detect context loss from data packets,
right? Or have I lost track of our reasoning?
If B receives a packet from A1 to B1 with flow label 12345 and a next
header of UDP, and B has no context for <A1, B1, 12345>, how can B tell
whether this is due to
- there never having been a shim6 context (A might not be shim6
capable, for instance), or
- B having had such a context but having garbage-collected it too early?
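The receiver-side dilemma above can be sketched as a context-table lookup. The data structure and function names are assumptions for illustration only; the substance is that on a miss, both explanations are consistent with the packet on the wire and nothing in the packet disambiguates them.

```python
# Hypothetical sketch of B's lookup on an incoming data packet.
# The table keyed by (src locator, dst locator, flow label) is an
# assumption, not the shim6 spec's actual context structure.

contexts: dict[tuple[str, str, int], object] = {}

def classify_incoming(src: str, dst: str, flow_label: int) -> str:
    key = (src, dst, flow_label)
    if key in contexts:
        return "shim6 context found"
    # Context miss: both cases below look identical to B, because the
    # packet carries no bit saying "a shim6 context exists for me".
    return ("ambiguous: either A never created a shim6 context, "
            "or B garbage-collected it too early")

print(classify_incoming("A1", "B1", 12345))
```

Without an explicit marker in the data packet, the miss branch is all B ever sees, which is exactly why the flow label alone cannot drive context-loss detection.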
Erik