...
I think that the protocol behaviour would be something like this. A communication is established between node A and node B. Later on, a shim context is created between those two nodes. The parameters for that context are:
ULIDs: IPA1 and IPB1
Locators: for IPA1 (IPA1,...,IPAn)
          for IPB1 (IPB1,...,IPBm)
And a context tag presumably.
right
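To make the state under discussion concrete, here is a minimal sketch of such a context as seen from node A. The class name `ShimContext`, the field names, and the integer context tag are assumptions made for illustration; nothing here is taken from a draft.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ShimContext:
    """Hypothetical shim context held by node A for its peer B."""
    local_ulid: str                  # IPA1
    peer_ulid: str                   # IPB1
    local_locators: Tuple[str, ...]  # (IPA1, ..., IPAn)
    peer_locators: Tuple[str, ...]   # (IPB1, ..., IPBm)
    context_tag: int                 # assumed to be an opaque integer

    def uses_ulids_as_locators(self, src: str, dst: str) -> bool:
        """True when an outgoing packet's locator pair is the ULID pair."""
        return (src, dst) == (self.local_ulid, self.peer_ulid)
```

With this picture, scenario a) below is the case where `uses_ulids_as_locators` holds for the packets in flight, and scenario b) is the case where it does not.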
Suppose that for some reason node B loses the shim context (and only the shim context, i.e. the application and transport state about ongoing communications is preserved)
I guess that at this point we have several scenarios to consider:
Scenario a): the communication between A and B is still using IPA1 and IPB1 as locators.
This scenario has two subcases:
Scenario a.1) The communication is bidirectional and e.g.
TCP is providing acks of the progress of the communication.
This means that no periodic reachability test
nor any other shim signaling is being exchanged.
In this scenario, a loss of shim context would remain
undetected until there is a failure and node A detects it
and tries to explore alternative paths. This is so because
data packets will carry ULIDs and will be passed successfully
to the upper layers.
If we assume that B (as well as A) will have a heuristic to create shim6 contexts (e.g. based on having received 50 packets for a locator pair), then this heuristic might be triggered and cause B to try to establish a context with A, at which point in time A will see that it already has a context with B.
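A rough sketch of that heuristic, to show why state loss on B would eventually surface. The class and method names are hypothetical, the threshold of 50 is just the example value from the text, and the sketch assumes the context is installed as soon as the heuristic fires.

```python
from collections import Counter

PACKET_THRESHOLD = 50  # example value from the discussion, not a specified constant


class ContextHeuristic:
    """Hypothetical per-node trigger for context establishment."""

    def __init__(self, threshold=PACKET_THRESHOLD):
        self.threshold = threshold
        self.counts = Counter()   # packets seen per (src, dst) locator pair
        self.contexts = set()     # locator pairs that already have a context

    def on_packet(self, src, dst):
        """Return True when this packet triggers a context establishment attempt."""
        pair = (src, dst)
        if pair in self.contexts:
            return False
        self.counts[pair] += 1
        if self.counts[pair] >= self.threshold:
            self.contexts.add(pair)  # assumption: establishment succeeds at once
            return True
        return False

    def lose_state(self):
        """Model B losing its shim state while transport state survives."""
        self.counts.clear()
        self.contexts.clear()
```

After `lose_state()`, the 50th subsequent packet on the same locator pair triggers a new establishment attempt from B, which is the point at which A would notice the mismatch.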
Once there is a failure,
reachability test packets won't be recognized as belonging
to any existing shim context and the problem can be detected.
Here you are already assuming that reachability test packets will not be recognized, i.e. presupposing a particular interaction between the state management and the test protocol.
Scenario a.2) The communication is unidirectional.
In this case, periodic reachability tests need to be
performed in order to verify that the path is still working.
If node B loses its shim state, it won't recognize
the reachability test packets, and the loss of context can
be detected.
Again, here you are presupposing a particular interaction.
right, same as above
Scenario b) the communication between A and B is using alternative locators.
In this case, when node B loses the context, data packets won't be properly delivered in node B, because they won't be properly demuxed.
At this point, the reachability test will be performed to verify the locator pair being used.
If you are using alternate locators and the working locator pair is unidirectional, then it seems like you'd need to be able to re-discover that working unidirectional locator pair, before you can re-establish the context state on B.
Thus if A is sending using IPA1->IPB2 and B was replying using IPB1->IPA2, and B loses the context state, what do you do?
Seems like solving this case requires that the test protocol is not tied in with the state management.
IPA1->IPB1 is working, IPB2->IPA2 is working, and all the rest of the paths are not working (in particular IPB1->IPA1 is not working and IPA2->IPB2 is not working).
So, the question is how they can establish a communication?
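The example can be written down as a tiny reachability model, which makes the problem visible: no single locator pair works in both directions, yet one forward and one reverse path exist. The two-locators-per-node setup and all function names are assumptions for illustration only.

```python
from itertools import product

A_LOCATORS = ("IPA1", "IPA2")
B_LOCATORS = ("IPB1", "IPB2")

# Directed paths that work in the example; every other path fails.
WORKING = {("IPA1", "IPB1"), ("IPB2", "IPA2")}


def works(src, dst):
    """True if packets sent from src reach dst in this model."""
    return (src, dst) in WORKING


def bidirectional_pairs():
    """Locator pairs (a, b) that work both as a->b and as b->a."""
    return [(a, b) for a, b in product(A_LOCATORS, B_LOCATORS)
            if works(a, b) and works(b, a)]


def unidirectional_combinations():
    """Every (A->B path, B->A path) combination where each direction works."""
    forward = [(a, b) for a, b in product(A_LOCATORS, B_LOCATORS) if works(a, b)]
    reverse = [(b, a) for b, a in product(B_LOCATORS, A_LOCATORS) if works(b, a)]
    return list(product(forward, reverse))
```

Here `bidirectional_pairs()` is empty while `unidirectional_combinations()` yields exactly one combination, so any exploration that only probes symmetric pairs would conclude that the peers are unreachable.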
I don't know if i am missing something, but AFAICS, all the situations when the shim context is lost result in a reachability test exchange, and that is why i was wondering if it wouldn't make sense to define a "no-context" error message as a reply to a reachability test request packet.
That is one particular solution with strong coupling between the test protocol and the state management.
But don't we want to retain the possibility to test locator pairs for initial contact, i.e. before a context is established between the peers? And handle the above case of unidirectional locator pairs?
But i fail to understand how the node that has lost the state can identify that a data packet belongs to a non-existent shim state....
By seeing that the <source locator, destination locator, context tag> doesn't match any existing context?
I suspect we want that capability for robustness in any case.
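A minimal sketch of that lookup, assuming the receiver keys its contexts on the <source locator, destination locator, context tag> triple. `ContextTable`, `NoContext`, and the method names are hypothetical.

```python
class NoContext(Exception):
    """Raised when a shim packet matches no known context (hypothetical)."""


class ContextTable:
    """Hypothetical receiver-side table keyed on (src locator, dst locator, tag)."""

    def __init__(self):
        self._by_key = {}

    def add(self, src_locator, dst_locator, context_tag, ulid_pair):
        self._by_key[(src_locator, dst_locator, context_tag)] = ulid_pair

    def clear(self):
        """Model the loss of all shim state."""
        self._by_key.clear()

    def demux(self, src_locator, dst_locator, context_tag):
        """Return the ULID pair to restore, or raise NoContext on a miss."""
        key = (src_locator, dst_locator, context_tag)
        try:
            return self._by_key[key]
        except KeyError:
            raise NoContext(key) from None
```

The miss is an explicit event here rather than a silent drop, which is what would let the receiver react to it in some way.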
I mean, i guess that a first element that is relevant here is where we are going to carry the context tag.
If the context tag is carried in an extension header or dest option, then i can see that if a node receives a packet with one of those, it can easily detect that there is no context associated. (note that in this case, the context loss is only detected in the case where the locators used for the communication differ from the ULIDs, i.e. the extension header or dest option is included in the packet)
If the context tag is included in the flow label, then i don't see how a node that receives the data packet can determine that the packet is associated to a shim context that is no longer there. At this point, i guess that as you mentioned in a previous mail, the data packet would be silently discarded, right?
If the context tag is carried as a flow label, I still think we need a way to tell the receiver "this is a shim6 packet". For robustness reasons I think the fact that the packet needs shim6 processing should be explicit.
There have been proposals in multi6 which suggested doing this without making the packets larger by defining a set of new nexthdr values with meaning like
shim6+tcp
shim6+udp
...
shim6+esp
Not having that "shim6" bit when the flow label is used as a context tag can easily result in hard to diagnose errors. We might have errors due to some middlebox messing with the data packets (a TCP relay for instance), but that leaves the shim6 test packets alone. If the TCP relay doesn't preserve the flow label, then the packets would be dropped due to TCP checksum errors (since the ULID rewrite didn't happen), but the test protocol would say that everything is fine.
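As a rough illustration of what the explicit marker buys, the sketch below classifies an incoming packet using symbolic nexthdr names. The names mirror the list above and are purely hypothetical; no numeric values are assumed, and `classify` is an invented helper.

```python
# Hypothetical symbolic nexthdr names mapped to the upper-layer protocol.
SHIM6_NEXTHDR = {"shim6+tcp": "tcp", "shim6+udp": "udp", "shim6+esp": "esp"}


def classify(nexthdr, flow_label, known_tags):
    """Decide how a receiver might treat a packet.

    Returns (action, upper_protocol) where action is one of:
      "rewrite"    - shim6 packet with a known context tag in the flow label
      "no-context" - shim6 packet whose tag matches no context
      "plain"      - not marked as shim6, passed up unchanged
    """
    if nexthdr in SHIM6_NEXTHDR:
        upper = SHIM6_NEXTHDR[nexthdr]
        if flow_label in known_tags:
            return ("rewrite", upper)
        return ("no-context", upper)
    return ("plain", nexthdr)
```

Without the marker, a packet whose flow label was altered in transit falls into the "plain" branch and only fails later at the transport checksum; with the marker, the same packet lands in "no-context", where the failure is attributable to the shim.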
Regards, marcelo
I think that at this point it is clear to me that if we define a no-context error message, this message should be defined as a reply to a packet that refers to that context and it should include enough information about this initial packet to verify that it is a reply to that packet.
The no-context error message cannot be issued spontaneously by a node.
Agreed.
Erik
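The agreed constraint on the no-context error could be sketched as follows: the error echoes the identifying fields of the packet that triggered it, and the original sender accepts it only if those fields match something it actually sent. The message layout, the class names, and the idea of keeping a set of recently sent packets are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PacketRef:
    """Identifying fields of a packet that referred to a context (hypothetical)."""
    src_locator: str
    dst_locator: str
    context_tag: int


@dataclass(frozen=True)
class NoContextError:
    """Hypothetical error message; it only exists as a reply to a packet."""
    echoed: PacketRef


def make_no_context_error(offending: PacketRef) -> NoContextError:
    """Build the error from the offending packet; there is no other way to build one."""
    return NoContextError(echoed=offending)


def accept_no_context_error(err: NoContextError, recently_sent) -> bool:
    """Accept the error only if it echoes a packet this node really sent."""
    return err.echoed in recently_sent
```

An error that echoes a context tag the receiver never used is rejected, which is one way to keep a third party from tearing down contexts with spontaneous messages.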