
Re: soft state (was Re: shim6 and bit errors in data packet headers)




On 11/05/2005, at 20:15, Erik Nordmark wrote:


...

I think that the protocol behaviour would be something like this.
A communication is established between node A and node B
Later on, a shim context is created between those two nodes.
The parameters for that context are:
  ULIDs: IPA1 and IPB1
  Locators: for IPA1 (IPA1,...,IPAn)
            for IPB1 (IPB1,...,IPBm)

And a context tag presumably.


right

Suppose that for some reason node B loses the shim context (and only the shim context, i.e. the application and transport state about ongoing communications is preserved)
I guess that at this point we have several scenarios to consider:
Scenario a): the communication between A and B is still using IPA1 and IPB1 as locators.
This scenario has two subcases:
Scenario a.1) The communication is bidirectional and e.g.
TCP is providing acks of the progress of the communication.
This means that no periodic reachability test
nor any other shim signaling is being exchanged.
In this scenario, a loss of shim context would remain
undetected until there is a failure and node A detects it
and tries to explore alternative paths. This is so because
data packets will carry ULIDs and will be passed successfully
to the upper layers.

If we assume that B (as well as A) will have a heuristic to create shim6 contexts (e.g. based on having received 50 packets for a locator pair), then this heuristic might be triggered and cause B to try to establish a context with A, at which point in time A will see that it already has a context with B.



good point, this means that there is a possibility that the loss of context is also detected by receiving a context establishment request packet for an already available context....
i wonder how the receiver of that packet, which already has a context associated with the communication, would react...


I mean, is it possible to create two contexts between the same two nodes with the same two ULIDs (and perhaps a different context tag)? would this make any sense?

If the answer is no, then i guess that the receiver can detect the situation because it already has a context with the same two ULIDs. At this point, the receiver can go along with the context establishment exchange in order to help the peer to re-create the lost context. Perhaps there may be some issues with how to assign the context tags, though...
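To make the detection idea concrete, here is a minimal sketch, assuming that at most one context is allowed per ULID pair. The names ShimContext, ContextTable and on_establishment_request are made up for illustration and are not taken from any draft.

```python
from dataclasses import dataclass


@dataclass
class ShimContext:
    # Hypothetical minimal context record, for illustration only.
    local_ulid: str
    peer_ulid: str
    context_tag: int


class ContextTable:
    """Contexts indexed by ULID pair (assumes one context per ULID pair)."""

    def __init__(self):
        self._by_ulids = {}

    def add(self, ctx):
        self._by_ulids[(ctx.local_ulid, ctx.peer_ulid)] = ctx

    def on_establishment_request(self, local_ulid, peer_ulid):
        # A request for a ULID pair that already has a context suggests
        # that the peer has lost its copy of that context.
        if (local_ulid, peer_ulid) in self._by_ulids:
            return "peer-lost-context"
        return "new-context"


# Node A holds a context for (IPA1, IPB1); B lost it and asks again.
table = ContextTable()
table.add(ShimContext("IPA1", "IPB1", context_tag=7))
result = table.on_establishment_request("IPA1", "IPB1")
```

How the context tag is assigned when the context is re-created is left out on purpose, since that is the open issue mentioned above.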

Once there is a failure,
reachability test packets won't be recognized as belonging
to any existing shim context and the problem can be detected.

Here you are already assuming that reachability test packets will not be recognized, i.e. presupposing a particular interaction between the state management and the test protocol.

Well, i guess that what i am assuming is that a shim-enabled node will verify that received reachability test packets correspond to an existing context, i.e. that upon the reception of a reachability test, the node will verify that it corresponds to an existing context. If it does, it will reply; if not, it won't.


I don't know if you consider that i am assuming something else... (please let me know)

The above assumption is based on the fact that i think this is needed to properly protect from flooding attacks.
I mean, the goal of reachability tests in a shim protocol can be twofold:
- to explore if a given address pair is working
- to determine if a given host is willing to receive packets at a given address (i.e. to prevent flooding attacks)


the first goal can be achieved with a kind of ping (enhanced ping in order to determine unidirectional reachability) but it may not require that the reachability test is associated with a given context. I mean, the goal here is just to obtain reachability information

The second goal is somehow different: since what is being queried is the willingness of the node to receive traffic through a certain address, the node needs to be told which traffic we are talking about. If the reachability test request packet does not contain any information about the context it is associated with, then i think that it would not fulfill the second goal.
This is so because an attacker could then flood any shim capable node, since any shim capable node would reply to reachability test packets, no matter if they refer to existing contexts or not. Makes sense?


So, this is why i think it makes sense to assume that a reachability test packet needs to contain information about the context that it refers to, and that a node that has lost its context state won't recognize a reachability test packet of the lost context.
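A rough sketch of that check, assuming the test packet carries a context tag and that the receiver also checks the locators against the context. The dictionary layout and the name should_reply are invented for this example.

```python
# Hypothetical context store on node B: context tag -> locator sets.
contexts = {
    42: {
        "local_locators": {"IPB1", "IPB2"},
        "peer_locators": {"IPA1", "IPA2"},
    },
}


def should_reply(contexts, src_locator, dst_locator, context_tag):
    """Reply to a reachability test only if it matches an existing context."""
    ctx = contexts.get(context_tag)
    if ctx is None:
        # Unknown (or lost) context: stay silent, so that the node
        # cannot be used to flood a third party.
        return False
    return (src_locator in ctx["peer_locators"]
            and dst_locator in ctx["local_locators"])


reply_known = should_reply(contexts, "IPA2", "IPB2", 42)
reply_after_loss = should_reply({}, "IPA2", "IPB2", 42)
```

With the context present the node replies; with an empty store (context lost) the same test packet gets no reply, which is what lets the sender notice the problem.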


Scenario a.2) The communication is unidirectional.
In this case, periodic reachability tests need to be
performed in order to verify that the path is still working.
If node B loses its shim state, it won't recognize
the reachability test packets, and the loss of context can
be detected.

Again, here you are presupposing a particular interaction.

right, same as above


Scenario b) the communication between A and B is using alternative locators.
In this case, when node B loses the context, data packets won't be properly delivered at node B, because they won't be properly demuxed.
At this point, the reachability test will be performed to verify the locator pair being used

If you are using alternate locators and the working locator pair is unidirectional, then it seems like you'd need to be able to re-discover that working unidirectional locator pair, before you can re-establish the context state on B.
Thus if A is sending using IPA1->IPB2 and B was replying using IPB1->IPA2, and B loses the context state, what do you do?
Seems like solving this case requires that the test protocol is not tied in with the state management.



:-)
Context state loss AND path failure AND unidirectional connectivity.... this seems amusing enough


Ok, but let's first discuss how we deal with this situation when establishing the initial context, so we can then move to this scenario after a context loss event, agree?

So, suppose that we have node A with (IPA1,IPA2) and node B with (IPB1,IPB2)

suppose that only one unidirectional path is available in each direction, let's say for example that

IPA1->IPB1 is working
IPB2->IPA2 is working
and all the rest of paths are not working
(in particular IPB1->IPA1 is not working and IPA2->IPB2 is not working)

So, the question is how they can establish a communication?

Clearly retrying with different address pairs won't work, since no single address pair works in both directions: they need to use a different address pair in each direction.
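A small sketch of why retrying does not help in this example. The set of working paths is exactly the one listed above; the helper names are made up.

```python
from itertools import product

# Unidirectional paths that work in the example: (source, destination).
WORKING = {("IPA1", "IPB1"), ("IPB2", "IPA2")}
A_ADDRS = ["IPA1", "IPA2"]
B_ADDRS = ["IPB1", "IPB2"]


def bidirectional_pairs(a_addrs, b_addrs, working):
    """Address pairs (a, b) usable in both directions."""
    return [(a, b) for a, b in product(a_addrs, b_addrs)
            if (a, b) in working and (b, a) in working]


def per_direction_pairs(a_addrs, b_addrs, working):
    """Working pairs listed separately for A->B and B->A."""
    forward = [(a, b) for a, b in product(a_addrs, b_addrs)
               if (a, b) in working]
    backward = [(b, a) for a, b in product(a_addrs, b_addrs)
                if (b, a) in working]
    return forward, backward


both_ways = bidirectional_pairs(A_ADDRS, B_ADDRS, WORKING)
forward, backward = per_direction_pairs(A_ADDRS, B_ADDRS, WORKING)
```

No address pair works in both directions, so any exchange that replies on the reversed pair of the request fails, whichever pair is tried. There is still one usable pair per direction (IPA1->IPB1 and IPB2->IPA2).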

The other option would be to try to establish the shim context simultaneously with the beginning of the communication, so that the shim can be used from the beginning. This would allow the use of different address pairs in each direction from the start.

The problem here is how we can design a context establishment exchange that can support this scenario. I mean, in the functional decomposition draft, the context establishment exchange is a 4-way handshake designed to avoid DoS attacks, in which the receiver does not create any state upon the reception of the first packet. So far this design does not seem to support the considered scenario where there is unidirectional connectivity.
I guess that it would be possible to include in the first packet, an alternative locator to be used when sending packets back to the initiator.
I mean, consider the scenario described above.
Node A is initiating using IPA1 and IPB1 as locators for the context establishment request packet.
In this packet, i guess it could include an alternative locator for node A, IPA2, so that node B sends the reply to IPA2 (another issue that needs to be considered would be security checks that are needed for this and how much more expensive is the context establishment because of this verification).
The question now is how do we manage that the reply also uses an alternative locator for node B? I mean, if we still want node B not to store any state about this context upon the reception of this first packet, this may be difficult


My point is that i am not sure we can support this type of scenarios for the establishment of the context, so i would propose to analyze this case first and then move on to see if we can deal with the case of lost state and unidirectional connectivity.


I don't know if i am missing something, but AFAICS, all the situations when the shim context is lost result in a reachability test exchange, and that is why i was wondering if it wouldn't make sense to define a "no-context" error message as a reply to a reachability test request packet.

That is one particular solution with strong coupling between the test protocol and the state management.


But don't we want to retain the possibility to test locator pairs for initial contact, i.e. before a context is established between the peers? And handle the above case of unidirectional locator pairs?

right, i guess we should explore how this would work and see how hard this is...


i deduce from your statement that you are considering the possibility to perform reachability tests that are not bound to any context, so it can be used for initial contact, right?

I guess that we would need to design two different reachability tests, one for reachability and one for flood prevention, right?




But i fail to understand how the node that has lost the state can identify that a data packet belongs to a non existent shim state....

By seeing that the <source locator, destination locator, context tag> doesn't match any existing context?

in the flow label case, this is not enough, as you point out below, you also need a shim bit to indicate that this is shim data packet, right?
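As an illustration of this lookup, here is a minimal sketch that assumes the packet carries an explicit indication that it needs shim processing (the "shim" field below stands in for a shim bit or a dedicated next header value). The field names and return values are invented.

```python
def classify(packet, contexts):
    """Decide what a receiver does with an incoming data packet.

    `contexts` is a set of (src locator, dst locator, context tag) tuples.
    """
    if not packet["shim"]:
        # No explicit shim indication: the receiver cannot tell whether
        # a context ever existed, so it treats this as a plain packet.
        return "plain"
    key = (packet["src"], packet["dst"], packet["tag"])
    return "deliver" if key in contexts else "no-context"


contexts = {("IPA1", "IPB2", 7)}
pkt = {"shim": True, "src": "IPA1", "dst": "IPB2", "tag": 7}

before_loss = classify(pkt, contexts)
after_loss = classify(pkt, set())
unmarked = classify({**pkt, "shim": False}, set())
```

Only the explicitly marked packet lets the node that lost its state conclude that a context is missing. The unmarked packet just looks like ordinary traffic.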


I suspect we want that capability for robustness in any case.

I mean, i guess that a first element that is relevant here is where are we going to carry the context tag.
If the context tag is carried in an extension header or dest option, then i can see that a node that receives a packet with one of those can easily detect that there is no context associated. (note that in this case, the context loss is only detected in the case where the locators used for the communication differ from the ULIDs, i.e. the extension header or dst option is included in the packet)
If the context tag is included in the flow label, then i don't see how a node that receives the data packet can determine that the packet is associated to a shim context that is no longer there. At this point, i guess that as you mentioned in a previous mail, the data packet would be silently discarded, right?

If the context tag is carried as a flow label, I still think we need a way to tell the receiver "this is a shim6 packet". For robustness reasons I think the fact that the packet needs shim6 processing should be explicit.
There have been proposals in multi6 which suggested doing this without making the packets larger by defining a set of new nexthdr values with meaning like
shim6+tcp
shim6+udp
...
shim6+esp


Not having that "shim6" bit when the flow label is used as a context tag can easily result in hard to diagnose errors. We might have errors due to some middlebox messing with the data packets (a TCP relay for instance), but that leaves the shim6 test packets alone. If the TCP relay doesn't preserve the flow label, then the packets would be dropped due to TCP checksum errors (since the ULID rewrite didn't happen), but the test protocol would say that everything is fine.


well, i may agree here, but this should be considered when choosing where to carry the context tag. So far, in the discussion, and in the arch draft, there was no shim bit in data packets, and this point was not considered in the discussion of the trade-offs.


In any case, i agree that it seems that the loss of the shim context can be detected when receiving packets other than reachability tests. i guess that it may also be detected when receiving any shim signaling, i.e. data packets with a context tag (that is explicitly signaled as a context tag), context establishment request packets and reachability test packets.

Regards, marcelo



I think that at this point it is clear to me that if we define a no-context error message, this message should be defined as a reply to a packet that refers to that context and it should include enough information about this initial packet to verify that it is a reply to that packet.
The no-context error message cannot be issued spontaneously by a node.
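One way this could be sketched, assuming the triggering packet carries a nonce that the error echoes back. The nonce and all the names below are assumptions made for illustration, not something defined in the drafts.

```python
def build_no_context_error(request):
    """Build the error as a reply that echoes data from the request."""
    return {
        "type": "no-context",
        "context_tag": request["context_tag"],
        "nonce": request["nonce"],  # hypothetical field echoed from the request
    }


def accept_no_context_error(error, pending):
    """Accept the error only if it answers a request we actually sent.

    `pending` maps the nonce of each outstanding request to its context tag.
    """
    return pending.get(error["nonce"]) == error["context_tag"]


# Node A has one outstanding reachability test for context 7.
pending = {0xBEEF: 7}
request = {"type": "reach-test", "context_tag": 7, "nonce": 0xBEEF}

genuine = build_no_context_error(request)
spoofed = {"type": "no-context", "context_tag": 7, "nonce": 0x1234}

genuine_ok = accept_no_context_error(genuine, pending)
spoofed_ok = accept_no_context_error(spoofed, pending)
```

An error that does not match an outstanding request is ignored, so a node that issues the message spontaneously (or an off-path attacker guessing at it) cannot tear down the context.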

Agreed.

  Erik