
Re: about reachability detection draft




On 14/08/2005, at 21:57, Iljitsch van Beijnum wrote:

Ok, the long awaited reply:

On 16-jul-2005, at 17:49, marcelo bagnulo braun wrote:

In section 2 it is stated that:

- In the second model, a host can only detect problems in the receiving
direction so it must depend on the correspondent to detect problems
in the other direction

[this is what I called forced bidirectional or FBD in the slides]

I think it is important to go one step further here. I mean, what can a host possibly do when it detects an outage in the incoming path? If a host detects an outage in the outgoing path, it can change the address pair that it is using to send packets and see if that solves the problem, but what can it do if it detects an outage in the incoming path? I guess that the only option would be to notify the correspondent node about the failure so that the correspondent uses an alternative path. (Note that we are assuming that the case of unidirectional connectivity is possible.)
So i guess that this mode needs an additional message reporting the failure, which needs to be taken into account when comparing the options.

Well, the idea is that when there is actually a failure in the active path, it's likely that there will be failures in one or more of the backup paths too. So in my opinion, it doesn't make sense to switch to a new path blindly. So we must first test any path we may want to switch to.

i fully agree with this... but i don't understand how this is related to my comment above, though



Now one of two things can happen:

1. the candidate path doesn't work either -> timeout, try another
2. the candidate path works -> we can communicate with the correspondent


If there is any connectivity left, eventually we'll end up in situation 2, and the fact that we're sending the correspondent these test packets (which are different from the regular ones that we send when we think there is still connectivity, see below) tells the correspondent that something is wrong, so the correspondent starts sending test packets in our direction too.
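The two outcomes listed above amount to a loop over candidate address pairs. A minimal Python sketch, assuming a hypothetical `probe` callable that returns True when a test packet on that pair is answered within the timeout (neither the name nor the timeout value comes from the draft):

```python
from typing import Callable, Iterable, Optional, Tuple

# (local locator, remote locator); the labels are illustrative only.
AddressPair = Tuple[str, str]


def find_working_pair(
    candidates: Iterable[AddressPair],
    probe: Callable[[AddressPair, float], bool],
    timeout_s: float = 1.0,  # hypothetical value, not taken from the draft
) -> Optional[AddressPair]:
    """Test candidate pairs in order and return the first one that answers."""
    for pair in candidates:
        if probe(pair, timeout_s):
            # Situation 2: the candidate path works.
            return pair
        # Situation 1: timeout, so fall through and try another pair.
    return None  # no connectivity left on any candidate
```

Note that this only covers the sender's side; the correspondent starting its own test packets in the reverse direction is not modelled here.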


let me see if i get this
Suppose that A is communicating with B
Suppose that we are using FBD so that a given frequency of packets (data or just signaling) is guaranteed by the shim in each direction


Suppose now that A stops receiving packets. This implies a failure in the B->A path.

Now are you assuming that when A stops receiving packets A should try with alternative paths?
I am not sure this is a good approach, since the path A->B may be working properly. Moreover, maybe this path A->B is the only one working.


I mean, if A tries alternative paths when the B->A path fails, basically we are assuming bidirectional connectivity, and we wouldn't be considering the unidirectional connectivity case, right?

As i see it, the FBD mechanism would be the following:

A and B are communicating
they are using FBD
A stops receiving packets

So, A informs B that the B->A path has failed (this implies some form of signaling from A to B, which is likely to be required to be somehow reliable, which is why i see a potential difficulty here)

A continues using the path from A->B while B starts exploring alternative paths
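The sequence above can be sketched in Python as follows. This is only an illustration of the steps as listed here: the `Host` class, the `INCOMING_PATH_FAILED` message name and the silence threshold are all assumptions, and the reliability of the report (retransmission, acknowledgement) is deliberately left out.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

AddressPair = Tuple[str, str]  # (source locator, destination locator)


@dataclass
class Host:
    name: str
    send_pair: AddressPair           # pair currently used for outgoing packets
    last_rx: float = 0.0             # time the last packet was received
    exploring: bool = False          # True while testing alternative paths
    outbox: List[str] = field(default_factory=list)

    def check_incoming(self, now: float, max_silence: float) -> None:
        """Report an incoming-path failure; the outgoing pair is left untouched."""
        if now - self.last_rx > max_silence:
            # Hypothetical signaling message from A to B.
            self.outbox.append("INCOMING_PATH_FAILED")

    def on_failure_report(self) -> None:
        """The peer reports that our outgoing path is broken: explore alternatives."""
        self.exploring = True
```

In this sketch A keeps `send_pair` unchanged after detecting silence, which is the point of the argument: the A->B path may be the only one still working.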

The reason I think we need two types of test packets is because in the situation where there is a large number of address pairs with unknown reachability and/or RTT, we need to do extra work to make sure that when A sends (for instance) a probe A2 -> B3, which makes it to B, but there is only connectivity from B to A over B1 -> A4, it's unlikely that B1 -> A4 is the first pair B tries, so B needs to include reports about what it got from A in all of its packets.

So you think that we need one packet type for periodic keepalives for failure detection and another type of packet for exploring alternative paths?


I am not sure about this since when two different unidirectional paths are being used for a communication then the keepalive packet will need to carry information about the locator pair used for incoming packets.

I mean, consider that A and B are communicating, but they are using two different unidirectional paths, i.e. A->B is using locators A1 and B1, while B->A is using locators A2 and B2

In this case, when A sends a reachability test request to B it will use A1 and B1. However, when B replies, it will use B2 and A2, so i guess it would be a good idea to include in this reply packet information about the addresses used for the request packet, i.e. A1 and B1
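A small Python sketch of such a reply, assuming hypothetical message layouts (the field names and the `nonce` used to match a reply to its request are illustrative, not taken from the draft):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeRequest:
    src: str    # locator the request was sent from, e.g. "A1"
    dst: str    # locator the request was sent to, e.g. "B1"
    nonce: int  # hypothetical field to match replies to requests


@dataclass(frozen=True)
class ProbeReply:
    src: str       # locator the reply is sent from, e.g. "B2"
    dst: str       # locator the reply is sent to, e.g. "A2"
    nonce: int     # copied from the request
    seen_src: str  # source locator of the request as received
    seen_dst: str  # destination locator of the request as received


def make_reply(req: ProbeRequest, reply_src: str, reply_dst: str) -> ProbeReply:
    """Build a reply that echoes the locator pair the request arrived on."""
    return ProbeReply(reply_src, reply_dst, req.nonce, req.src, req.dst)
```

With the echoed pair, A can tell that A1 -> B1 works even though the reply itself arrived on B2 -> A2.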

And we need a reasonable level of authentication too, because we haven't previously established that these addresses indeed belong to the correspondent.

I am not sure what is the level of security we need here yet, but in any case imho it will be very related to the security used during the shim context establishment.


I mean, imho the critical part of the security is the addition of new locators to the locator set. Once the locators have been validated, using them shouldn't require too tight security imho


On the other hand, when A1 <-> B1 is working happily, there is no need to use such a complex protocol: we are only testing one pair in each direction, and the correspondent has been authenticated earlier. If we wanted we could even use pings to determine whether this still works. (Well, sort of...)



But as i see it, we are only going to try locators that are included in the locator set available for that shim context, which means that they have already been validated....


I guess it would also make sense to state how those mechanisms behave when there are no outgoing packets. i mean, i guess that in any of the modes signaling is suppressed, right? However, i guess that the hosts assume that the address pair is reachable, right?

With correspondent unreachability detection (first mechanism in the draft) the transport hints would accompany the packets, so packets = no, hints = no -> no action. Only packets = yes and hints = no requires reachability probes.


In the forced bidirectional communication we only send probes when we sent data recently but didn't receive data recently, so no probes in this case either.
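The two rules just described can be written as small predicates. A Python sketch, assuming timestamps in seconds and a hypothetical `window` parameter standing in for "recently" (the draft text quoted here does not give a value):

```python
from typing import Optional


def cud_needs_probe(sent_packets: bool, got_hints: bool) -> bool:
    """Correspondent unreachability detection: probe only if packets went out
    but no positive transport hints came with them."""
    return sent_packets and not got_hints


def fbd_needs_probe(
    now: float,
    last_sent: Optional[float],
    last_received: Optional[float],
    window: float,  # hypothetical definition of "recently"
) -> bool:
    """Forced bidirectional communication: probe only if we sent data recently
    but did not receive data recently."""
    sent_recently = last_sent is not None and now - last_sent <= window
    received_recently = last_received is not None and now - last_received <= window
    return sent_recently and not received_recently
```

In both cases an idle context (nothing sent) yields no probes, which matches the answer given above.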

Note that some transports and some applications explicitly keep the session alive with periodic traffic.


right, i guess it would be good to include this explicitly in the draft


Additionally, i guess that there is other information that needs to be taken into account when detecting reachability, such as ICMP error messages, address deprecation, and lower-layer information (i know that you state that you assume that addresses are available, but what happens if an address of the currently used address pair is deprecated? or if the associated interface goes down?)

It makes sense to send a probe when there is an ICMP error (have to rate limit this, though).

ok

Deprecation is irrelevant, as we can continue to use deprecated addresses.

Well, deprecation means that somewhere in the near future, the address won't be available anymore, right?
So i guess it makes sense to keep on using it as ULID, but i wonder if it wouldn't be a good strategy to try to rehome the ongoing communications to an alternative locator...?


Lower layer events are also a reason to do a reachability test, I think.

ok

A more interesting case is when an address is removed from the system. I don't remember which session it was, but in Paris someone was talking about how systems keep using addresses they no longer have because upper layers still use those addresses.

well, in the shim this address should be kept as a possible ULID but not as a valid locator i guess


IMO this is a feature: if I unplug my ethernet from my powerbook and turn on my wifi, I get the same address on a different interface and my sessions are still alive. Under windows, things like this kill your sessions immediately.


in this case i guess that this address should be:
- kept as a ULID the whole time
- removed from the locator set during the period in which the address was not available (i.e. from the moment it is removed from the ethernet until it is back on the wifi)
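A minimal Python sketch of that distinction, assuming a hypothetical `ShimContext` structure (the names are illustrative; the addresses in the usage below come from the IPv6 documentation prefix):

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class ShimContext:
    ulid: str                                       # identifier seen by upper layers
    locators: Set[str] = field(default_factory=set)  # addresses usable on the wire

    def address_lost(self, addr: str) -> None:
        """The address left the system: stop using it as a locator.
        The ULID is not touched, so upper-layer sessions survive."""
        self.locators.discard(addr)

    def address_available(self, addr: str) -> None:
        """The address is back (possibly on another interface)."""
        self.locators.add(addr)
```

For example, a context with ULID `2001:db8::1` keeps that ULID across `address_lost("2001:db8::1")`, and only regains it as a locator after `address_available("2001:db8::1")`.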


Another issue that may be of interest is what happens with an address (pair) after it becomes unreachable. i mean, is it used in subsequent address pair explorations? is it put in quarantine?

The way I see it there is an ordered list of address pairs. The more probes fail to make it to the other side the lower the address pair ends up on the list, I imagine. :-)

but what happens when you only have 2 address pairs, for instance? you may end up trying the same two address pairs forever... I mean, i guess we need a mechanism to give up trying, right?


Putting address pairs in quarantine would be one of such mechanisms, but there may be others, like a maximum number of attempts...
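To make the options concrete, here is a Python sketch combining the ordered list (pairs sink as their probes fail) with a maximum number of attempts as the give-up rule. The `PairList` class and the `max_attempts` default are assumptions for illustration; a time-based quarantine would be an alternative to the counter.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

AddressPair = Tuple[str, str]


@dataclass
class PairList:
    pairs: List[AddressPair]          # ordered, most preferred first
    max_attempts: int = 3             # hypothetical give-up threshold
    failures: Dict[AddressPair, int] = field(default_factory=dict)

    def next_candidate(self) -> Optional[AddressPair]:
        """Return the best pair that has not used up its attempts, else None."""
        for pair in self.pairs:
            if self.failures.get(pair, 0) < self.max_attempts:
                return pair
        return None  # every pair is exhausted: give up

    def probe_failed(self, pair: AddressPair) -> None:
        """Count the failure and move the pair down the list.
        The sort is stable, so pairs with equal counts keep their order."""
        self.failures[pair] = self.failures.get(pair, 0) + 1
        self.pairs.sort(key=lambda p: self.failures.get(p, 0))
```

With only two address pairs and `max_attempts=2`, exploration stops after four failed probes instead of alternating between the two pairs forever.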


We need to think about the situation where a fast primary link fails and we switch to a slow backup, though. Presumably, we'll want to switch back to the fast primary address pair when possible.



Right, the same arguments may apply for other reasons, like cost of the link, security/privacy features of the path... But the question is how the shim becomes aware of such information. i guess that these are policy issues and we could include some means to express this type of consideration in the shim


Regards, marcelo