
Re: about reachability detection draft

To: marcelo bagnulo braun <marcelo@it.uc3m.es>
Subject: Re: about reachability detection draft
From: Iljitsch van Beijnum <iljitsch@muada.com>
Date: Sun, 14 Aug 2005 21:57:43 +0200
Cc: shim6 <shim6@psg.com>
In-reply-to: <1a19b8397546d199b67740ae5c539348@it.uc3m.es>
References: <1a19b8397546d199b67740ae5c539348@it.uc3m.es>

Ok, the long-awaited reply:

On 16-jul-2005, at 17:49, marcelo bagnulo braun wrote:

In section 2 it is stated that:

- In the second model, a host can only detect problems in the receiving direction so it must depend on the correspondent to detect problems in the other direction


[this is what I called forced bidirectional or FBD in the slides]

I think it is important to consider one step further then. I mean, what can a host possibly do when he detects an outage in the incoming path? If a host detects an outage in the outgoing path, he can change the address pair that he is using to send packets and see if it solves the problem, but what can he do if he detects an outage in the incoming path? I guess that the only option would be to notify the correspondent node about the failure so that the correspondent uses an alternative path. (Note that we are assuming that unidirectional connectivity is possible.) So I guess that this mode needs an additional message reporting the failure, which needs to be taken into account when comparing the options.

Well, the idea is that when there is actually a failure in the active path, it's likely that there will be failures in one or more of the backup paths too. So in my opinion, it doesn't make sense to switch to a new path blindly. So we must first test any path we may want to switch to.

Now one of two things can happen:

1. the candidate path doesn't work either -> timeout, try another
2. the candidate path works -> we can communicate with the correspondent

If there is any connectivity left, we'll eventually end up in situation 2. The fact that we're sending the correspondent these test packets (which are different from the regular ones that we send when we think there is still connectivity, see below) tells the correspondent that something is wrong, so the correspondent starts sending test packets in our direction too.
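The two cases above amount to a simple loop over candidate pairs. A minimal sketch in Python, assuming a hypothetical `probe` callable that sends one test packet over a pair and reports whether a reply came back before the timeout (neither the name nor the timeout value comes from the draft):

```python
from typing import Callable, Iterable, Optional, Tuple

AddressPair = Tuple[str, str]  # (local address, remote address)


def find_working_pair(
    candidates: Iterable[AddressPair],
    probe: Callable[[AddressPair, float], bool],
    timeout: float = 1.0,  # assumed value, not taken from the draft
) -> Optional[AddressPair]:
    """Test candidate pairs in order and return the first one that answers.

    `probe` is a hypothetical callable: it sends a test packet over the
    pair and returns True if a reply arrives within `timeout` seconds.
    """
    for pair in candidates:
        if probe(pair, timeout):
            return pair  # case 2: the candidate path works
        # case 1: timeout, so move on and try another pair
    return None  # no connectivity left on any candidate
```

The point of the sketch is only that nothing is switched blindly: a pair becomes the active one after a probe over it has succeeded, not before.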

The reason I think we need two types of test packets is this: when there is a large number of address pairs with unknown reachability and/or RTT, we need to do extra work. Suppose A sends a probe A2 -> B3, which makes it to B, but the only connectivity from B to A is over B1 -> A4. It's unlikely that B1 -> A4 is the first pair B tries, so B needs to include reports about what it got from A in all of its packets. That way, whichever packet gets through tells A which of its probes arrived. And we need a reasonable level of authentication too, because we haven't previously established that these addresses indeed belong to the correspondent.
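To make the reporting idea concrete, here is a small Python sketch of the A2 -> B3 / B1 -> A4 example. The `Probe` and `Host` types and their fields are hypothetical and only illustrate the bookkeeping; authentication is deliberately left out.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Pair = Tuple[str, str]  # (source address, destination address)


@dataclass
class Probe:
    src: str
    dst: str
    # Report: pairs on which the sender has received probes from the peer.
    seen: List[Pair] = field(default_factory=list)


@dataclass
class Host:
    name: str
    received: List[Pair] = field(default_factory=list)  # peer pairs that reached us
    confirmed: Set[Pair] = field(default_factory=set)   # our pairs the peer reported

    def make_probe(self, src: str, dst: str) -> Probe:
        # Every outgoing probe carries the full report of what we received.
        return Probe(src, dst, list(self.received))

    def handle(self, probe: Probe) -> None:
        pair = (probe.src, probe.dst)
        if pair not in self.received:
            self.received.append(pair)
        # The peer's report tells us which of our own probes got through.
        self.confirmed.update(probe.seen)


a, b = Host("A"), Host("B")
b.handle(a.make_probe("A2", "B3"))   # A2 -> B3 reaches B
b.make_probe("B2", "A1")             # B's first attempt is lost on the way
a.handle(b.make_probe("B1", "A4"))   # B1 -> A4 reaches A and carries the report
```

After the last line, `a.confirmed` contains `("A2", "B3")`, even though no reply ever travelled back over that same pair.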

On the other hand, when A1 <-> B1 is working happily, there is no need to use such a complex protocol: we are only testing one pair in each direction, and the correspondent has been authenticated earlier. If we wanted we could even use pings to determine whether this still works. (Well, sort of...)

I guess it would also make sense to state how those mechanisms behave when there are no outgoing packets. I mean, I guess that in any of the modes signaling is suppressed, right? However, I guess that the hosts assume that the address pair is reachable, right?

With correspondent unreachability detection (first mechanism in the draft) the transport hints would accompany the packets, so packets = no, hints = no -> no action. Only packets = yes and hints = no requires reachability probes.

With forced bidirectional communication we only send probes when we sent data recently but didn't receive data recently, so no probes in this case either.
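The two trigger rules can be written down as predicates. This is only a sketch: the function names are made up, and the draft does not define "recently", so the `window` parameter is an assumption.

```python
from typing import Optional


def cud_needs_probe(sent_packets: bool, got_hints: bool) -> bool:
    """Correspondent unreachability detection: probe only when packets
    went out but no positive transport hints came with them."""
    return sent_packets and not got_hints


def fbd_needs_probe(
    now: float,
    last_sent: Optional[float],
    last_received: Optional[float],
    window: float,  # assumed definition of "recently", in seconds
) -> bool:
    """Forced bidirectional communication: probe only when we sent data
    recently but did not receive data recently."""
    sent_recently = last_sent is not None and now - last_sent <= window
    received_recently = last_received is not None and now - last_received <= window
    return sent_recently and not received_recently
```

In both functions an idle session (nothing sent) never triggers a probe, which matches the "no action" answer above.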

Note that some transports and some applications explicitly keep the session alive with periodic traffic.

Additionally, I guess that there is other information that needs to be taken into account when detecting reachability, such as ICMP error messages, address deprecation, and lower-layer information. (I know that you state that you assume that addresses are available, but what happens if an address of the currently used address pair is deprecated, or if the associated interface goes down?)

It makes sense to send a probe when there is an ICMP error (have to rate limit this, though). Deprecation is irrelevant, as we can continue to use deprecated addresses. Lower layer events are also a reason to do a reachability test, I think. A more interesting case is when an address is removed from the system. I don't remember which session it was, but in Paris someone was talking about how systems keep using addresses they no longer have because upper layers still use those addresses. IMO this is a feature: if I unplug the Ethernet cable from my PowerBook and turn on my Wi-Fi, I get the same address on a different interface and my sessions are still alive. Under Windows, things like this kill your sessions immediately.
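For the rate limiting of ICMP-triggered probes, one simple option is a minimum interval between probes. The sketch below is just that, a sketch: the class name and the interval are assumptions, and a token bucket would do equally well.

```python
from typing import Optional


class ProbeRateLimiter:
    """Allow at most one triggered probe per `min_interval` seconds."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._last: Optional[float] = None  # time of the last allowed probe

    def allow(self, now: float) -> bool:
        # Refuse if the previous allowed probe was too recent.
        if self._last is not None and now - self._last < self.min_interval:
            return False
        self._last = now
        return True
```

A burst of ICMP errors (spoofed or real) then costs at most one probe per interval instead of one probe per error.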

Another issue that may be of interest is what happens with an address (pair) after it becomes unreachable. I mean, is it used in subsequent address pair explorations? Is it put in quarantine?

The way I see it there is an ordered list of address pairs. The more probes fail to make it to the other side the lower the address pair ends up on the list, I imagine. :-)

We need to think about the situation where a fast primary link fails and we switch to a slow backup, though. Presumably, we'll want to switch back to the fast primary address pair when possible.
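One way to picture the ordered list, including the return to a fast primary: sort by the number of failed probes first and by a static preference second, and reset the failure count when a probe succeeds. This is a sketch under those assumptions; the names and the reset-on-success policy are not from the draft.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Pair = Tuple[str, str]


@dataclass
class PairState:
    pair: Pair
    preference: int    # lower is better, e.g. 0 for the fast primary (assumed)
    failures: int = 0  # failed probes since the last success


class PairList:
    def __init__(self, pairs: Sequence[Pair]) -> None:
        # Initial order doubles as the static preference.
        self.entries = [PairState(p, i) for i, p in enumerate(pairs)]

    def _find(self, pair: Pair) -> PairState:
        return next(e for e in self.entries if e.pair == pair)

    def probe_failed(self, pair: Pair) -> None:
        self._find(pair).failures += 1

    def probe_succeeded(self, pair: Pair) -> None:
        # Assumed policy: one success clears the history, so a recovered
        # primary moves back to the top of the list.
        self._find(pair).failures = 0

    def ordered(self) -> List[Pair]:
        ranked = sorted(self.entries, key=lambda e: (e.failures, e.preference))
        return [e.pair for e in ranked]
```

With this ordering a failed pair is never quarantined outright; it just sinks, and it keeps being retried less eagerly than the pairs above it.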

  In its essence, address pair exploration is very simple: just send
  probes using every possible address pair, wait for something to come
  back and possibly consider the round trip time.

I guess we agree that you are oversimplifying the issue here :-)


Well, you know me.  (-:

I mean, the complexity is not only due to the number of probe packets that are needed, but also because of unidirectional connectivity. I think it is very important to express such difficulty. I mean, even with two address pairs, the problem can be quite complex, because replies need to carry information, not only about the particular incoming packet, but also about other previously received probes, in order to allow the transmitter to determine if previous probes have successfully arrived. I think it is important to describe this problem and possible approaches to deal with it.

You're absolutely right. I think I said something about these subjects but it has to be fleshed out in detail.

Finally, in the security considerations section, I think that there is a closely related problem that perhaps needs to be presented here, namely flooding protection. I mean, the path exploration exchange can be used for identifying working address pairs but also for preventing the shim from being used for flooding attacks. In order to enable the path exploration exchange to be used for this, you need to include some additional information in the exchange, some information that identifies the shim context, so that the receiver of a packet of the address pair exploration process can determine if this is one of its own established sessions that is being genuinely rehomed or if this is a flooding attack.


Yes.
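A rough sketch of the receiver-side check, in Python. Everything here is hypothetical: the `tag` field standing in for "information that identifies the shim context", the per-context set of peer addresses, and the class name are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class Context:
    peer_addresses: Set[str]  # addresses the peer announced for this context


class ProbeFilter:
    """Accept exploration probes only for contexts we actually established."""

    def __init__(self) -> None:
        self.contexts: Dict[int, Context] = {}

    def establish(self, tag: int, peer_addresses: Set[str]) -> None:
        self.contexts[tag] = Context(set(peer_addresses))

    def accept(self, tag: int, src: str) -> bool:
        ctx = self.contexts.get(tag)
        # Unknown context, or a source address the peer never announced:
        # treat the probe as a possible flooding attempt and ignore it.
        return ctx is not None and src in ctx.peer_addresses
```

A probe for an unknown context is dropped without a reply, so the exchange cannot be used to make us send traffic to a third party.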

Iljitsch
