
Re: about reachability detection draft

To: Iljitsch van Beijnum <iljitsch@muada.com>
Subject: Re: about reachability detection draft
From: marcelo bagnulo braun <marcelo@it.uc3m.es>
Date: Sat, 20 Aug 2005 07:34:41 +0200
Cc: shim6 <shim6@psg.com>
In-reply-to: <151DAC31-FB46-4310-B7B6-5C31F97011F6@muada.com>
References: <1a19b8397546d199b67740ae5c539348@it.uc3m.es> <12550561-34A9-4AD3-A194-9BD531E817F3@muada.com> <90331ee3e696e13c458b366302505d31@it.uc3m.es> <151DAC31-FB46-4310-B7B6-5C31F97011F6@muada.com>


On 19/08/2005, at 15:23, Iljitsch van Beijnum wrote:

On 17-aug-2005, at 16:24, marcelo bagnulo braun wrote:
Suppose that A is communicating with B. Suppose that we are using FBD, so that a given frequency of packets (data or just signaling) is guaranteed by the shim in each direction.

Suppose now that A stops receiving packets. This implies a failure in the B->A path.
Yes. (Assuming that A is still sending, if not the session could simply be idle.)

Now, are you assuming that when A stops receiving packets A should try alternative paths? I am not sure this is a good approach, since the path A->B may be working properly. Moreover, maybe this path A->B is the only one working.
There are three possible actions:
1. Do a "lightweight" reachability test using the currently used locator pairs for both directions.
2. Start a full reachability evaluation procedure that will look at all locator pairs until it finds one that works.
3. Switch to a new address pair.

I think 3. is the wrong thing to do. Most importantly, because if one pair fails it's very likely that at least some other pairs also fail, so jumping to a new address pair blindly makes little sense. Also, in order to be able to do this, we must have previously authenticated the alternative address for the correspondent (or the correspondent for our alternative address) to avoid redirection attacks. So this basically means authenticating all possible addresses for both sides when the shim state is created. That's a waste of resources.

1. may make sense, but if we know there should be packets flowing in both directions but they're missing in one direction, it's very unlikely that the current address pair is still working. We should probably still test it, but I think we should also start looking at the first secondary address pair because it's likely that the currently used pair is dead. So that lands us at 2.

As i see the FBD mechanism would be the following:
A and B are communicating
they are using FBD
A stops receiving packets
So, A informs B that the B->A path has failed (this implies some form of signaling from A to B, which is likely to be required to be somehow reliable, which is why i see a potential difficulty here)
The way I see it, A just starts a path exploration procedure. This means it will try address pairs until it gets an answer from B, so unless the path from A to B is completely broken at some point B will see a path exploration packet from A, which indicates to B that there is probably something wrong, so B would start doing its own path exploration.

but this is somewhat weird, because A is performing a path exploration procedure even if the path that A is using to send packets is working properly...

I mean, actually, in this scenario you are using the path exploration procedure initiated by A as a reliable mechanism to allow A to inform B that something is wrong in the path that B is using to send packets, so that B starts its own path exploration procedure, which is the one that is in fact relevant, since it is the B->A path which is broken.

So, in fact the FBD procedure has the following steps, as i see it:
1- Both ends send keepalives to preserve the required traffic frequency.
2- When something is wrong, the receiving end detects the failure.
3- Then the receiving end conveys the information about the failure to the sender (which is the one that has to react). Such signaling must be performed in a somehow reliable way (in particular, you can use the path exploration procedure, since it tries multiple paths).
4- Then the sender can react, performing in its turn a path exploration procedure.
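
To make the four steps concrete, here is a minimal sketch of one end of an FBD session. It is only an illustration under assumptions: the class name, the two timer values and the "keepalive"/"probe" labels are invented for the example and do not come from the draft, and how exploration ends is not modelled.

```python
from dataclasses import dataclass


@dataclass
class FbdEndpoint:
    """One end of a shim context using FBD (illustrative only)."""
    keepalive_interval: float = 3.0   # assumed value, not from the draft
    failure_timeout: float = 10.0     # assumed value, not from the draft
    last_sent: float = 0.0
    last_received: float = 0.0
    exploring: bool = False

    def on_send(self, now: float) -> None:
        """Any outgoing packet (data or signaling) counts as traffic."""
        self.last_sent = now

    def on_receive(self, now: float, is_probe: bool = False) -> None:
        """Any incoming packet resets the failure timer."""
        self.last_received = now
        if is_probe:
            # Step 4: a probe from the peer hints that our sending path
            # is broken, so this end starts its own exploration.
            self.exploring = True

    def tick(self, now: float) -> list:
        """Return the packets this end should send at time `now`."""
        # Step 2: too much silence on the receive side means a failure.
        if now - self.last_received >= self.failure_timeout:
            self.exploring = True
        if self.exploring:
            # Step 3: probes over several pairs double as the signal to the peer.
            to_send = ["probe"]
        elif now - self.last_sent >= self.keepalive_interval:
            # Step 1: keep the agreed packet frequency when the session is idle.
            to_send = ["keepalive"]
        else:
            to_send = []
        if to_send:
            self.last_sent = now
        return to_send
```

With these assumed timers, an end that last heard from its peer at t=3 sends keepalives at first and switches to probes once t=13 is reached; the peer starts exploring as soon as it sees one of those probes.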

agree?

i think it is important to identify that step 3 is required in order to complete the failure detection procedure, since it is not only important to detect the failure but also to inform the party that can actually do something about it (i.e. the sender) that the failure has occurred

If we use the other mechanism described in the draft (i don't remember what you called it in your presentation), the sender is the one that actually detects the failure, so there is no need for step 3.

(FWIW I am not arguing for this latter mechanism or against FBD, just trying to understand all the pieces that are required to make each one work)

So you think that we need one packet type for periodic keepalives for failure detection and another type of packet for exploring alternative paths?
Yes. Assuming we go with the forced bidirectional thing, in the event that there is traffic from A to B, but no traffic from B to A, B needs to generate a packet anyway in order to let A know that everything is still working. Since A presumably monitors all packet types, the nature of this packet is of no importance. It could even be an IP header without any payload.
But for the full reachability evaluation we need lots of extra stuff.
So it makes sense to have this in different packet types, especially since the simple filler packets will probably be the most common.

ok i see now what you meant... (skipped some questions now answered by your explanation)

...

And we need a reasonable level of authentication too, because we haven't previously established that these addresses indeed belong to the correspondent.

I am not sure what is the level of security we need here yet, but in any case imho it will be very related to the security used during the shim context establishment.

I mean, imho the critical part of the security is the addition of new locators to the locator set. Once the locators have been validated, using them shouldn't call for very tight security imho
Agree, but on the other hand we don't want to give attackers the capability to make hosts think there are reachability problems, especially attackers that aren't able to sniff any of the setup traffic. So something simple like the TCP sequence number or the IPsec replay counter would be useful here.

the attack that you have in mind then is that an attacker generates a reachability test packet and the host reads this packet as the other end starting the path exploration procedure, which implies that the host itself should do the same because of a potential outage, right? But this would be in the case of FBD, right? i mean, does this apply to the other mechanism too? I think not, because in that case reachability tests are a two-way exchange and wouldn't imply that the other end starts doing its own reachability test, i think

In any case, you could also include some cookie generated during the session establishment exchange to even make this stronger.
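
As a rough sketch of how the two ideas could combine: the receiver accepts a probe only if it carries the cookie agreed during context establishment and a sequence number higher than any seen so far. The class name and field layout are assumptions made for the example, and the strictly increasing counter is a simplification of a real replay window such as the one IPsec uses.

```python
import hmac


class ProbeValidator:
    """Accepts probes only from a peer that saw the context setup (sketch)."""

    def __init__(self, cookie: bytes):
        self.cookie = cookie      # assumed: exchanged during session establishment
        self.highest_seq = -1     # highest sequence number accepted so far

    def accept(self, cookie: bytes, seq: int) -> bool:
        # Constant-time comparison, so the check leaks nothing about the cookie.
        if not hmac.compare_digest(cookie, self.cookie):
            return False
        # Reject replays and stale probes: the counter must move forward.
        if seq <= self.highest_seq:
            return False
        self.highest_seq = seq
        return True
```

An off-path attacker who did not see the setup traffic cannot guess the cookie, and an on-path replay of an old probe fails the counter check.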

On the other hand, when A1 <-> B1 is working happily, there is no need to use such a complex protocol: we are only testing one pair in each direction, and the correspondent has been authenticated earlier. If we wanted we could even use pings to determine whether this still works. (Well, sort of...)

But as i see it, we are only going to try locators that are included in the locator set available for that shim context, which means that they have already been validated....
Since we need to test them for reachability right before we use them anyway, I think it makes more sense to defer this validation until that time.

Deprecation is irrelevant, as we can continue to use deprecated addresses.

Well, deprecation means that somewhere in the near future, the address won't be available anymore, right?
Right, so we keep using it until it's not available anymore.  :-)


ok, perhaps i am confused here so please correct me

Suppose that a prefix PrefA is being renumbered. In order to remove this prefix, all addresses containing PrefA are deprecated. This means that this prefix should not be used for establishing new communications, but can be used for ongoing communications, right?

Now, at a certain point in time the prefix is actually removed; this means that packets containing a destination address with this prefix won't reach the site any more. This would mean that the prefix and the associated addresses are no longer available, according to your terminology, right? So my question is: how does the host know that the prefix is no longer available? Is there any other signaling tool besides deprecation through RAdv?

So i guess it makes sense to keep on using it as ULID, but i wonder if it wouldn't be a good strategy to try to rehome the ongoing communications to an alternative locator...?
Are there any bad side effects when using deprecated addresses?
A more interesting case is when an address is removed from the system. I don't remember which session it was, but in Paris someone was talking about how systems keep using addresses they no longer have because upper layers still use those addresses.

well, in the shim this address should be kept as a possible ULID but not as a valid locator i guess
Yes, it makes sense to keep addresses as ULIDs for a while after they've become unavailable. On the other hand, in theory the address could be assigned to someone else so we probably want to limit this.

On the other hand, if the address is an HBA address, presumably nobody else would be able to use it (certainly not as a ULID, only possibly as an RFC 3041 semi-collision), so there is no harm in keeping it around as a ULID.

IMO this is a feature: if I unplug my ethernet from my powerbook and turn on my wifi, I get the same address on a different interface and my sessions are still alive. Under windows, things like this kill your sessions immediately.

in this case i guess that this address should be:
- kept as a ULID during all the time
- removed from the locator set during the period in which the address was not available (i.e. from the time it was removed from the ethernet until it is back on the wifi)
Maybe we should keep using the ULID as long as there is shim state that refers to it, but it's removed when the shim state that uses the ULID is removed.


agree
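
The rule agreed on here can be sketched as two separate pieces of bookkeeping: a locator set that follows address availability, and a per-ULID reference count that follows shim state. All names below are hypothetical and the address is from the documentation prefix.

```python
class ShimAddressState:
    """Tracks locators and ULIDs independently (illustrative sketch)."""

    def __init__(self):
        self.locators = set()   # addresses currently usable on the wire
        self.ulid_refs = {}     # ULID -> number of shim contexts using it

    def address_available(self, addr: str) -> None:
        self.locators.add(addr)

    def address_unavailable(self, addr: str) -> None:
        # Losing the address only affects its role as a locator.
        self.locators.discard(addr)

    def context_created(self, ulid: str) -> None:
        self.ulid_refs[ulid] = self.ulid_refs.get(ulid, 0) + 1

    def context_removed(self, ulid: str) -> None:
        remaining = self.ulid_refs.get(ulid, 0) - 1
        if remaining > 0:
            self.ulid_refs[ulid] = remaining
        else:
            # The ULID goes away with the last shim state that refers to it.
            self.ulid_refs.pop(ulid, None)

    def is_locator(self, addr: str) -> bool:
        return addr in self.locators

    def is_ulid(self, addr: str) -> bool:
        return addr in self.ulid_refs
```

In the ethernet-to-wifi example, the address drops out of the locator set while the cable is unplugged and returns when the wifi comes up, and it stays a ULID throughout because the shim state still refers to it.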

The way I see it there is an ordered list of address pairs. The more probes fail to make it to the other side the lower the address pair ends up on the list, I imagine. :-)
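
One possible reading of this ordered list, as a sketch only: keep a failure count per address pair and sort by it, so that pairs with more failed probes sink and ties keep their initial preference order. Resetting the count on success is an assumption of the example, not something stated in the thread.

```python
class PairRanking:
    """Orders address pairs by how often probes on them have failed (sketch)."""

    def __init__(self, pairs):
        # Dict insertion order records the initial preference order.
        self.failures = {pair: 0 for pair in pairs}

    def record_failure(self, pair) -> None:
        self.failures[pair] += 1

    def record_success(self, pair) -> None:
        # Assumption: one successful probe clears the pair's history.
        self.failures[pair] = 0

    def ordered(self) -> list:
        # sorted() is stable, so equal counts keep their original order.
        return sorted(self.failures, key=self.failures.get)
```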

but what happens when you only have 2 address pairs, for instance? you may end up trying the same two address pairs forever... I mean, i guess we need a mechanism to give up trying, right?
Hm, maybe, maybe not... How does this work for current IP?
I guess that while the shim is still trying to move the session to a different address pair, we should block ICMP errors and such so the application won't give up, but when the shim can't find a working address pair, we should pass on the ICMP errors to the upper layer. At the same time, I'm not sure if we should ever really give up as long as upper layers keep sending traffic: connectivity can come back at any time.

well, the shim could perhaps generate an internal ICMP error for informing ULPs that no path is available
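
A sketch of that behaviour, with the caveat that whether to give up at all is still open in this thread: the shim cycles through the address pairs for a bounded number of rounds, reports nothing to the upper layer while it is still trying, and returns an error only when every round has failed. The `probe` callable, the `max_rounds` limit and the error string are assumptions made for the example.

```python
def explore(pairs, probe, max_rounds=3):
    """Try each address pair in order, for up to `max_rounds` full rounds.

    `probe` is a hypothetical callable that returns True when a pair works.
    Returns (working_pair, None) on success, or (None, error) once every
    round has failed and the error should be passed to the upper layer.
    """
    pairs = list(pairs)
    for _ in range(max_rounds):
        for pair in pairs:
            if probe(pair):
                # Found a working pair: the upper layer never sees an error.
                return pair, None
    # Only now does the shim tell the ULP that no path is available.
    return None, "destination unreachable"
```

Upper layers that keep sending traffic after the error could simply trigger another call, which matches the point that connectivity can come back at any time.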

We need to think about the situation where a fast primary link fails and we switch to a slow backup, though. Presumably, we'll want to switch back to the fast primary address pair when possible.

Right, the same arguments may apply for other reasons like cost of the link, security/privacy features of the path... But the question is how the shim becomes aware of such information. i guess that these are policy issues and we could include some means to express this type of consideration in the shim
Right. For destination addresses and to some degree, source addresses, this can work through the RFC 3484 policy table, but it gets more difficult when we have to select an output interface or ISP/site exit, possibly through the source address in the packet.

Besides, in order to return to the original path, the shim needs to be probing it all the time, so that it can detect when the original path is back up. I mean, this wouldn't normally occur for other paths, since the shim won't be probing alternative paths while the current path is working, so if we want this feature, the default behaviour needs to be modified i guess.
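
As a sketch of what that modified behaviour might look like: while the session runs on a backup pair, the shim probes the preferred pair at a fixed interval and moves back once a probe succeeds. The class, the `probe` callable and the 30-second interval are all assumptions for illustration.

```python
class FailbackMonitor:
    """Probes the preferred pair while a backup pair is in use (sketch)."""

    def __init__(self, primary, probe_interval=30.0):
        self.primary = primary
        self.current = primary
        self.probe_interval = probe_interval   # assumed value
        self.last_probe = 0.0

    def fail_over(self, backup, now: float) -> None:
        """Record that the session moved to a backup pair at time `now`."""
        self.current = backup
        self.last_probe = now

    def tick(self, now: float, probe):
        """Return the pair to use; `probe` is a hypothetical reachability test."""
        if self.current == self.primary:
            # No extra probing while the preferred pair is already in use.
            return self.current
        if now - self.last_probe >= self.probe_interval:
            self.last_probe = now
            if probe(self.primary):
                self.current = self.primary
        return self.current
```

The interval trades probe overhead against how long the session stays on the slow or costly backup after the primary recovers.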

regards, marcelo

