Re: about reachability detection draft
On 19/08/2005, at 15:23, Iljitsch van Beijnum wrote:
On 17-aug-2005, at 16:24, marcelo bagnulo braun wrote:
Suppose that A is communicating with B
Suppose that we are using FBD so that a given frequency of packets
(data or just signaling) is guaranteed by the shim in each direction
Suppose now that A stops receiving packets. This implies a failure in
the B->A path.
Yes. (Assuming that A is still sending, if not the session could
simply be idle.)
Now are you assuming that when A stops receiving packets A should try
with alternative paths?
I am not sure this is good approach, since the path A->B may be
working properly.. moreover, maybe this path A->B is the only one
working.
There are three possible actions:
1. Do a "lightweight" reachability test using the currently used
locator pairs for both directions.
2. Start a full reachability evaluation procedure that will look at
all locator pairs until it finds one that works.
3. Switch to a new address pair.
I think 3. is the wrong thing to do. Most importantly, because if one
pair fails it's very likely that at least some other pairs also fail,
so jumping to a new address pair blindly makes little sense. Also, in
order to be able to do this, we must have previously authenticated the
alternative address for the correspondent (or the correspondent for
our alternative address) to avoid redirection attacks. So this
basically means authenticating all possible addresses for both sides
when the shim state is created. That's a waste of resources.
1. may make sense, but if we know there should be packets flowing in
both directions but they're missing in one direction, it's very
unlikely that the current address pair is still working. We should
probably still test it, but I think we should also start looking at
the first secondary address pair because it's likely that the
currently used pair is dead. So that lands us at 2.
As i see it, the FBD mechanism would be the following:
A and B are communicating
they are using FBD
A stops receiving packets
So, A informs B that the B->A path has failed (this implies some form
of signaling from A to B, which is likely to be required to be
somehow reliable, which is why i see a potential difficulty here)
The way I see it, A just starts a path exploration procedure. This
means it will try address pairs until it gets an answer from B, so
unless the path from A to B is completely broken at some point B will
see a path exploration packet from A, which indicates to B that there
is probably something wrong, so B would start doing its own path
exploration.
but this is somehow weird, because A is performing a path exploration
procedure, even if the path that A is using to send packet is working
properly...
I mean, actually, in this scenario, you are using the path exploration
procedure initiated by A as a reliable mechanism to allow A to inform B
that something is wrong in the path that B is using to send packets, so
that B starts its own path exploration procedure, which is the one that
is in fact relevant, since it is the B->A path which is broken.
So, in fact the FBD procedure has the following steps, as i see it:
1- Both ends send keepalives to preserve the required traffic frequency.
2- When something is wrong, the receiving end detects the failure.
3- Then the receiving end conveys the information about the failure to
the sender (which is the one that has to react). Such signaling must be
performed in a somewhat reliable way (in particular, you can use the
path exploration procedure since it tries multiple paths).
4- Then the sender can react, performing in its turn a path exploration
procedure.
agree?
i think it is important to identify that step 3 is required in order to
complete the failure detection procedure, since it is not only
important to detect the failure but also to inform the party that can
actually do something about it (i.e. the sender) that the failure has
occurred
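
To make the four steps concrete, here is a minimal sketch in Python of how i
picture each end behaving. It is only an illustration of the steps above, not
something taken from the draft: the class name, the packet kinds
("keepalive", "explore") and the timer values are all assumptions made up
for the example.

```python
from dataclasses import dataclass, field


@dataclass
class FbdEndpoint:
    """One end of a shim context using FBD (hypothetical model)."""
    name: str
    keepalive_interval: float = 3.0   # assumed value, not from the draft
    timeout: float = 10.0             # assumed value, not from the draft
    last_rx: float = 0.0
    last_tx: float = 0.0
    exploring: bool = False
    events: list = field(default_factory=list)

    def on_receive(self, now: float, kind: str) -> None:
        """Record an incoming packet of any kind."""
        self.last_rx = now
        if kind == "explore" and not self.exploring:
            # Step 4: the peer is exploring, so our outgoing path is suspect.
            self.exploring = True
            self.events.append("peer exploring")

    def tick(self, now: float):
        """Return the kind of packet to send now, or None."""
        if not self.exploring and now - self.last_rx > self.timeout:
            # Step 2: the receiving end detects the failure.
            self.exploring = True
            self.events.append("rx timeout")
        if self.exploring:
            # Step 3: exploration packets double as the failure signal.
            self.last_tx = now
            return "explore"
        if now - self.last_tx >= self.keepalive_interval:
            # Step 1: keep the traffic frequency up when otherwise idle.
            self.last_tx = now
            return "keepalive"
        return None
```

In a real shim, sending a data packet would also refresh last_tx, so that
keepalives are only generated when there is no traffic in that direction.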
If we use the other mechanism described in the draft (i don't remember
what you called this in your presentation), the sender is the one that
actually detects the failure, so there is no need for step 3
(FWIW I am not arguing for this latter mechanism or against FBD, just
trying to understand all the pieces that are required to make each one
work)
So you think that we need one packet type for periodic keepalives for
failure detection and another type of packet for exploring
alternative paths?
Yes. Assuming we go with the forced bidirectional thing, in the event
that there is traffic from A to B, but no traffic from B to A, B needs
to generate a packet anyway in order to let A know that everything is
still working. Since A presumably monitors all packet types, the
nature of this packet is of no importance. It could even be an IP
header without any payload.
But for the full reachability evaluation we need lots of extra stuff.
So it makes sense to have this in different packet types, especially
since the simple filler packets will probably be the most common.
ok i see now what you meant... (skipped some questions now answered by
your explanation)
...
And we need a reasonable level of authentication too, because we
haven't previously established that these addresses indeed belong to
the correspondent.
I am not sure what is the level of security we need here yet, but in
any case imho it will be very related to the security used during the
shim context establishment.
I mean, imho the critical part of the security is the addition of new
locators to the locator set. Once the locators have been validated,
using them shouldn't require too tight security requirements imho
Agree, but on the other hand we don't want to give attackers the
capability to make hosts think there are reachability problems,
especially attackers that aren't able to sniff any of the setup
traffic. So something simple like the TCP sequence number or the IPsec
replay counter would be useful here.
the attack that you have in mind then is that an attacker generates a
reachability test packet and the host reads this packet as the other
end starting the path exploration procedure, which implies that the host
itself should do the same because of a potential outage, right?
But this would be in the case of FBD, right? i mean, does this apply to
the other mechanism too? I think not, because in that case
reachability tests are a two-way exchange and wouldn't imply that the
other end starts doing its own reachability test, i think
In any case, you could also include some cookie generated during the
session establishment exchange to even make this stronger.
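
A rough sketch of what such a check could look like, combining a cookie from
the establishment exchange with a strictly increasing sequence number. The
class and field names are hypothetical, and this is simpler than what IPsec
actually does (IPsec uses a sliding window to tolerate reordering; here any
out-of-order probe is simply dropped).

```python
import hmac


class ProbeValidator:
    """Accepts a probe only if it carries the context cookie and a fresh
    sequence number (hypothetical scheme, for illustration only)."""

    def __init__(self, cookie: bytes):
        self.cookie = cookie      # assumed to be agreed during context setup
        self.highest_seq = -1     # highest sequence number accepted so far

    def accept(self, cookie: bytes, seq: int) -> bool:
        # Constant-time comparison, so the check itself leaks nothing.
        if not hmac.compare_digest(cookie, self.cookie):
            return False
        # Reject replays and stale probes.
        if seq <= self.highest_seq:
            return False
        self.highest_seq = seq
        return True
```

An off-path attacker that never saw the setup traffic would have to guess the
cookie, and an on-path replay of an old probe fails the sequence check.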
On the other hand, when A1 <-> B1 is working happily, there is no
need to use such a complex protocol: we are only testing one pair in
each direction, and the correspondent has been authenticated
earlier. If we wanted we could even use pings to determine whether
this still works. (Well, sort of...)
But as i see it, we are only going to try locators that are included
in the locator set available for that shim context, which means that
they have already been validated....
Since we need to test them for reachability right before we use them
anyway, I think it makes more sense to defer this validation until that
time.
Deprecation is irrelevant, as we can continue to use deprecated
addresses.
Well, deprecation means that somewhere in the near future, the
address won't be available anymore, right?
Right, so we keep using it until it's not available anymore. :-)
ok, perhaps i am confused here so please correct me
Suppose that a prefix PrefA is being renumbered.
In order to remove this prefix, all addresses containing PrefA are
deprecated. This means that this prefix should not be used for
establishing new communications, but can be used for ongoing
communications, right?
Now at a certain point in time the prefix is actually removed, which means
that packets containing a destination address with this prefix won't
reach the site any more. This would mean that the prefix and the
associated addresses are no longer available, according to your
terminology, right? So my question is: how does the host know that the
prefix is no longer available? Is there any other signaling tool
besides deprecation through RAdv?
So i guess it makes sense to keep on using it as ULID, but i wonder if
it wouldn't be a good strategy to try to rehome the ongoing
communications to an alternative locator...?
Are there any bad side effects when using deprecated addresses?
A more interesting case is when an address is removed from the
system. I don't remember which session it was, but in Paris someone
was talking about how systems keep using addresses they no longer
have because upper layers still use those addresses.
well, in the shim this address should be kept as a possible ULID but
not as a valid locator i guess
Yes, it makes sense to keep addresses as ULIDs for a while after
they've become unavailable. On the other hand, in theory the address
could be assigned to someone else so we probably want to limit this.
On the other hand, if the address is an HBA address, presumably nobody
else would be able to use it (certainly not as a ULID, only possibly
as an RFC 3041 semi-collision), so there is no harm in keeping it
around as a ULID.
IMO this is a feature: if I unplug my ethernet from my powerbook
and turn on my wifi, I get the same address on a different interface
and my sessions are still alive. Under windows, things like this
kill your sessions immediately.
in this case i guess that this address should be:
- kept as a ULID the whole time
- removed from the locator set during the period in which the
address was not available (i.e. from the time it is removed from the
ethernet until it is back on the wifi)
Maybe we should keep using the ULID as long as there is shim state
that refers to it, but it's removed when the shim state that uses the
ULID is removed.
agree
The way I see it there is an ordered list of address pairs. The more
probes fail to make it to the other side the lower the address pair
ends up on the list, I imagine. :-)
but what happens when you only have 2 address pairs for instance? you
may end up trying with the same two address pairs forever... I mean,
i guess we need a mechanism to give up trying, right?
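
Just to illustrate what i mean, a small sketch of such an ordered list with a
bound on the number of rounds. The names, the failure counter and the
max_rounds limit are assumptions for the example; whether the shim should
give up at all is exactly the open question.

```python
from dataclasses import dataclass


@dataclass
class AddressPair:
    src: str
    dst: str
    failures: int = 0   # failed probes since the last success


class PairList:
    """Address pairs ordered by how often their probes failed."""

    def __init__(self, pairs, max_rounds=3):
        self.pairs = list(pairs)
        self.max_rounds = max_rounds   # hypothetical give-up limit

    def explore(self, probe):
        """Try pairs, least-failed first; return a working pair or None."""
        for _ in range(self.max_rounds):
            # sorted() is stable, so ties keep their configured order.
            for pair in sorted(self.pairs, key=lambda p: p.failures):
                if probe(pair):
                    pair.failures = 0
                    return pair
                pair.failures += 1   # demote the pair for the next round
        return None                  # give up: no pair answered
```

Here probe is any callable that sends a test packet on the pair and reports
whether an answer came back; returning None is the point where the shim
would have to tell the upper layers that no path is available.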
Hm, maybe, maybe not... How does this work for current IP?
I guess that while the shim is still trying to move the session to a
different address pair, we should block ICMP errors and such so the
application won't give up, but when the shim can't find a working
address pair, we should pass on the ICMP errors to the upper layer. At
the same time, I'm not sure if we should ever really give up as long
as upper layers keep sending traffic: connectivity can come back at
any time.
well, the shim could perhaps generate an internal ICMP error for
informing ULPs that no path is available
We need to think about the situation where a fast primary link fails
and we switch to a slow backup, though. Presumably, we'll want to
switch back to the fast primary address pair when possible.
Right, the same arguments may apply for other reasons like cost of
the link, security/privacy features of the path... But the question
is how is the shim aware of such information? i guess that these
are policy issues and we could include some means to express this
type of consideration in the shim
Right. For destination addresses and to some degree, source addresses,
this can work through the RFC 3484 policy table, but it gets more
difficult when we have to select an output interface or ISP/site exit,
possibly through the source address in the packet.
Besides, in order to return to the original path, the shim needs to be
probing it all the time, so that it can detect when the original path
is back up. I mean, this wouldn't normally occur for other paths, since
the shim won't be probing alternative paths while the current path is
working, so if we want this feature, the default behaviour needs to be
modified i guess.
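
Something along these lines, as a sketch only: while the context is on a
backup pair, the preferred pair is probed periodically and the context moves
back as soon as a probe succeeds. The class name and the probe interval are
invented for the example, and how the preferred pair is chosen (policy) is
left out.

```python
class Failback:
    """Tracks the preferred pair and probes it while a backup is in use."""

    def __init__(self, primary, probe_interval=30.0):
        self.primary = primary
        self.current = primary
        self.probe_interval = probe_interval   # assumed value
        self.next_probe = 0.0

    def fail_over(self, backup, now):
        """Move to a backup pair and schedule the first probe of the primary."""
        self.current = backup
        self.next_probe = now + self.probe_interval

    def tick(self, now, probe):
        """Return the pair to use; probe the primary only when one is due."""
        if self.current == self.primary or now < self.next_probe:
            return self.current
        if probe(self.primary):
            self.current = self.primary        # primary is back: switch
        else:
            self.next_probe = now + self.probe_interval
        return self.current
```

Note that no probing happens while the context is already on the primary
pair, which matches the default behaviour of not testing alternative paths
when the current one works.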
regards, marcelo