Re: about reachability detection draft
On 14/08/2005, at 21:57, Iljitsch van Beijnum wrote:
Ok, the long awaited reply:
On 16-jul-2005, at 17:49, marcelo bagnulo braun wrote:
In section 2 it is stated that:
- In the second model, a host can only detect problems in the receiving
direction, so it must depend on the correspondent to detect problems
in the other direction
[this is what I called forced bidirectional or FBD in the slides]
I think it is important to take this one step further, then. I mean,
what can a host possibly do when it detects an outage in the incoming
path? If a host detects an outage in the outgoing path, it can change
the address pair that it is using to send packets and see if that
solves the problem, but what can it do if it detects an outage in the
incoming path? I guess that the only option would be to notify the
correspondent node about the failure so that the correspondent uses
an alternative path (note that we are assuming that the case of
unidirectional connectivity is possible).
So I guess that this mode needs an additional message reporting the
failure, which needs to be taken into account when comparing the
options.
Well, the idea is that when there is actually a failure in the active
path, it's likely that there will be failures in one or more of the
backup paths too. So in my opinion, it doesn't make sense to switch to
a new path blindly. We must first test any path we may want to
switch to.
I fully agree with this... but I don't understand how this is related
to my comment above.
Now one of two things can happen:
1. the candidate path doesn't work either -> timeout, try another
2. the candidate path works -> we can communicate with the
correspondent
If there is any connectivity left, eventually we'll end up in
situation 2, and the fact that we're sending the correspondent these
test packets (which are different from the regular ones that we send
when we think there is still connectivity, see below) tells the
correspondent that something is wrong, so the correspondent starts
sending test packets in our direction too.
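The two outcomes above amount to a simple loop over candidate pairs. A minimal sketch, assuming a hypothetical `probe(pair)` callable that sends one test packet on an address pair and returns True if a reply arrives before the timeout (the tuple representation of a pair is also an assumption):

```python
from typing import Callable, Iterable, Optional, Tuple

AddressPair = Tuple[str, str]  # hypothetical: (local locator, remote locator)


def explore(candidates: Iterable[AddressPair],
            probe: Callable[[AddressPair], bool]) -> Optional[AddressPair]:
    """Test candidate pairs one at a time; return the first that works.

    Case 1: the probe times out (probe returns False) -> try the next pair.
    Case 2: the probe is answered -> we can reach the correspondent.
    """
    for pair in candidates:
        if probe(pair):
            return pair  # situation 2: a working pair was found
    return None  # no connectivity left on any candidate pair
```

For example, if only A2 -> B3 works, `explore([("A1", "B1"), ("A2", "B3")], lambda p: p == ("A2", "B3"))` stops at the second candidate and returns it.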
Let me see if I get this.
Suppose that A is communicating with B.
Suppose that we are using FBD, so that a given frequency of packets
(data or just signaling) is guaranteed by the shim in each direction.
Suppose now that A stops receiving packets. This implies a failure in
the B->A path.
Are you assuming that when A stops receiving packets, A should try
alternative paths?
I am not sure this is a good approach, since the A->B path may be working
properly. Moreover, maybe this A->B path is the only one working.
I mean, if A tries alternative paths when the B->A path fails, we are
basically assuming bidirectional connectivity, and we wouldn't be
considering the unidirectional connectivity case, right?
As I see it, the FBD mechanism would work as follows:
1. A and B are communicating, using FBD.
2. A stops receiving packets.
3. A informs B that the B->A path has failed. (This implies some form
of signaling from A to B, which probably needs to be somewhat
reliable; this is why I see a potential difficulty here.)
4. A continues using the A->B path, while B starts exploring
alternative paths.
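A minimal sketch of the failure detection step from A's point of view, assuming a hypothetical `FbdHost` class, a fixed silence timeout, and a string stand-in for the notification message (none of these come from the draft). The point it illustrates is that A never changes its own outgoing pair; it only reports the failure so that B can explore:

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class FbdHost:
    """Hypothetical model of one host's side of forced bidirectional detection."""
    name: str
    outgoing_pair: Tuple[str, str]  # pair this host keeps using to send (A->B)
    timeout: float                  # assumed maximum silence before declaring failure
    last_received: float = 0.0
    sent_notifications: List[str] = field(default_factory=list)

    def on_packet_received(self, now: float) -> None:
        self.last_received = now

    def check(self, now: float) -> bool:
        """Return True if the incoming (B->A) path is considered failed.

        On failure the host leaves outgoing_pair alone and only queues a
        notification for the peer, which then explores B->A alternatives.
        In practice this message would need some reliability, e.g.
        retransmission until acknowledged (not modeled here).
        """
        if now - self.last_received > self.timeout:
            self.sent_notifications.append("incoming-path-failed")
            return True
        return False
```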
The reason I think we need two types of test packets is that, when
there is a large number of address pairs with unknown reachability
and/or RTT, we need to do extra work. For instance, suppose A sends a
probe A2 -> B3 that makes it to B, but the only connectivity from B to
A is over B1 -> A4. It's unlikely that B1 -> A4 is the first pair B
tries, so B needs to include reports about what it got from A in all
of its packets.
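A minimal sketch of that reporting idea, assuming hypothetical `Probe` and `Explorer` types: every probe carries the list of pairs on which the sender has received probes so far, so whichever of B's probes finally gets through (B1 -> A4 here) tells A that A2 -> B3 works:

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Pair = Tuple[str, str]  # hypothetical: (source locator, destination locator)


@dataclass
class Probe:
    """Hypothetical exploration probe carrying reception reports."""
    pair: Pair
    received_reports: List[Pair] = field(default_factory=list)


@dataclass
class Explorer:
    heard: Set[Pair] = field(default_factory=set)              # pairs we received probes on
    known_working_out: Set[Pair] = field(default_factory=set)  # our pairs the peer confirmed

    def make_probe(self, pair: Pair) -> Probe:
        # Report everything received so far in every probe, because we
        # don't know which of our probes (if any) will reach the peer.
        return Probe(pair, sorted(self.heard))

    def receive(self, probe: Probe) -> None:
        self.heard.add(probe.pair)
        self.known_working_out.update(probe.received_reports)
```

In the scenario above, B receives A's probe on ("A2", "B3"); B's probes on other pairs are lost, and when B's probe on ("B1", "A4") arrives, A learns from its reports that ("A2", "B3") works in the A->B direction.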
So you think that we need one packet type for periodic keepalives for
failure detection and another type of packet for exploring alternative
paths?
I am not sure about this, since when two different unidirectional paths
are being used for a communication, the keepalive packet will need
to carry information about the locator pair used for incoming packets.
I mean, consider that A and B are communicating, but they are using two
different unidirectional paths, i.e. A->B is using locators A1 and B1,
while B->A is using locators B2 and A2.
In this case, when A sends a reachability test request to B, it will
use A1 and B1. However, when B replies, it will use B2 and A2, so I
guess it would be a good idea to include in this reply packet information
about the addresses used for the request packet, i.e. A1 and B1.
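A minimal sketch of such a reply, assuming hypothetical `TestRequest`/`TestReply` message types and a nonce field for matching replies to requests (the draft does not define these formats). The reply travels on its own pair but echoes the pair the request arrived on:

```python
from dataclasses import dataclass
from typing import Tuple

Pair = Tuple[str, str]  # hypothetical: (source locator, destination locator)


@dataclass(frozen=True)
class TestRequest:
    nonce: int
    pair: Pair  # locators the request was sent with, e.g. ("A1", "B1")


@dataclass(frozen=True)
class TestReply:
    nonce: int
    pair: Pair                 # locators the reply itself uses, e.g. ("B2", "A2")
    echoed_request_pair: Pair  # locators the request arrived on


def make_reply(req: TestRequest, reply_pair: Pair) -> TestReply:
    """Build a reply that may travel over a different path than the request.

    Echoing the request's locators lets the requester learn which of its
    outgoing pairs works, even when B->A uses an unrelated pair.
    """
    return TestReply(req.nonce, reply_pair, req.pair)
```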
And we need a reasonable level of authentication too, because we
haven't previously established that these addresses indeed belong to
the correspondent.
I am not yet sure what level of security we need here, but in
any case, IMHO, it will be closely related to the security used during
shim context establishment.
I mean, IMHO the critical part of the security is the addition of new
locators to the locator set. Once the locators have been validated,
using them shouldn't impose such tight security requirements.
On the other hand, when A1 <-> B1 is working happily, there is no need
to use such a complex protocol: we are only testing one pair in each
direction, and the correspondent has been authenticated earlier. If we
wanted we could even use pings to determine whether this still works.
(Well, sort of...)
But as I see it, we are only going to try locators that are included in
the locator set available for that shim context, which means that they
have already been validated...
I guess it would also make sense to state how these mechanisms
behave when there are no outgoing packets. I mean, I guess that in
both modes signaling is suppressed, right? However, I guess
that the hosts still assume that the address pair is reachable, right?
With correspondent unreachability detection (first mechanism in the
draft) the transport hints would accompany the packets, so packets =
no, hints = no -> no action. Only packets = yes and hints = no
requires reachability probes.
In the forced bidirectional communication we only send probes when we
sent data recently but didn't receive data recently, so no probes in
this case either.
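The two rules above can be written as small predicates. A minimal sketch, with hypothetical function names; the booleans stand for "we sent packets / received a transport hint" and "we sent / received data recently":

```python
def cud_needs_probe(sent_packets: bool, got_hint: bool) -> bool:
    """Correspondent unreachability detection (first mechanism in the draft).

    Transport hints accompany packets, so only "packets sent, no hint"
    calls for a reachability probe; no packets means no action.
    """
    return sent_packets and not got_hint


def fbd_needs_probe(sent_recently: bool, received_recently: bool) -> bool:
    """Forced bidirectional communication: probe only if we sent data
    recently but did not receive any recently."""
    return sent_recently and not received_recently
```

An idle session (nothing sent, nothing received) yields False from both predicates, so neither mechanism generates probes.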
Note that some transports and some applications explicitly keep the
session alive with periodic traffic.
Right, I guess it would be good to include this explicitly in the
draft.
Additionally, I guess that there is other information that needs to
be taken into account when detecting reachability, such as ICMP error
messages, address deprecation, and lower layer information (I know that
you state that you assume that addresses are available, but what
happens if an address of the currently used address pair is
deprecated? Or if the associated interface goes down?)
It makes sense to send a probe when there is an ICMP error (have to
rate limit this, though).
ok
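A minimal sketch of rate-limiting ICMP-triggered probes, assuming a hypothetical `ProbeRateLimiter` that allows at most one probe per `min_interval` seconds (a simple minimum-interval rule; a token bucket would be an alternative, and the interval value is an assumption):

```python
from typing import Optional


class ProbeRateLimiter:
    """Allow at most one ICMP-triggered probe per min_interval seconds."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._last: Optional[float] = None  # time of last probe, None if never

    def on_icmp_error(self, now: float) -> bool:
        """Return True if a reachability probe should be sent now."""
        if self._last is None or now - self._last >= self.min_interval:
            self._last = now
            return True   # send a probe
        return False      # suppressed: too soon after the previous probe
```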
Deprecation is irrelevant, as we can continue to use deprecated
addresses.
Well, deprecation means that at some point in the near future, the address
won't be available anymore, right?
So I guess it makes sense to keep using it as a ULID, but I wonder whether
it wouldn't be a good strategy to try to rehome the ongoing
communications to an alternative locator...?
Lower layer events are also a reason to do a reachability test, I
think.
ok
A more interesting case is when an address is removed from the
system. I don't remember which session it was, but in Paris someone
was talking about how systems keep using addresses they no longer have
because upper layers still use those addresses.
Well, in the shim, I guess this address should be kept as a possible
ULID but not as a valid locator.
IMO this is a feature: if I unplug my ethernet from my powerbook and
turn on my wifi, I get the same address on a different interface and
my sessions are still alive. Under Windows, things like this kill your
sessions immediately.
In this case, I guess that this address should be:
- kept as a ULID the whole time
- removed from the locator set during the period in which the
address was not available (i.e. from the moment it was removed from
the Ethernet interface until it came back on the wifi interface)
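A minimal sketch of that split, assuming a hypothetical `ShimContext` holding one ULID and a set of locators. Validation of a re-added locator, discussed above as the security-critical step, is not modeled:

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class ShimContext:
    """Hypothetical shim context: the ULID stays fixed for upper layers,
    while the locator set tracks which addresses are currently usable."""
    ulid: str
    locators: Set[str] = field(default_factory=set)

    def address_removed(self, addr: str) -> None:
        # Leave the ULID untouched so transport sessions survive; only
        # stop using the address as a locator.
        self.locators.discard(addr)

    def address_added(self, addr: str) -> None:
        self.locators.add(addr)

    def usable_locators(self) -> Set[str]:
        return set(self.locators)
```

In the powerbook example, unplugging Ethernet corresponds to `address_removed("A1")` (the ULID stays "A1"), and the address reappearing on the wifi interface corresponds to `address_added("A1")`.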
Another issue that may be of interest is what happens with an address
(pair) after it becomes unreachable. I mean, is it used in subsequent
address pair explorations? Is it put in quarantine?
The way I see it there is an ordered list of address pairs. The more
probes fail to make it to the other side the lower the address pair
ends up on the list, I imagine. :-)
But what happens when you only have two address pairs, for instance? You
may end up trying the same two address pairs forever... I mean, I
guess we need a mechanism to give up trying, right?
Putting address pairs in quarantine would be one such mechanism,
but there may be others, like a maximum number of attempts...
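A minimal sketch combining the ordered list with a maximum number of attempts, assuming a hypothetical `PairList` class and a consecutive-failure limit as the give-up rule (quarantine would be a different rule with the same purpose):

```python
from typing import Dict, List, Optional, Tuple

Pair = Tuple[str, str]  # hypothetical: (local locator, remote locator)


class PairList:
    """Ordered list of address pairs.

    Each failed probe pushes a pair further down; a pair with max_failures
    consecutive failures is no longer explored, so we stop retrying the
    same pairs forever.
    """

    def __init__(self, pairs: List[Pair], max_failures: int) -> None:
        self.order: List[Pair] = list(pairs)  # initial preference order
        self.failures: Dict[Pair, int] = {p: 0 for p in pairs}
        self.max_failures = max_failures

    def candidates(self) -> List[Pair]:
        """Pairs still worth trying, fewest failures first.

        sorted() is stable, so ties keep the initial preference order.
        """
        alive = [p for p in self.order if self.failures[p] < self.max_failures]
        return sorted(alive, key=lambda p: self.failures[p])

    def report(self, pair: Pair, success: bool) -> None:
        """Record a probe result; a success resets the failure count."""
        self.failures[pair] = 0 if success else self.failures[pair] + 1

    def next_pair(self) -> Optional[Pair]:
        """The pair to probe next, or None once every pair has been given up."""
        alive = self.candidates()
        return alive[0] if alive else None
```

With two pairs and `max_failures=2`, the list alternates between them as probes fail and returns None after each has failed twice, instead of cycling indefinitely.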
We need to think about the situation where a fast primary link fails
and we switch to a slow backup, though. Presumably, we'll want to
switch back to the fast primary address pair when possible.
Right, the same arguments may apply for other reasons, like the cost of
the link or the security/privacy features of the path... But the
question is how the shim becomes aware of such information. I guess
that these are policy issues, and we could include some means to
express these kinds of considerations in the shim.
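One possible shape for such a policy hook: a minimal sketch, assuming a hypothetical rank table (lower is better) that stands in for whatever speed, cost, or privacy policy the shim ends up supporting. It picks the best working pair and lists better-ranked pairs worth re-probing, so the shim can switch back to a fast primary once it recovers:

```python
from typing import Dict, List, Optional, Tuple

Pair = Tuple[str, str]  # hypothetical: (local locator, remote locator)

UNRANKED = 10**9  # hypothetical rank for pairs the policy does not mention


def choose_pair(working: List[Pair], rank: Dict[Pair, int]) -> Optional[Pair]:
    """Among pairs known to work, pick the one the policy prefers most."""
    if not working:
        return None
    return min(working, key=lambda p: rank.get(p, UNRANKED))


def pairs_to_retest(current: Pair, rank: Dict[Pair, int]) -> List[Pair]:
    """Pairs the policy prefers over the one in use, best first.

    Periodically probing these lets the shim move back to a preferred
    pair (e.g. the fast primary link) when it works again.
    """
    current_rank = rank.get(current, UNRANKED)
    better = [p for p, r in rank.items() if r < current_rank]
    return sorted(better, key=lambda p: rank[p])
```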
Regards, marcelo