RE: Question about REAP state transition (draft-ietf-shim6-failure-detection-09)



|  -----Original Message-----
|  From: Iljitsch van Beijnum [mailto:iljitsch@muada.com]
|  Sent: Wednesday, 06 February 2008 10:17
|  To: Alberto García
|  CC: shim6@psg.com
|  Subject: Re: Question about REAP state transition
|  (draft-ietf-shim6-failure-detection-09)
|  
|  On 5 feb 2008, at 11:22, Alberto García wrote:
|  
|  > It is nice to clarify that information related to the received
|  > probes should be included in the probes sent in the InboundOK state.
|  
|  > However, I still think that this information MUST also be included, if
|  > available, in the Exploring state (and not optionally, in "MAY"-style).
|  
|  Well, I clarified that the only probes we copy back are the ones since
|  the last transition from Operational to Exploring, because otherwise
|  it's possible to copy back old probes that were received before the
|  failure happened. And since the reception of an inbound probe means
|  going from Exploring to InboundOK it's impossible to have any probes
|  to copy back in Exploring. Maybe it's useful to make copying back one
|  probe mandatory in Operational too, though.


Sorry, I think I did not manage to explain the problem precisely (and some
mistakes, such as "retransmission timer", crept in). Let's try again:

Suppose two nodes, A and B, are communicating. After a failure, only one
unidirectional path is available from A to B (let's call it Pab_ok), and only
one from B to A (Pba_ok).
Both A and B are in Exploring.
B sends a probe using Pba_ok. Node A, in the Exploring state, receives the
Probe Exploring through Pba_ok, so it moves to InboundOK. In the next probes
it sends (unfortunately, NOT through Pab_ok), A includes the information
about the valid locators for its incoming path (Pba_ok). But since A is not
able to find a working path from A to B for some time, these probes never
arrive at B, so B stays in the Exploring state.
B starts testing other paths (no longer Pba_ok), but these paths are not
working.
Some time after A stops receiving data from B, the Send timer at A expires.
Then A moves from InboundOK back to the Exploring state, and stops including
the validity of Pba_ok in its probes [this is the point under discussion].
Both A and B are now in Exploring.

(now A and B swap roles)
Now A sends a probe to B through Pab_ok, moving B to InboundOK. But B is
now exploring non-working paths... Then A tries new paths for its probes,
the Send timer in B expires, so B goes back to Exploring...
A and B are back in Exploring again...

(repeat this forever)
Two valid unidirectional paths existed, one from A to B and another from B to
A. Both nodes sent probes through them. But the protocol could not move both
ends to the Operational state, because the two paths were never known to both
nodes within the same timeframe.
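
The loop can be reproduced with a toy model. The Python sketch below is not
REAP itself, only an illustration under assumptions of my own: each node sends
one probe per tick, cycling round-robin through n_paths outgoing paths; path 0
is the only working one in each direction; the Send timer is counted in ticks;
and a probe that arrives carrying a report of a received probe is treated as
convergence. The names Node, simulate, remember and send_timeout are
hypothetical and do not come from the draft.

```python
from dataclasses import dataclass
from typing import Optional

EXPLORING, INBOUND_OK = "Exploring", "InboundOK"
WORKING_PATH = 0  # assumption: only path index 0 works in each direction


@dataclass
class Node:
    name: str
    offset: int                     # where in its path list the node starts
    state: str = EXPLORING
    heard_on: Optional[int] = None  # inbound path reported in outgoing probes
    timer: int = 0                  # ticks left before falling back to Exploring


def simulate(n_paths: int, send_timeout: int, remember: bool,
             max_ticks: int = 200) -> Optional[int]:
    """Return the tick at which a probe carrying a report is delivered,
    or None if that never happens within max_ticks."""
    a = Node("A", offset=n_paths // 2)
    b = Node("B", offset=0)
    for tick in range(max_ticks):
        # Both nodes send simultaneously; only probes on the working path arrive.
        delivered = []
        for src, dst in ((a, b), (b, a)):
            path = (tick + src.offset) % n_paths
            if path == WORKING_PATH:
                delivered.append((dst, path, src.heard_on))
        for dst, path, report in delivered:
            if report is not None:
                # The receiver now knows a working path in both directions.
                return tick
            dst.state = INBOUND_OK
            dst.heard_on = path
            dst.timer = send_timeout
        # Send timer: fall back to Exploring and, unless `remember` is set,
        # forget which inbound path was seen working.
        for node in (a, b):
            if node.state == INBOUND_OK:
                if node.timer == 0:
                    node.state = EXPLORING
                    if not remember:
                        node.heard_on = None
                else:
                    node.timer -= 1
    return None


if __name__ == "__main__":
    print(simulate(n_paths=8, send_timeout=3, remember=False))
    print(simulate(n_paths=8, send_timeout=3, remember=True))
```

With 8 paths per direction and a Send timer of 3 ticks, the "forgetting"
variant never converges in this model, while the variant that keeps reporting
the received probe after falling back to Exploring does. The offsets are
chosen on purpose to hit the bad case; other phases may converge by luck.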

Do you think this is a problem?

|  
|  > In B, the Retransmission Timer of B expires because a valid path
|  > from A to B was not found,
|  
|  What do you mean by "retransmission timer"? There is no timer with
|  that name.
|  
|  Probes are sent at certain intervals without considering whether
|  they're retransmissions.
|  
|  > so B starts testing other paths that are not working.
|  
|  B keeps testing paths until it sees A is in state InboundOK.
|  
|  > Then, A
|  > stops receiving data from B, so the Send timer expires (I don't find
|  > any reason why all the possible paths should be explored in less than
|  > Send Timeout time, so A could not test all possible paths from A to B
|  > in this time).
|  
|  The Send Timeout is for determining when the probing starts. The
|  probing process does not depend on the Send Timeout.
|  
|  Because probing exponentially backs off, a good number of them are
|  sent in the first minute or so (I think 17, but it depends on the
|  exact values for the exponential backoff) but at some point, the
|  probing rate is only one per minute. In theory, this means you will
|  find any working address pair if you wait long enough. In practice,

This is what I'm questioning. My understanding is that currently it is not
enough to test all the paths; you also have to be lucky enough to find valid
paths for both directions within a given timeframe. Otherwise, the two ends
never agree on the paths even though they exist.

|  you don't really care anymore after 30 - 300 seconds, depending on
|  transport timers and user patience. This means the number of address
|  pairs shouldn't be more than 16 or so (4 addresses on each end).
|
|  > Then, A falls to the Exploring state, and (under the assumption of the
|  > previous paragraph) forgets about the working path from B to A. Maybe
|  > now A sends a probe to B through a working path, but the same happens
|  > in B (it now tries different paths from B to A that are not valid, so
|  > A tries other paths from A to B, abandoning the good one...).
|  
|  The inclusion of at least one probe that was received earlier in
|  outgoing probes should fix this: when you get a packet from the other
|  side, you know at least one working address pair from here to there.
|  Obviously this won't work if reachability changes in the interim.