
Re: Comments on draft-ietf-shim6-failure-detection




I would therefore argue that the important issue is not action
in multiple layers, but rather the avoidance of race conditions; a
well-defined communication mechanism between the IP and
transport/application layer can help with this.
Hmm. Yes. For instance, the IP layer could signal "Link Up", then
"Link Temporary Problem" (while the shim explores other
alternatives), then "Link Up" again.
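A minimal sketch of how such notifications might keep the transport from racing the shim. The signal names mirror the ones above, but `LinkSignal`, `TransportView`, and the "retransmit immediately on Link Up" reaction are hypothetical illustrations, not an API defined by any draft:

```python
from enum import Enum, auto
from typing import List


class LinkSignal(Enum):
    """Hypothetical notifications from the IP/shim layer to the transport."""
    LINK_UP = auto()
    LINK_TEMPORARY_PROBLEM = auto()  # the shim is exploring alternatives


class TransportView:
    """Assumed transport-side consumer: it defers its own give-up decision
    while the shim reports that it is exploring, so the two layers do not
    race to recover from the same failure."""

    def __init__(self) -> None:
        self.shim_exploring = False
        self.log: List[str] = []

    def on_signal(self, signal: LinkSignal) -> None:
        if signal is LinkSignal.LINK_TEMPORARY_PROBLEM:
            self.shim_exploring = True
            self.log.append("hold: shim exploring")
        elif signal is LinkSignal.LINK_UP:
            if self.shim_exploring:
                # Illustrative choice: retransmit now instead of waiting
                # for a backed-off retransmission timer to expire.
                self.log.append("resume: retransmit now")
            self.shim_exploring = False

    def may_give_up(self) -> bool:
        # The transport only aborts when the shim is not still exploring.
        return not self.shim_exploring
```

Feeding it the sequence Link Up, Link Temporary Problem, Link Up leaves `may_give_up()` false only during the middle step.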

  But it is less clear which protocol(s) should discover end-to-end
  connectivity problems or recover from them.  One answer is that this
  is clearly within the domain of the multihoming protocol.  By performing
  testing and failure detection of the used path and switching to a new
  path if necessary, the transport and application protocols can work
  unchanged.
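The idea of testing the current path and switching when necessary can be sketched as follows, assuming a hypothetical `probe` callback that reports whether a locator pair currently works. The sequential search is a simplification for illustration; it is not the exploration procedure the draft specifies:

```python
from typing import Callable, Optional, Sequence, Tuple

# (local locator, peer locator); illustrative representation only
LocatorPair = Tuple[str, str]


def select_path(current: LocatorPair,
                candidates: Sequence[LocatorPair],
                probe: Callable[[LocatorPair], bool]) -> Optional[LocatorPair]:
    """Return the locator pair to use: the current one if a probe over it
    succeeds, otherwise the first alternative that answers, or None when
    no pair works. Upper layers keep using the same identifiers either way."""
    if probe(current):
        return current
    for pair in candidates:
        if pair != current and probe(pair):
            return pair
    return None
```

Because only the locator pair changes, TCP and the application see the same endpoints before and after the switch, which is what lets them "work unchanged".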

I am not clear that the "multi-homing protocol" necessarily has the right
information to do testing and failure detection correctly. For example,
it does not make sense to diagnose a "connectivity problem" on a time
scale shorter than the RTO.
Yes, very small timescales would indeed be problematic.
Having said that, a lot of the discussion around this is
centered on path failures. I see path failure recovery
as a necessary component, but I would argue that local
failures (such as the little green light on the interface
card going dark) are likely to be more common. They are also
treated in a very different way. You KNOW you have a
problem and often also have a good idea about what other
things might be working (e.g., the interface whose green
light is still on). For local failures, even a sub-second time
scale is achievable (depending, of course, on how fast
the green light reacts).
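A local failure can be handled directly from link state rather than from an end-to-end timeout. This sketch assumes a hypothetical `Interface` record whose `link_up` flag stands in for the "green light"; on a real host that would come from the operating system's link-state notifications, and the path over the chosen interface may still need to be confirmed:

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Interface:
    name: str
    link_up: bool   # the "green light" reported by the driver (assumed)
    locator: str


def on_local_link_down(failed: str,
                       interfaces: Iterable[Interface]) -> Optional[Interface]:
    """React to a local link-down event: no end-to-end timeout is needed to
    detect the failure, so pick an interface whose link is still up."""
    for ifc in interfaces:
        if ifc.name != failed and ifc.link_up:
            return ifc
    return None
```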

But for the rest, we can only do operations that are relatively
slow. My view of what's achievable is somewhere between the
RTO and the time at which TCP gives up. Interestingly, on this
timescale TCP has probably already slowed down.
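To make that window concrete, the sketch below computes the RTO with the RFC 6298 estimator (same SRTT/RTTVAR formula as RFC 2988) and then the retransmission times produced by exponential backoff. The retry count and the 60 s cap passed to `backoff_schedule` are illustrative parameters, since each TCP implementation picks its own. Also, after a retransmission timeout TCP drops its congestion window to one segment (RFC 5681), which is why it has "probably already slowed down" by the time a slow path-failure detector would act:

```python
from typing import List, Optional, Tuple


class RtoEstimator:
    """Retransmission timeout estimation as specified in RFC 6298."""

    ALPHA = 1 / 8
    BETA = 1 / 4
    K = 4
    MIN_RTO = 1.0      # seconds; smaller values are rounded up to 1 s
    INITIAL_RTO = 1.0  # RFC 6298 value before any RTT sample (RFC 2988: 3 s)

    def __init__(self, clock_granularity: float = 0.001) -> None:
        self.g = clock_granularity
        self.srtt: Optional[float] = None
        self.rttvar: Optional[float] = None

    def sample(self, r: float) -> float:
        """Feed one RTT measurement (seconds) and return the new RTO."""
        if self.srtt is None or self.rttvar is None:
            self.srtt = r
            self.rttvar = r / 2
        else:
            # RTTVAR must be updated using the old SRTT, so it comes first.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - r)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        return self.rto()

    def rto(self) -> float:
        if self.srtt is None or self.rttvar is None:
            return self.INITIAL_RTO
        return max(self.MIN_RTO, self.srtt + max(self.g, self.K * self.rttvar))


def backoff_schedule(initial_rto: float, retries: int,
                     max_rto: float = 60.0) -> Tuple[List[float], float]:
    """Times (seconds after the original send) at which each retransmission
    fires when the RTO doubles after every timeout, capped at max_rto, plus
    the time at which the sender gives up after its last retransmission."""
    times: List[float] = []
    t, rto = 0.0, initial_rto
    for _ in range(retries):
        t += rto
        times.append(t)
        rto = min(rto * 2, max_rto)
    return times, t + rto
```

With a 100 ms RTT the estimator returns the 1 s floor, and `backoff_schedule(1.0, 5)` places retransmissions at 1, 3, 7, 15 and 31 s with give-up at 63 s; a failure detector operating in that range acts after TCP has already backed off.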

  One can also envision that applications would be able to tell the IP
  or transport layer that the current connection is unsatisfactory and
  an exploration for a better one would be desirable.  This would
  require an API to be developed, however.
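Purely as an illustration of what such an API might look like (nothing like it is defined by the draft), an application could report that a context is unsatisfactory and leave the decision to the shim. The class name, the `start_exploration` callback, and the `min_interval` rate limit that keeps a misbehaving application from triggering constant exploration are all hypothetical:

```python
import time
from typing import Callable, Dict


class ExplorationHintApi:
    """Hypothetical application-to-shim API: the application reports that a
    context is unsatisfactory; the shim decides whether to start exploring."""

    def __init__(self, start_exploration: Callable[[int], None],
                 min_interval: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._start = start_exploration
        self._min_interval = min_interval  # assumed anti-thrashing limit (s)
        self._clock = clock
        self._last: Dict[int, float] = {}

    def report_unsatisfactory(self, context_id: int) -> bool:
        """Return True if exploration was started, False if rate-limited."""
        now = self._clock()
        last = self._last.get(context_id)
        if last is not None and now - last < self._min_interval:
            return False  # exploration was triggered too recently
        self._last[context_id] = now
        self._start(context_id)
        return True
```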

The application layer does have the ability to diagnose
connectivity problems on the order of seconds, through
keep-alives. The IP layer generally cannot detect whether a
connection is "satisfactory", since it has no access to the
TCB; it only knows about potential causes of connectivity
problems (such as path changes or missing routes), which it
can pass on to the transport layer or to applications.
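The detection side of an application keep-alive can be sketched as follows. The interval and the number of tolerated misses are assumed values chosen to land "on the order of seconds", the clock is injectable for testing, and sending the keep-alives themselves is not shown. The verdict is binary: either a reply was heard recently enough or it was not:

```python
import time
from typing import Callable


class KeepAliveMonitor:
    """Application-level liveness check (detection side only): the peer is
    considered reachable while some reply has been heard within
    interval * max_missed seconds."""

    def __init__(self, interval: float = 2.0, max_missed: int = 3,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.interval = interval      # seconds between keep-alives (assumed)
        self.max_missed = max_missed  # consecutive misses tolerated (assumed)
        self.clock = clock
        self.last_heard = clock()

    def on_reply(self) -> None:
        """Record that the peer answered a keep-alive (or sent any data)."""
        self.last_heard = self.clock()

    def connected(self) -> bool:
        """Binary verdict: either there is connectivity or there isn't."""
        return self.clock() - self.last_heard < self.interval * self.max_missed
```

With the defaults, a peer that stops answering is declared unreachable 6 s after the last reply.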

We do have direct information in some cases (see above).
But in general... I think we should stay away from
trying to define "satisfactory", and simply work on
a binary model where there's either connectivity or
there isn't.

--Jari