
Re: failure detection



On Sun, 14 Aug 2005, Iljitsch van Beijnum wrote:

An implementation that uses upper layer advice would only initiate a probe/reply sequence when the upper layer is unhappy.

This kind of thing worries me. Assuming it will be possible to have the upper layer 'direct' shim6 is assuming:


- OS maintainers will be prepared to fundamentally change their network
  stacks to support shim6

- Worse, that upper layers even /have/ a notion of being
  unhappy or happy.

I don't think designing shim6 to accommodate either of those assumptions is a good idea at this time, or anywhere in the near to mid term future.

It's a data-layer: Provide best-effort packet delivery in a low-order *bounded* magnitude of time. Bounded time is possibly more important than best-effort.

If there are multiple paths available, let shim6 heartbeat them all at startup and at some interval X thereafter, and pick whichever one seems best for sending packets down. Keep a note of which other paths seem to be available; if a path starts failing, pick another within some bounded time.
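
To make the idea above concrete, here is a minimal sketch in Python. It is not taken from any shim6 draft: the names PathMonitor, probe and max_misses are hypothetical, "best" is assumed to mean lowest measured round-trip time, and the probe callable is assumed to return an RTT in seconds or None when no reply arrives.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class PathState:
    rtt: Optional[float] = None  # last measured RTT in seconds; None = never answered
    misses: int = 0              # consecutive heartbeats without a reply


class PathMonitor:
    """Hypothetical sketch: heartbeat every path, prefer the lowest RTT."""

    def __init__(self, paths: List[str],
                 probe: Callable[[str], Optional[float]],
                 max_misses: int = 2):
        self.probe = probe            # assumed: returns RTT, or None on no reply
        self.max_misses = max_misses  # misses after which a path counts as failed
        self.state: Dict[str, PathState] = {p: PathState() for p in paths}
        self.current: Optional[str] = None

    def heartbeat_all(self) -> None:
        """One heartbeat round: call at startup and every X seconds thereafter."""
        for path, st in self.state.items():
            rtt = self.probe(path)
            if rtt is None:
                st.misses += 1        # keep the last known RTT, count the miss
            else:
                st.misses = 0
                st.rtt = rtt
        self._select()

    def available(self) -> List[str]:
        """Paths that have answered and have not yet hit the miss limit."""
        return [p for p, st in self.state.items()
                if st.rtt is not None and st.misses < self.max_misses]

    def _select(self) -> None:
        alive = self.available()
        if self.current in alive:
            return                    # current path still works: do not flap
        # Otherwise fail over to the best remaining path, or None if all failed.
        self.current = min(alive, key=lambda p: self.state[p].rtt, default=None)
```

With a heartbeat interval of X seconds, a failing path is abandoned after at most max_misses * X seconds, which is the bound. When current ends up as None, nothing is left to try at this layer.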

Don't try to get clever, because the upper layer knows /more/ than shim6 does:

- It might have a list of addresses it wants to try, don't delay it

- At the /application/ or even *user* layer, there might be other
  alternatives, eg:
	- VoIP: if the user's call to foo@multi-site doesn't initiate,
	  the user might know they should try joe@other-site

It is *much* better for the data-layer to signal *up*:

	"sorry, it's unreachable"

as soon as it can, and let the transport or the application or maybe even the *user* get on with trying potential alternatives, than for the data-layer to carry on aimlessly for a significant period of time with complicated reachability tests.
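
As a rough illustration of signalling failure up within a bounded time, here is a short Python sketch. The names send_bounded, try_send, budget and Unreachable are all hypothetical, and try_send is assumed to enforce its own short per-attempt timeout; nothing here is from a shim6 specification.

```python
import time
from typing import Callable, Iterable


class Unreachable(Exception):
    """Raised to the upper layer: "sorry, it's unreachable"."""


def send_bounded(paths: Iterable[str],
                 try_send: Callable[[str], bool],
                 budget: float,
                 clock: Callable[[], float] = time.monotonic) -> str:
    """Try each path in turn, but never start an attempt past the time budget.

    Returns the path that worked, or raises Unreachable so the transport,
    application or user can move on to their own alternatives.
    """
    deadline = clock() + budget
    for path in paths:
        if clock() >= deadline:
            break                 # out of time: stop probing, report failure
        if try_send(path):        # assumed to return True on success
            return path
    raise Unreachable("no working path found within %.1fs" % budget)
```

The caller catches Unreachable and decides what to do next, for example trying another address from its own list, rather than waiting on the data-layer.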

Shim6 will have the *least* amount of information about where the user is trying to get packets to.

Adding overcomplicated multi-pathing reachability/failover to shim6, when it will be the *least* informed part of the stack, is daft. Failure *IS* an option, and it's preferable to signal failure up *sooner* rather than later.

(and again, I hope this WG will disabuse itself of the notion that extra signals /down/ from transports to shim6 can even be considered at this early stage in shim6's life).

regards,
--
Paul Jakma	paul@clubi.ie	paul@jakma.org	Key ID: 64A2FF6A
Fortune:
You will be married within a year, and divorced within two.