Re: failure detection

To: Paul Jakma <paul@clubi.ie>
Subject: Re: failure detection
From: marcelo bagnulo braun <marcelo@it.uc3m.es>
Date: Thu, 18 Aug 2005 16:00:30 +0200
Cc: shim6 <shim6@psg.com>
In-reply-to: <Pine.LNX.4.63.0508181034240.5291@sheen.jakma.org>
References: <8622E6A4-B0D7-4C9B-B184-8EB2A7C2738E@muada.com> <Pine.LNX.4.63.0508141523170.7023@sheen.jakma.org> <efebcb5728efd81901d5357b3993b6db@it.uc3m.es> <Pine.LNX.4.63.0508171556080.5353@sheen.jakma.org> <efa6464a563345cc24542d6ab48f3538@it.uc3m.es> <Pine.LNX.4.63.0508171932550.5353@sheen.jakma.org> <0f13bcc353755a4b9b965267a6a7ffb1@it.uc3m.es> <Pine.LNX.4.63.0508181034240.5291@sheen.jakma.org>

Hi Paul,

My understanding is that you seem to be uncomfortable with the fact that n^2 probes may be needed when we try with different source and destination locators, right?

Ok, i am uncomfortable with this too, but i fail to see any other way of dealing with it. Let me explain why:

Suppose you have a multihomed site connected to ISPA and ISPB, which have assigned it the prefixes PrefA and PrefB respectively.

Suppose you have Host1 in the multihomed site and that it is communicating with Host2 outside the multihomed site. For that communication, Host1 is using address PrefA:host1 both as locator and as ULID. I assume that Host2 has a single address.

Now, suppose that an outage occurs in the link between the multihomed site and ISPA. Host1 detects it and needs to do something about it. How can it try an alternative path? Well, it needs to retry using an alternative source address, so that packets can be routed through ISPB (in the outgoing direction this is needed for compatibility with ingress filtering, and in the incoming direction because of the use of PA addressing).

This implies that when a host within a multihomed site needs to try alternative paths, it needs to use different source addresses, and of course different destination addresses in a more general scenario.

This implies that in order to explore all the possible paths, we need to make n^2 probes.
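To make the count concrete, here is a minimal sketch that enumerates every (source, destination) locator pair between two hosts. The locator names, including PrefC and PrefD for the remote side, are invented for the illustration; with n locators on each side the list has n * n entries.

```python
from itertools import product


def candidate_pairs(local_locators, remote_locators):
    """Return every (source, destination) locator combination."""
    return list(product(local_locators, remote_locators))


# Hypothetical locators: one per provider prefix on each side.
local = ["PrefA:host1", "PrefB:host1"]
remote = ["PrefC:host2", "PrefD:host2"]

pairs = candidate_pairs(local, remote)
print(len(pairs))  # 4, i.e. 2 * 2
```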

Now, it is important to realize that n^2 is just an upper bound, and that n^2 probes will only be performed when all paths but one have failed and the surviving path is the last one you try (which may occur very often according to Murphy's law :-)

One of the main concerns of the people designing these mechanisms is how to reduce the number of probes as much as possible. The idea is not to send the n^2 probes at once, but to explore in phases, trying different combinations in each phase.
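As a rough sketch of the difference between the upper bound and a more typical case, the code below probes pairs one at a time and stops at the first one that works. The probing order, the locator names and the two failure patterns are assumptions made for the example, not anything agreed on the list.

```python
from itertools import product


def explore(local_locators, remote_locators, path_works):
    """Probe pairs one at a time; stop at the first that works.

    Returns (working_pair_or_None, probes_sent).
    """
    probes = 0
    for src, dst in product(local_locators, remote_locators):
        probes += 1
        if path_works(src, dst):
            return (src, dst), probes
    return None, probes


local = ["A1", "A2", "A3"]
remote = ["B1", "B2", "B3"]


def only_last_pair(src, dst):
    # Assumed worst case: the only surviving path is the last one tried.
    return (src, dst) == ("A3", "B3")


def a1_exit_down(src, dst):
    # Assumed milder outage: everything sent from A1 fails, the rest works.
    return src != "A1"


worst = explore(local, remote, only_last_pair)
typical = explore(local, remote, a1_exit_down)
print(worst)    # (('A3', 'B3'), 9)
print(typical)  # (('A2', 'B1'), 4)
```

A smarter ordering (for instance, changing the source locator first when the local exit is suspected) would lower the second count further; the sketch deliberately uses the naive order.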

Ok, i will move on to some comments you have below...

On 18/08/2005, at 12:14, Paul Jakma wrote:

1. Why this is a compelling argument given that it's been possible to publish multiple addresses in DNS for a long long time, yet there has been 0 demand for either applications to implement n^2 path-probing of each local address to every remote address, or for OSes to implement some kind of 'path-probe' shim to provide such functionality for all applications?
I am afraid you are missing our goal here. This is not a matter of opportunity but of how we can preserve established communications through outages. See above.
But you can preserve comms without n^2 probing.


sure, n^2 is just the worst case

With the shim, paths are closely related to the addresses used; in particular, exit paths of the multihomed site are related to the source addresses used.
Agreed.
So in order to provide this type of feature, source address selection has to be influenced, for instance using the RFC 3484 policy table.
So in order to determine the source, you're saying the table which affects source selection has to be influenced by shim6.

no, i am saying that source address selection is influenced by the RFC 3484 policy table and that this table is the right place to express policy. In addition, the shim can honor this table as much as possible, so that policy can still be expressed when using the shim

Surely that's like saying "shim6 will use whatever source pleases it" (ie simply ignoring SAS for the final shim6 output packet).

no, shim can try to use first the addresses as expressed in the policy table

Ie: The best way to honour local policy is to use INADDR_ANY and let the OS decide the source address by consulting local routing policy - alternatively, an administratively specified address. Why exactly is shim6 so different from everything else on the internet and special that this would not work for it?
this is exactly how the shim would support policy; see above
No, you said shim6 likely will have to influence SAS. Thats quite different from not worrying about SAS at all and letting the OS decide according to its local policy.

No, the shim will try to honor the policy table (or any other tool to express policy that we need to define). Obviously, if the path preferred by the policy is not available, then the shim will have to use others.
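A minimal sketch of "honor the policy first, fall back only when needed" could look like the following. The ranked list is a hypothetical stand-in for whatever the policy table expresses; it is not the RFC 3484 selection algorithm, and the locator names and usability checks are made up.

```python
def pick_source(candidates, policy_order, is_usable):
    """Return the most preferred usable source locator, or None.

    candidates   -- locators configured on the host
    policy_order -- locators from most to least preferred (hypothetical
                    simplification of the local policy)
    is_usable    -- callable reporting whether a locator currently works
    """
    rank = {loc: i for i, loc in enumerate(policy_order)}
    # Locators absent from the policy sort after all listed ones.
    ordered = sorted(candidates, key=lambda loc: rank.get(loc, len(rank)))
    for loc in ordered:
        if is_usable(loc):
            return loc
    return None


candidates = ["PrefA:host1", "PrefB:host1"]
policy = ["PrefA:host1", "PrefB:host1"]  # local policy prefers ISPA

# All paths up: the policy choice is used.
normal = pick_source(candidates, policy, lambda loc: True)
# Link to ISPA down: the shim falls back to the next locator.
outage = pick_source(candidates, policy,
                     lambda loc: not loc.startswith("PrefA"))
print(normal)  # PrefA:host1
print(outage)  # PrefB:host1
```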

3. The traditional way on the internet to guard against path failures is to get a routing feed (and no, that does *not* imply you advertise anything), why is shim6 so special that it can't defer to existing practice?
scalability. traditional IPv4 routing-based multihoming lacks it
Note again "does *not* imply you advertise anything". Everyone is (sort of) agreed "multihome by advertising your prefix" doesn't scale and should not be considered for IPv6 - hence shim6.

I'm saying you can still get a "read-only" routing feed (BGP or whatever), purely for informational purposes, to help decide which of your ISPs has the best path.

That's entirely scaleable and well within reason for deployment at 'enterprise' shim6 sites.


agree, but this is not enough to preserve established communications

Are you considering the case where a full BGP feed is injected into the hosts, so that each host can find out which paths are available towards its final destination?

4. How will you decide which path is best?

local policy can be expressed to some degree with the policy table defined in RFC 3484. If more fine grained expression is needed (e.g. per app) additional parameters need to be included in the policy table
Sorry, my question was more about the metrics you will use to determine whether path (local A, remote B) is better than (local C, remote D). Eg, RTT, packet loss, etc.


this is an ongoing discussion on this list right now

we seem to be assuming that multihoming support is something useful and that it will be needed in IPv6.
I'd agree with that assumption. :)
This multihoming support seems to require communications to be preserved through outages.
Agreed, but I still don't understand why this requires shim6 to specify overriding existing SAS policy and do n^2 probing.


i addressed this point above

6. If path-probing really is desired, explain why this is shim6 specific?
path exploration is a fundamental part of the shim protocol. maybe it is not shim specific and ideas from other similar protocols can be used, but it is imho a key part of the shim protocol and needs to be part of it.
If it is not specific to shim6, why should it be solved in shim6? (In the 'for every possible path of the combinations of local and remote addresses')

when i said that it is not specific i meant that other protocols may require similar procedures, but it is a key component of the shim protocol, and that is why it needs to be specified there.

...

Note that it's the probing using every single local address which bothers the most. Simple heartbeats and monitoring which set of locators are reachable and just picking one and sticking to it till you needed to switch, I could agree with.
AFAICT this is the approach being considered here or at least one of them.
One of them yes, the n^2 probing is one of the options being considered - I hope to quash its further development :) - at least in terms of something that is mandated officially as part of shim6 specs (can still be mentioned in some kind of implementors note).

I mean, imho, we would only need to perform path exploration after an outage
You can do it more cleverly than that.


can you expand on that?

I fail to understand what you are missing. Failure detection and Path exploration are key components of the shim, and they are needed to preserve established communications through outages.
Sure, I agree. But why not do it in the simplest way possible?


and that would be...

on that. It's the n^2 probing I want to ensure is /not/ considered for inclusion in shim6 RFCs, other than as something mentioned as a possible implementation detail.
Ok, i think i see now. Your problem is with probing with different source locators, right?
Yup.
Well, this is needed because the source address determines the exit path from the multihomed site. I mean, because we are assuming PA addressing, changing the source address results in using a different ISP in the multihomed site. That is why different source addresses need to be explored.
Explored and then dismissed as a bad idea, I strongly hope.
The source to use for the shim should be determined by the /destination/ address in the shim6 packet you receive (eg) first from the other side.

Ie, given two shim6 stacks, A and B, that want to talk to each other via the global internet (eg to exchange info needed to create shim mapping between them, A initiates), each with 3 addresses say (A1, A2, so on). The communication required is:

A retrieves the required locator information, and gets a list "B1, B2, B3". It then sends a packet to each remote address, n packets at a time, in series, with whatever shim6 control message is required:
A(IN6_ADDRANY) -> B1
A(IN6_ADDRANY) -> B2
A(IN6_ADDRANY) -> B3
some time later a reply is received:
A(x)  <-  B3
The source address A should prefer to use (if it must prefer one over IN6_ADDRANY) is 'x', as determined by which destination address was in the packet of B's that got through.
See, that's simple and works - no probing required.
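The exchange just described can be modelled roughly as below. This is only a toy model: os_pick_source and round_trip_ok are hypothetical callables standing in for the OS source address selection and for the network, and the assumption that the OS resolves the unspecified source to the same address every time is mine.

```python
def first_contact(remote_locators, os_pick_source, round_trip_ok):
    """Send one control message per remote locator, source left to the OS.

    os_pick_source(dst)     -- models the OS choosing a source for dst
    round_trip_ok(src, dst) -- models whether packet and reply get through
    Returns (x, packets_sent), where x is the destination address of the
    first reply received (i.e. the source the OS had picked), or None.
    """
    sent = 0
    replies = []
    for dst in remote_locators:
        src = os_pick_source(dst)  # what the unspecified source resolves to
        sent += 1
        if round_trip_ok(src, dst):
            replies.append((src, dst))
    x = replies[0][0] if replies else None
    return x, sent


remote = ["B1", "B2", "B3"]


def os_default(dst):
    # Assumption: the OS always selects A1 for these destinations.
    return "A1"


def only_b3(src, dst):
    # Only the path towards B3 works.
    return dst == "B3"


def a1_exit_down(src, dst):
    # The exit associated with A1 is down; other sources would work.
    return src != "A1"


print(first_contact(remote, os_default, only_b3))       # ('A1', 3)
print(first_contact(remote, os_default, a1_exit_down))  # (None, 3)
```

In this model n packets are still sent, and the outcome depends entirely on which source the OS happens to choose.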

first of all, this is probing since you sent 3 packets. Second, which source addresses has A used for sending these packets?

If "n^2 probing is simply not an option" is in your mind, then you'll start realising there are other, simpler, better ways of achieving the same end-goal - which don't require messing with local SAS policy either, but rather use it as intended (without modification).

There's many many years of existing deployment of IP using systems and applications that simply don't consider such complex probing worth it.

right, because they are not assuming the usage of multiple PA addresses in a single host
That's a fair point, and likely part of the answer.
Another possibility is simply that path failures in the 'middle' of internet are rare and hence users and apps have not had any compelling need to explore this possibility.

As i pointed out in the example above, different source/destination address combinations are required to deal with failures at the edges (i.e. ISP - multihomed site links)

Regards, marcelo
