
Re: failure detection

To: marcelo bagnulo braun <marcelo@it.uc3m.es>
Subject: Re: failure detection
From: Paul Jakma <paul@clubi.ie>
Date: Thu, 18 Aug 2005 11:14:09 +0100 (IST)
Cc: shim6 <shim6@psg.com>
In-reply-to: <0f13bcc353755a4b9b965267a6a7ffb1@it.uc3m.es>
Mail-copies-to: paul@hibernia.jakma.org
Mail-followup-to: paul@hibernia.jakma.org
References: <8622E6A4-B0D7-4C9B-B184-8EB2A7C2738E@muada.com> <Pine.LNX.4.63.0508141523170.7023@sheen.jakma.org> <efebcb5728efd81901d5357b3993b6db@it.uc3m.es> <Pine.LNX.4.63.0508171556080.5353@sheen.jakma.org> <efa6464a563345cc24542d6ab48f3538@it.uc3m.es> <Pine.LNX.4.63.0508171932550.5353@sheen.jakma.org> <0f13bcc353755a4b9b965267a6a7ffb1@it.uc3m.es>

Hi Marcelo,

On Thu, 18 Aug 2005, marcelo bagnulo braun wrote:

SHIM wg is about providing multihoming support for IPv6. In particular a solution for IPv6 multihoming must be able to preserve communications through outages in the communicating path. Such functionality is provided in IPv4 through BGP features but it requires the injection of site routes in the interdomain routing.

In IPv6 PA addressing is used, so we need additional mechanisms (the shim) to provide equivalent functionalities, in particular to preserve established communications through outages.


Ok. We're on the same page here I think.

1. Why this is a compelling argument given that it's been possible to publish multiple addresses in DNS for a long long time, yet there has been 0 demand for either applications to implement n^2 path-probing of each local address to every remote address, or for OSes to implement some kind of 'path-probe' shim to provide such functionality for all applications?

I am afraid you are missing our goal here. This is not a matter of opportunity but of how we can preserve established communications through outages. See above.


But you can preserve comms without n^2 probing.

With the shim, paths are closely related to the addresses used; in particular, the exit paths of the multihomed site are tied to the source addresses used.


Agreed.

So in order to provide this type of feature, source address selection has to be influenced, for instance using the RFC 3484 policy table.

So in order to determine the source, you're saying the table which affects source selection has to be influenced by shim6. Surely that's like saying "shim6 will use whatever source pleases it" (ie simply ignoring SAS for the final shim6 output packet).

The SAS aspects of this n^2 probing talk bother me greatly. It will ignore local policy, policy which might be there for very good reasons, which shim6 has no oversight of.

Ie: The best way to honour local policy is to use INADDR_ANY and let the OS decide the source address by consulting local routing policy - alternatively, an administratively specified address. Why exactly is shim6 so different from everything else on the internet and special that this would not work for it?

This is exactly how the shim would support policing; see above.

No, you said shim6 likely will have to influence SAS. That's quite different from not worrying about SAS at all and letting the OS decide according to its local policy.

3. The traditional way on the internet to guard against path failures is to get a routing feed (and no, that does *not* imply you advertise anything), why is shim6 so special that it can't defer to existing practice?

Scalability. Traditional IPv4 routing-based multihoming lacks it.

Note again "does *not* imply you advertise anything". Everyone is (sort of) agreed "multihome by advertising your prefix" doesn't scale and should not be considered for IPv6 - hence shim6.

I'm saying you can still get a "read-only" routing feed (BGP or whatever), purely for informational purposes, to help decide which of your ISPs has the best path.

That's entirely scalable and well within reason for deployment at 'enterprise' shim6 sites.

4. How will you decide which path is best?

Local policy can be expressed to some degree with the policy table defined in RFC 3484. If more fine-grained expression is needed (e.g. per app), additional parameters need to be included in the policy table.

Sorry, my question was more about the metrics you will use to determine whether path (local A, remote B) is better than (local C, remote D). Eg, RTT, packet loss, etc.

We seem to be assuming that multihoming support is something useful and that it will be needed in IPv6.


I'd agree with that assumption. :)

This multihoming support seems to require communications to be preserved through outages.

Agreed, but I still don't understand why this requires shim6 to specify overriding existing SAS policy and do n^2 probing.

6. If path-probing really is desired, explain why this is shim6 specific?
Path exploration is a fundamental part of the shim protocol. Maybe it is not shim specific, and ideas from other similar protocols can be used, but it is imho a key part of the shim protocol and needs to be part of it.

If it is not specific to shim6, why should it be solved in shim6? (In the 'for every possible path of the combinations of local and remote addresses')
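
To put a number on "every possible path of the combinations of local and remote addresses", here is a minimal sketch. The locator names and the choice of three addresses per side are assumptions for illustration only, and "::" stands in for "no source specified, let the OS pick":

```python
from itertools import product

# Hypothetical locator sets, three addresses per side (illustrative only).
local_locators = ["A1", "A2", "A3"]
remote_locators = ["B1", "B2", "B3"]

# Full path exploration: one probe per (source, destination) combination.
full_probes = list(product(local_locators, remote_locators))

# Destination-only probing: the source is left unspecified ("::"),
# so source selection stays with the OS and its local policy.
dest_only_probes = [("::", remote) for remote in remote_locators]

print(len(full_probes), "probes when every source is tried against every destination")
print(len(dest_only_probes), "probes when only the destination varies")
```

The gap widens with every locator added on either side, since the first count is a product and the second is just the size of the remote set.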

- software that does local path-probing to determine reachability of
  locally attached gateways (eg IPMP in Solaris for one)

This seems to be local, while the shim is defined e2e.

Indeed. But it's already implemented. And local probing (with no SAS messing about) likely will do for many cases.

- BFD and some other protocols in development
B means bidirectional, and we are not assuming bidirectional paths here


Ah, yes.

- software that monitors the system's route-cache and does
  path-probing for destinations that currently see flows
Not sure what you mean by those, but in any case I am sure we can benefit from these designs, as well as from the others you mentioned, in designing the shim path exploration.

Yes, shim6 could benefit from these potential designs by *not* getting involved in complex path probing and SAS. ;)

Eg, the local address probing proposed for shim6 is /counter/ productive to other possible external mechanisms (worst of all, including "get a read-only BGP feed from your ISPs", which is the most practical way to do this.)

If you are familiar with those, i am sure that your knowledge would be very useful to help with the design of the path exploration protocol of the shim

Drop the local source selection from shim6, things become easier, shim6 will interoperate better with other routing software, etc.

Note that it's the probing using every single local address which bothers me the most. Simple heartbeats, monitoring which set of locators are reachable, and just picking one and sticking to it until you need to switch, I could agree with.
AFAICT this is the approach being considered here or at least one of them.

One of them yes, the n^2 probing is one of the options being considered - I hope to quash its further development :) - at least in terms of something that is mandated officially as part of shim6 specs (can still be mentioned in some kind of implementors note).
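
As a rough sketch of the simpler scheme mentioned above (heartbeats, tracking which locators are reachable, sticking with one until a switch is needed): the class name, the timeout value and the locator names below are all hypothetical, and nothing here comes from a shim6 draft.

```python
class LocatorMonitor:
    """Tracks heartbeats per remote locator and sticks with the current one."""

    def __init__(self, locators, timeout):
        self.timeout = timeout                      # seconds of silence tolerated
        self.last_seen = {loc: None for loc in locators}
        self.current = None                         # locator in use, if any

    def heartbeat(self, locator, now):
        """Record that a heartbeat arrived from this locator at time `now`."""
        self.last_seen[locator] = now

    def reachable(self, now):
        """Locators heard from within the timeout window."""
        return [loc for loc, seen in self.last_seen.items()
                if seen is not None and now - seen <= self.timeout]

    def choose(self, now):
        """Keep the current locator while it is reachable; otherwise switch."""
        alive = self.reachable(now)
        if self.current not in alive:
            self.current = alive[0] if alive else None
        return self.current


monitor = LocatorMonitor(["B1", "B2", "B3"], timeout=10.0)
monitor.heartbeat("B2", now=0.0)
print(monitor.choose(now=1.0))   # B2 is the only locator heard from so far
```

Note that no source address appears anywhere in this sketch: it only tracks remote locators, and the local source is left to the OS.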

I mean, imho, we would only need to perform path exploration after an outage


You can do it more cleverly than that.

I fail to understand what you are missing. Failure detection and Path exploration are key components of the shim, and they are needed to preserve established communications through outages.


Sure, I agree. But why not do it in the simplest way possible?

on that. It's the n^2 probing I want to ensure is /not/ considered for inclusion in shim6 RFCs, other than as something mentioned as a possible implementation detail.

Ok, i think i see now. Your problem is with probing with different source locators, right?


Yup.

Well, this is needed because the source address determines the exit path from the multihomed site. I mean, because we are assuming PA addressing, changing the source address results in using a different ISP in the multihomed site. That is why different source addresses need to be explored.


Explored and then dismissed as a bad idea, I strongly hope.

The source to use for the shim should be determined by the /destination/ address in the shim6 packet you receive (eg) first from the other side.

Ie, given two shim6 stacks, A and B, that want to talk to each other via the global internet (eg to exchange info needed to create shim mapping between them, A initiates), each with 3 addresses say (A1, A2, so on). The communication required is:

A retrieves the required locator information, and gets a list "B1, B2, B3". It then sends a packet to each remote address, n packets at a time, in series, with whatever shim6 control message is required:

A(IN6_ADDRANY) -> B1
A(IN6_ADDRANY) -> B2
A(IN6_ADDRANY) -> B3

some time later a reply is received:

A(x)  <-  B3

The source address A should prefer to use (if it must prefer one over IN6_ADDRANY) is 'x', as determined by which destination address was in the packet of B's that got through.

See, that's simple and works - no probing required.
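
A minimal simulation of the exchange above, just to make the selection rule concrete. It does no real networking; the Packet type and both function names are hypothetical, "::" models IN6_ADDRANY (source left to the OS), and there is no security or DoS protection here at all:

```python
from dataclasses import dataclass

UNSPECIFIED = "::"  # stand-in for IN6_ADDRANY: the OS picks the source


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str


def build_probes(remote_locators):
    """One control message per remote locator, source left unspecified."""
    return [Packet(src=UNSPECIFIED, dst=remote) for remote in remote_locators]


def select_pair(probes, replies):
    """Return (local, remote) taken from the first reply to one of our probes.

    The local address is whatever destination the peer's reply carried,
    i.e. the 'x' in  A(x) <- B3.
    """
    probed = {p.dst for p in probes}
    for reply in replies:
        if reply.src in probed:
            return reply.dst, reply.src
    return None


probes = build_probes(["B1", "B2", "B3"])
# Suppose only B3's reply gets through, addressed to A2 (so x = A2).
pair = select_pair(probes, [Packet(src="B3", dst="A2")])
print(pair)
```

The local half of the pair is never chosen by A; it falls out of which of B's packets actually arrived.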

If "n^2 probing is simply not an option" is in your mind, then you'll start realising there are other, simpler, better ways of achieving the same end-goal - which don't require messing with local SAS policy either, but rather use it as intended (without modification).

There are many, many years of existing deployment of IP-using systems and applications that simply don't consider such complex probing worth it.

Right, because they are not assuming the usage of multiple PA addresses in a single host.


That's a fair point, and likely part of the answer.

Another possibility is simply that path failures in the 'middle' of internet are rare and hence users and apps have not had any compelling need to explore this possibility.

When you include multiple PA addresses in hosts within a multihomed site, then you find out that you need to try with different source addresses.

No, you don't. Because you think it's an option (it isn't :) ), you've stopped trying to find better options (which there are).

Note that part of my definition of "better option" is one which includes "doesn't use n^2 probing and fiddle with SAS", so it may be a self-fulfilling definition.

Sort of... you are still lacking DoS protection and locator security, but this is roughly what is being considered (with a couple of additional messages).


Yes, no security in that.

Well, yes, but we are also considering quite a few optimizations for this, like ULP feedback and traffic monitoring. But yes, this is along the lines of the failure detection mechanism being considered.

Ok.

ULP feedback - don't rely on it :) (I checked, it exists in Solaris TCP, for use by NDP, as you pointed out. I can't find anything in Linux though).

Path exploration is more complex than that, because you need to change the source address to change the exit ISP. Remember that we are assuming PA addresses, and they are only routed through one of the ISPs of the multihomed site.

I know this fine well, I've had operational experience of PA multihoming using tunneling with IPv4. :)

See above, given above, you *do not* need to do anything clever with source-address selection at all, imho.

I think that most of this stuff is included in the current drafts, just that additional complexity is considered, for instance security stuff, unidirectional path support, and considerations about the constraints imposed by the usage of multiple PA addresses in the multihomed site and by ingress filtering.


Sure.

regards,
--
Paul Jakma	paul@clubi.ie	paul@jakma.org	Key ID: 64A2FF6A
Fortune:
The mouse escaped.
