
Re: failure detection

To: Iljitsch van Beijnum <iljitsch@muada.com>
Subject: Re: failure detection
From: Paul Jakma <paul@clubi.ie>
Date: Fri, 19 Aug 2005 19:57:38 +0100 (IST)
Cc: shim6 <shim6@psg.com>
In-reply-to: <9F62897E-8A0C-4588-9C54-842E6C988A0F@muada.com>
Mail-copies-to: paul@hibernia.jakma.org
References: <8622E6A4-B0D7-4C9B-B184-8EB2A7C2738E@muada.com> <Pine.LNX.4.63.0508141523170.7023@sheen.jakma.org> <efebcb5728efd81901d5357b3993b6db@it.uc3m.es> <Pine.LNX.4.63.0508171556080.5353@sheen.jakma.org> <efa6464a563345cc24542d6ab48f3538@it.uc3m.es> <Pine.LNX.4.63.0508171932550.5353@sheen.jakma.org> <0f13bcc353755a4b9b965267a6a7ffb1@it.uc3m.es> <Pine.LNX.4.63.0508181034240.5291@sheen.jakma.org> <d1bbabb2d2a04821223d24f940796d23@it.uc3m.es> <Pine.LNX.4.63.0508181513480.5291@sheen.jakma.org> <4eb5dc3a95d2217a22ab1d81e23fd10d@it.uc3m.es> <Pine.LNX.4.63.0508191456120.5291@sheen.jakma.org> <9F62897E-8A0C-4588-9C54-842E6C988A0F@muada.com>

On Fri, 19 Aug 2005, Iljitsch van Beijnum wrote:

The 2*n^2 probing isn't necessary to detect failures (local or otherwise), but to detect what's still working.


I don't agree.

And even if side A can easily detect a failure at site A (which isn't a given, if my DSL line goes down my router knows it but my hosts don't),

We already have protocols for this. There is at least one possibility, IPv6 RAs, so:

- Use IPv6 RAs to advertise both prefixes, with a preferred lifetime
  set according to how quickly you want to switch, eg set it equal
  to twice the RA interval, or even equal to it.

  When the DSL router detects a link has gone down, it simply stops
  advertising the relevant prefix.

- We may need to invent protocols to carry valid-source prefix
  information across multiple routers in a site. (to update
  prefix-advertisement information). I think that's a general IPv6 RA
  problem though.

- Other stateful configuration protocols may be used, e.g. DHCPv6
  (the server would need to become multi-address aware, an
  implementation detail).

NB: The latter two points would not be required if we allow shim6 to work in a 'split' mode. For then the "shimmed" network would only use ULIDs which would always remain valid (no need to change).

Anyway: then with *one* message from your DSL router (or the lack of one, to be more specific), all X hosts on your network deprecate use of the ISP-B address and can start using others. Far more efficient than *all* your X hosts doing n^2 probes.
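A rough sketch of the host side of this, purely as an illustration. The `Host` class, the prefixes (documentation space) and the 10-second RA interval are all assumptions, not anything from a real stack; the only point is that a prefix which stops being advertised falls out of the preferred set once its preferred lifetime runs out.

```python
from dataclasses import dataclass, field

@dataclass
class Host:
    """Hypothetical host-side view of prefixes learned from RAs."""
    preferred_until: dict = field(default_factory=dict)  # prefix -> expiry time (s)

    def on_ra(self, now, prefixes, preferred_lifetime):
        # Each RA refreshes the preferred lifetime of the prefixes it lists.
        for prefix in prefixes:
            self.preferred_until[prefix] = now + preferred_lifetime

    def preferred(self, now):
        # A prefix whose preferred lifetime has run out is deprecated.
        return sorted(p for p, t in self.preferred_until.items() if t > now)

RA_INTERVAL = 10                      # seconds, illustrative value only
PREFERRED_LIFETIME = 2 * RA_INTERVAL  # "twice the RA interval"

host = Host()
host.on_ra(0, ["2001:db8:a::/64", "2001:db8:b::/64"], PREFERRED_LIFETIME)
# The link to ISP-B goes down: the router simply stops advertising that prefix.
host.on_ra(10, ["2001:db8:a::/64"], PREFERRED_LIFETIME)
host.on_ra(20, ["2001:db8:a::/64"], PREFERRED_LIFETIME)

print(host.preferred(15))  # both prefixes still preferred
print(host.preferred(25))  # only the ISP-A prefix is left
```

Every host on the link reaches the same conclusion from the same missing advertisement, without sending anything itself.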

Eg, imagine a site multihomed with 3 ISPs, and 50 hosts, which are communicating with a similar network in site-B using shim6 (50 hosts with 3 locators each). That's possibly 50*3^2 packets to send - 450 packets. Even though all that's needed is for the DSL router to broadcast locally "seems I lost my connection to ISP-A" - one packet (or even, lack of one packet).
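The arithmetic, spelled out. `probe_packets` is a made-up helper that assumes one probe per (local locator, remote locator) pair per host, which is how the figure above is counted.

```python
def probe_packets(hosts, locators_per_side):
    # One probe per (local locator, remote locator) pair, for every host.
    return hosts * locators_per_side ** 2

site_probes = probe_packets(hosts=50, locators_per_side=3)
router_messages = 1  # the single local "lost my link" signal

print(site_probes)                     # 450
print(site_probes // router_messages)  # 450 times as many packets
```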

How can you consider this probing to be at all sane? ;)

how does side B learn this fact?


It starts receiving packets from Site-A with a new locator address.

A host doesn't necessarily know which exit path a router will choose.


That's true. We can fix that in a better way than n^2 probing surely?

So what happens when through means outside our view a packet gets a destination address routed over ISP X, but a source address from address space from ISP Y, and X filters Y's addresses?


See above.

- Host2 shim6 to detect host1's valid locators have changed
   - Maybe because it receives a packet from Host1 with a new
     source

This doesn't allow for unidirectional reachability.

How so? Host2 need not use the source address in its packets which Host1 is using.

You want to specify that shim6 be able to work around /any/ kind of routing failure, anywhere on any part of the internet affecting any path between Host1 and Host2.

Yes, I do. As a BGP jockey, I'm kind of like the health inspector who never eats out... There is a lot going on that regular users don't really know about.


So the answer is:

- have users run around every restaurant to try to divine which one
  serves edible food (n^2 probing)

rather than:

- fix the problems in internet routing

Maybe those are a bit more common, but it's not like failures in the core never happen.

To be honest, I don't know, it's my gut feeling. But I'm not the one arguing on a gut feeling to have the IETF specify a host protocol that mandates n^2 probing.

If we don't know enough about internet failures to say whether it is or is not worth building n^2 probing per-default into shim6, then we shouldn't do it. It can always be 'added in' later, or done by implementations if left out, but if you put it in and it gets deployed - it's hard to take back.

Ingress filtering has the potential to create lots of unidirectional reachability for a given address combination.


I don't buy that at all.

If your ISP-A filters out ISP-B sourced packets, then it won't be routing ISP-B destined packets to you.

Nonsense. Congestion is rare these days, and the levels necessary to break connectivity wholesale are almost unheard of.

No it's not. Go live at the edges. I experience congestion *regularly* on my DSL link. I did say "at the edges".

And "break connectivity wholesale .. unheard of" is precisely part of my point - TCP doesn't break. But what will shim do exactly in the face of lossage?

- n^2 probing in shim6 is simply introducing huge expense in order to solve a very uncommon problem

Yeah I don't get this point you're arguing so energetically.

:)

Quadratic scaling scares me - particularly when built into a protocol. As does the SAS stuff, but you want that because of the n^2 probing issue, I think.

Let's build a test network. (I'll be talking about hosts, but obviously many aspects are site-wide.)

Host A has two interfaces, that both eventually connect to a router that connects to two ISPs. So:
Addr A1: int 1 - ISP K
Addr A2: int 1 - ISP L
Addr A3: int 2 - ISP M
Addr A4: int 2 - ISP N
Same thing for its correspondent host B:
Addr B1: int 1 - ISP O
Addr B2: int 1 - ISP P
Addr B3: int 2 - ISP Q
Addr B4: int 2 - ISP R

Wow, 2*4^2, i.e. 32 packets to complete probing (worst case). Imagine 50 such shim6 hosts on your network.
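Enumerating the pairs for the test network above makes the count concrete. This is only a counting sketch: it assumes every locator pair gets probed once in each direction and says nothing about how a real implementation would order or prune them.

```python
from itertools import product

A = ["A1", "A2", "A3", "A4"]
B = ["B1", "B2", "B3", "B4"]

a_to_b = list(product(A, B))  # probes A would send
b_to_a = list(product(B, A))  # probes B would send
worst_case = len(a_to_b) + len(b_to_a)

print(len(a_to_b), worst_case)  # 16 32
```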

Let's assume that each router will do source address based routing for the two ISPs it connects to, but the ISPs all do ingress filtering.

Ok.

A initiates a TCP session with destination address B1. Let's assume that the system chooses interface 1 for output and A1 as a source address, so the packets have address pair A1-B1

Now it's entirely possible that B's default route is over ISP Q. So when B sends a reply to the A1-B1 session setup request, it sends a B1-A1 packet out on interface 2. Now either the site exit router will filter it, or it will end up at ISP Q or ISP R, which will filter it. This is the infamous ingress filtering problem that we have to figure out.
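To make the filtering rule concrete, here is a small model. The prefixes are invented (documentation space) and `passes_ingress_filter` is a hypothetical helper; the assumption is simply that each ISP only accepts packets whose source address lies in the prefix it delegated.

```python
import ipaddress

# Hypothetical prefixes delegated to site B by its four ISPs.
ISP_PREFIX = {
    "O": ipaddress.ip_network("2001:db8:10::/48"),
    "P": ipaddress.ip_network("2001:db8:20::/48"),
    "Q": ipaddress.ip_network("2001:db8:30::/48"),
    "R": ipaddress.ip_network("2001:db8:40::/48"),
}

B1 = ipaddress.ip_address("2001:db8:10::1")  # address from ISP O
B3 = ipaddress.ip_address("2001:db8:30::1")  # address from ISP Q

def passes_ingress_filter(isp, source):
    # An ISP accepts a packet only if the source is inside its own prefix.
    return source in ISP_PREFIX[isp]

print(passes_ingress_filter("Q", B1))  # False: ISP O source sent via ISP Q
print(passes_ingress_filter("O", B1))  # True
print(passes_ingress_filter("Q", B3))  # True: source matches the exit ISP
```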

But it's easy. Shim6 is *not* TCP, it doesn't need to maintain any /specific/ consistency of addresses. Eg, in this example, why on earth is B replying with (B1,A1)? The reply from B (in my mind) would be (B3,A1).

No filtering problem at all. When this packet arrives at A, it associates (B3,A1) with the right mapping. Simple to do, given its own locator is in there. Why should A care whether B replies with the same IP address?

You're thinking, it seems, as if shim6 is like TCP, where each different (source,destination) tuple *must* refer to a different connection. But there is no need for such a restriction.
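A sketch of what such a relaxed lookup could look like. Everything here (`Context`, `find_context`, the locator names) is hypothetical, and it assumes A already knows B's locator set; how A learns that set, and how the mapping is secured, is not covered.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Context:
    """Hypothetical per-peer shim state: one ULID pair, many locators."""
    ulid_local: str
    ulid_peer: str
    local_locators: frozenset
    peer_locators: frozenset

def find_context(contexts, src, dst):
    # Match on membership in the locator sets, not on one exact address pair.
    for ctx in contexts:
        if dst in ctx.local_locators and src in ctx.peer_locators:
            return ctx
    return None

ctx_ab = Context(
    ulid_local="A1",
    ulid_peer="B1",
    local_locators=frozenset({"A1", "A2", "A3", "A4"}),
    peer_locators=frozenset({"B1", "B2", "B3", "B4"}),
)

# A reply from B3 to A1 still lands in the A1-B1 context.
print(find_context([ctx_ab], src="B3", dst="A1") is ctx_ab)  # True
print(find_context([ctx_ab], src="C1", dst="A1"))            # None
```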

But let's assume we somehow fix this problem, and packets flow without trouble between A1 and B1.


Yes we can easily fix it. The flows will look like:

A -> B using (A1,B1)
A <- B using (B3,A1) (note, B need not even use A1, it could use A2)

The TCP session continues for a bit, and at some point the shim wakes up and decides that this is a long-term session that should be protected from failures. So the shim layer on host A sends out a packet with source A1 and destination B1 (= addresses from the TCP session) which includes security stuff and the list of local alternative locators: A2, A3 and A4. B also happens to implement the shim, so it answers with some security stuff of its own and its list of alternative locators: B2, B3 and B4.

Ok.

So now we're ready for the internet to fail.
Scenario 1: A's link to ISP K fails.
Since this is something A's router can detect, presumably any packets from A1 to B2 will get back an ICMP message, and after a few RTTs TCP becomes really unhappy. The shim may also observe that there are packets going from A1 to B1, but there is nothing coming in from B1 to A1. Maybe the shim decides to fire off a probe from A1 to B1 for good measure. But eventually, it's clear that A1 to B1 doesn't work anymore.

Ok.

Now suppose that the reachability detection subsystem at A decides to see if B2 works. If A sticks to source address A1, then the packet will also incur an ICMP and not make it. So either A sees the ICMP and selects a different source address, or it decides that A1-B2 doesn't seem to work either and goes on to the next address pair. For instance A could try A2-B1. And this one works!

So from now on any outgoing packets with addresses A1-B1 in them are rewritten into A2-B1 and sent on their way.
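Scenario 1 can be sketched as follows. The helper names and the `works` predicate are assumptions for illustration; a real shim would learn reachability from probes and ICMP rather than from a function, and the candidate order is just the one walked through above.

```python
def first_working_pair(candidates, works):
    """Return (pair, probes_sent) for the first pair that works, else (None, n)."""
    for sent, pair in enumerate(candidates, start=1):
        if works(pair):
            return pair, sent
    return None, len(candidates)

def rewrite(packet, ulid_pair, current_pair):
    """Swap the ULID pair for the current locator pair on an outgoing packet."""
    src, dst, payload = packet
    if (src, dst) == ulid_pair:
        return (current_pair[0], current_pair[1], payload)
    return packet

# Scenario 1: the link to ISP K is down, so nothing sourced from A1 gets out.
def works(pair):
    return pair[0] != "A1"

candidates = [("A1", "B1"), ("A1", "B2"), ("A2", "B1")]
pair, sent = first_working_pair(candidates, works)

print(pair, sent)  # ('A2', 'B1') 3
print(rewrite(("A1", "B1", b"data"), ("A1", "B1"), pair))
```

The transport keeps seeing A1-B1; only the addresses on the wire change.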

Any complaints so far?

Well, I wouldn't have /shim/ in A picking A2, but I wouldn't preclude it either. So no complaints.

Scenario 2: big failure, and everything is wiped out except A4-B4. (From where I sit 99% of all traffic flows through Amsterdam, and most of that 99% over the AMS-IX. A nice big power failure there really hurts my connectivity.)

Ha, an AMS-IX outage would hurt me too ;). There are intra-Ireland paths that actually go via european IXes, typically LINX but I've seen AMS-IX paths too. A lot of european traffic goes via IXes.

BTW: Note that I strongly disagree with trying to solve all the internet's routing problems by making every end-host do n^2 probing.

So A tries:
A1-B2
A2-B1
A1-B3
A3-B1
A1-B4
A4-B1
and on and on and on, until it eventually determines that A4-B4 works.
You don't want this to happen. So what's the alternative? Give up after the second try? The fourth? The n^2/2th?


IMHO, you should only try:

(unspecified) -> B1
(unspecified) -> B3
(unspecified) -> B4
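For comparison, a counting sketch of the two approaches, taking A1-B1 as the pair that just failed. `None` stands in for "(unspecified)", i.e. the source is left to the host's normal address selection; the destination list is copied from above as written.

```python
from itertools import product

local = ["A1", "A2", "A3", "A4"]
remote = ["B1", "B2", "B3", "B4"]
failed = ("A1", "B1")

# Full exploration: every remaining (source, destination) pair.
full = [p for p in product(local, remote) if p != failed]

# Destination-only: one probe per listed destination, source unspecified.
dest_only = [(None, dst) for dst in ("B1", "B3", "B4")]

print(len(full), len(dest_only))  # 15 3
```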

In my universe (which happens to correspond vaguely to how the internet works today ;) ), when AMS-IX dies, within 1 to 3 minutes or so your ISP, in conjunction with other ISPs, starts propagating WITHDRAWs and UPDATEs, and they converge so that packets flow again.

I much prefer that ISPs take care of the business of getting packets from A to B than specify an n^2 end-host probing protocol within the IETF.

What next? What if there are multiple failures between A4-B4, such that somewhere between A4 and B4 there is one path which works and one which does not. One which we could work around by doing source-specified hop-by-hop routing?

Let's have shim6 probe all the intermediary paths too. And let's abolish the IETF routing area while we're at it :). (kidding, but you see the stretched point I'm making I hope).

Remember that while all of this is going on, the transport protocol sees a black hole. So at any time, the transport can decide to time out. The shim doesn't do anything that actually _hurts_ regular transport protocols.

If you call the potential for a smallish network sending out near to 1k probe packets "doesn't hurt" for not much gain, sure.

regards,
--
Paul Jakma	paul@clubi.ie	paul@jakma.org	Key ID: 64A2FF6A
Fortune:
Man usually avoids attributing cleverness to somebody else -- unless it
is an enemy.
		-- Albert Einstein
