
Re: failure detection

To: Iljitsch van Beijnum <iljitsch@muada.com>
Subject: Re: failure detection
From: Paul Jakma <paul@clubi.ie>
Date: Fri, 19 Aug 2005 22:53:40 +0100 (IST)
Cc: shim6 <shim6@psg.com>
In-reply-to: <3DCD2668-E974-4A98-9A57-B4CB19436CBF@muada.com>
Mail-copies-to: paul@hibernia.jakma.org
References: <8622E6A4-B0D7-4C9B-B184-8EB2A7C2738E@muada.com> <Pine.LNX.4.63.0508141523170.7023@sheen.jakma.org> <efebcb5728efd81901d5357b3993b6db@it.uc3m.es> <Pine.LNX.4.63.0508171556080.5353@sheen.jakma.org> <efa6464a563345cc24542d6ab48f3538@it.uc3m.es> <Pine.LNX.4.63.0508171932550.5353@sheen.jakma.org> <0f13bcc353755a4b9b965267a6a7ffb1@it.uc3m.es> <Pine.LNX.4.63.0508181034240.5291@sheen.jakma.org> <d1bbabb2d2a04821223d24f940796d23@it.uc3m.es> <Pine.LNX.4.63.0508181513480.5291@sheen.jakma.org> <4eb5dc3a95d2217a22ab1d81e23fd10d@it.uc3m.es> <Pine.LNX.4.63.0508191456120.5291@sheen.jakma.org> <9F62897E-8A0C-4588-9C54-842E6C988A0F@muada.com> <Pine.LNX.4.63.0508191816080.5291@sheen.jakma.org> <3DCD2668-E974-4A98-9A57-B4CB19436CBF@muada.com>
Reply-to: paul@jakma.org, shim6@psg.com

On Fri, 19 Aug 2005, Iljitsch van Beijnum wrote:

RAs have very long lifetimes. I think the Cisco default is a week. You can't bring this down to a minute or less without all kinds of interesting side effects.

You have two lifetimes, preferred and valid. Valid can't go under 2 hours. Preferred can be as short as you like, down to 3s, and preferred is the one we're interested in.

I have preferred set to 10s here.

What are the side effects?

I agree that when certain information is available, it makes sense to distribute it locally rather than have every host go out and discover the same facts for itself. We'll have to come back to this at some point.


Progress :).

- fix the problems in internet routing
We await your suggestions...

Ha!

However, in a working group not so far away, there are people looking at these things.

The trouble is that you need aggregation to make routing scale, and with aggregation you lose all this interesting info that would have been useful. Routing can still tell you some interesting things when there are wide-spread catastrophes, but I'm not sure it's worth the trouble to optimize for that. (Or rather: I'm pretty sure it isn't.)

That's great, but we're discussing a host protocol (and maybe even a leaf-site border protocol). Hosts should behave like hosts and not try to probe every possible path by default. Imagine:

  Host2(shimmed)
  |  \
  |   \
 ISP3  ISP4
  |  \   |
  |   \_tier-1
tier-2   |  \
    \ \__|_  |
     \   | \ |
      ISP1  ISP2
        \   /
         \ /
        Host1 (shimmed)

Host1 is communicating with Host2 using Host2's tier-2 ISP locator.

Tier-2 has a failure affecting the shim 'flow' Host1 is using. The tier-1's POP gets 2*X probe packets - for no good reason.

Further, within a minute, maybe less, of tier-2 failing, both ISP1 and ISP2 have switched over their routes to ISP3 to go via tier-1.

Now imagine ISP1 and ISP2 have thousands upon thousands of shimmed customers. Now imagine these probes at an internet wide scale. Is it worth introducing all that n^2 probing noise into the internet when core internet routing likely will fix the problem anyway, maybe within a minute, maybe faster?

Probe for availability of the /remote/ locators sure. But don't combine it with every possible combination of local address please - it's not needed.

Wow, 2*4^2, i.e. 32 packets to complete probing (worst case). Imagine 50 such shim6 hosts on your network.
Well you really want to send at least 3 probes to account for random packet loss. :-)


Ouch.

But it's easy. Shim6 is *not* TCP, it doesn't need to maintain any /specific/ consistency of addresses. E.g., in this example, why on earth is B replying with (B1,A1)? The reply from B (in my mind) would be (B3,A1).
The reason why it's not easy is that at this point, the shim hasn't been activated yet, we're just doing regular TCP.

I don't quite understand this. This is intended as an optimisation for the case where the ULIDs are 'routeable' directly between the two hosts, right?

This is necessary to maintain backward compatibility.


I'm not sure I fully understand this point.

I suspect this (having a ULID be a locator) can be done too. So you may only have to shim if addresses change. But once shimmed, the /shim/ mapping can be of *sets* of addresses to the other set of addresses, not of specific tuples. I.e., the mapping should be:

( {A1,A2,A3,A4} , {B1,B2,B3,B4} ) -> .....

Then you can drop/add things out of the sets as required.

If the mapping maps A1 to A1, fair enough. (This is the "null transform" case in Geoff's architecture case right?)
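
A minimal sketch of the set-based mapping described above, for illustration only. The class name `ShimContext` and its methods are hypothetical and not taken from any shim6 draft; the point is simply that the context holds two sets of locators, and any (local, remote) combination drawn from them is acceptable.

```python
from dataclasses import dataclass, field


@dataclass
class ShimContext:
    """Hypothetical shim state: two locator sets instead of fixed address pairs."""
    local: set = field(default_factory=set)
    remote: set = field(default_factory=set)

    def candidate_pairs(self):
        # Every (local, remote) combination is a valid pair for this context.
        return sorted((l, r) for l in self.local for r in self.remote)

    def drop_local(self, locator):
        # Remove a local locator, e.g. after its prefix is deprecated.
        self.local.discard(locator)

    def add_remote(self, locator):
        # Learn a new locator for the peer.
        self.remote.add(locator)


ctx = ShimContext({"A1", "A2", "A3", "A4"}, {"B1", "B2", "B3", "B4"})
ctx.drop_local("A2")  # the mapping survives; only the set shrinks
print(len(ctx.candidate_pairs()))
```

Dropping or adding a locator changes the set, not the identity of the context, which is the property argued for above.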

And even if you activate the shim at this point, the two sides haven't been able to compare notes yet, so you can't start doing strange tricks yet, or at least you run into security complications.

Ah, I'm not familiar with these, would you be able to explain or refer me to something?

Ah, good that you said so because we all thought you were supporting this.


ROFL :). Just thought I'd make it clear ;).

Don't forget that if the site exit router does its own version of the ingress filtering, it can send back ICMP messages so the host knows that this source address doesn't work and move on without much delay.


Yep.

So after a maximum of 3 messages with incorrect source addresses A knows it should use A4, and then it only has to do B1, B2, B3 and B4 to find the working A4-B4 pair.


Yep.
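
The probing order discussed here can be sketched as follows. This is an illustration under stated assumptions, not a description of any shim6 implementation: `source_filtered` stands in for the ICMP error a site exit router would return for a bad source address, and `pair_works` stands in for a successful end-to-end probe. Both are hypothetical callbacks.

```python
def find_working_pair(local, remote, source_filtered, pair_works):
    """Return ((src, dst), probes_sent), or (None, probes_sent) if nothing works."""
    probes = 0
    for src in local:
        for dst in remote:
            probes += 1
            if source_filtered(src):
                # The exit router rejected this source: skip its remaining pairs.
                break
            if pair_works(src, dst):
                return (src, dst), probes
    return None, probes


local = ["A1", "A2", "A3", "A4"]
remote = ["B1", "B2", "B3", "B4"]

# Assumed scenario: only source A4 passes ingress filtering, only A4-B4 works.
pair, probes = find_working_pair(
    local,
    remote,
    source_filtered=lambda src: src != "A4",
    pair_works=lambda src, dst: (src, dst) == ("A4", "B4"),
)
print(pair, probes)
```

In this scenario the search costs three rejected probes plus four probes from A4, i.e. 7 probes instead of the 16 a full sweep of all pairs would send.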

Also, if the host has several sessions towards different destinations, it may observe that 2001:a900:456::1 isn't working, so if it has to choose between trying 2001:a900:789::1 and 3ffe:ffff:789::1 it will choose the latter, because there is a chance the whole 2001:a900::/32 block is affected.


That seems a possible strategy, yes.
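
One way this heuristic could look, using Python's standard `ipaddress` module. The function name `rank_candidates` and the fixed /32 grouping are assumptions made for the example; a real host would have to decide for itself which prefix length to treat as "the same block".

```python
import ipaddress


def rank_candidates(candidates, failed, prefix_len=32):
    """Order candidates so those outside any failed prefix are tried first."""
    failed_nets = {
        ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        for addr in failed
    }

    def suspect(addr):
        ip = ipaddress.ip_address(addr)
        return any(ip in net for net in failed_nets)

    # sorted() is stable and False sorts before True, so unaffected
    # candidates keep their relative order and move to the front.
    return sorted(candidates, key=suspect)


order = rank_candidates(
    ["2001:a900:789::1", "3ffe:ffff:789::1"],
    failed=["2001:a900:456::1"],
)
print(order)
```

The suspect address is only pushed to the back of the list, not discarded, so it is still tried if everything else fails.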

So in reality having to test 2*n^2 will be extremely unlikely.

I would agree, but more because imho the internet is reasonably reliable, so usually only a few will fail.

2*n^2 is for both sides btw - sent and received. n^2 (or local*remote locators, tending to n^2 for worst case) is probes sent. And the 2 is insignificant anyway compared to the square ;).
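
The arithmetic above, worked through as a small sketch. The function names are made up for the example; the figures of 4 locators per side, 3 probes per pair and 50 hosts come from this thread.

```python
def probes_sent(n_local, n_remote, retries=1):
    """Worst-case probes one host sends: every local/remote pair, `retries` times."""
    return n_local * n_remote * retries


def probes_both_sides(n_local, n_remote, retries=1):
    """Worst case counting probes sent and received (the factor of 2)."""
    return 2 * probes_sent(n_local, n_remote, retries)


single = probes_both_sides(4, 4)               # 2 * 4^2
with_retries = probes_both_sides(4, 4, 3)      # 3 probes per pair for packet loss
site_total = 50 * probes_both_sides(4, 4)      # 50 shimmed hosts on one network
print(single, with_retries, site_total)
```

This gives 32, 96 and 1600 packets respectively, which shows how quickly the worst case grows once retries and host counts are multiplied in.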

The trouble is that many small ISPs around here connect to the rest of the world through one location in Amsterdam, so when there is a power failure at that location, not only their AMS-IX stuff goes down but also their transit. Last time there was an AMS-IX power failure (a month before the generator they were installing because of the one-but-last power failure went online) about 25% of all AMS-IX members were completely unreachable for me.

Yes, that seems to be a problem in BeNeLux - huge overdependence on AMS-IX. When you're that close to AMS-IX and everyone is there, there's not much 'push' to set up physically separate transit and peerings. That though is an internet architecture issue, particularly for NL and BeNeLux.

The internet ecosystem will evolve further to cope with these things (outside of shim), eg my point about VoIP being a driver in the routing area (see mail to marcelo). Routing will get better and better.

BTW, my ISP just had a very big DoS attack. The shim would have enabled me to keep working to the extent possible, routing can't really do anything in these cases.

Yep, I really want shim too. ;) I want it to allow for sanity though. ;) (while still allowing for implementations to try full probing if they really must).

Do you want to add to the DoS of your ISP by having lots of shimmed hosts then go and DoS the *other* ISP? :)

Note that for something like a DoS, where you're getting just poor service rather than no service, it might be easier to just update your routes or your local IPv6 RA prefixes to not use that ISP. That's an argument again for making shim6 interact with the normal OS routing+SAS layer like any other application.

For extra brownie points, write a small daemon to monitor RTT and loss rates to each of your ISPs and adjust local routes/prefixes/whatever to suit your tastes.
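
A rough sketch of the decision logic such a daemon might use. Everything here is an assumption for illustration: the scoring formula, the 1000 ms loss penalty and the names `score` and `pick_isp` are invented, and the actual measurement and route/prefix changes are deliberately left out.

```python
from statistics import mean


def score(samples, loss_penalty_ms=1000.0):
    """Lower is better. `samples` holds RTTs in ms, with None for a lost probe."""
    received = [s for s in samples if s is not None]
    if not received:
        return float("inf")  # nothing came back: treat the ISP as unusable
    loss = 1 - len(received) / len(samples)
    return mean(received) + loss * loss_penalty_ms


def pick_isp(measurements):
    """Return the name of the ISP with the best (lowest) score."""
    return min(measurements, key=lambda isp: score(measurements[isp]))


# Made-up sample data: ISP1 is fast but losing half its probes.
measurements = {
    "ISP1": [20, None, None, 25],
    "ISP2": [40, 42, 41, 39],
}
print(pick_isp(measurements))
```

With these samples the lossy ISP1 scores 522.5 and ISP2 scores 40.5, so ISP2 is preferred despite its higher RTT.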

regards,
--
Paul Jakma	paul@clubi.ie	paul@jakma.org	Key ID: 64A2FF6A
Fortune:
Pick another fortune cookie.
