
Re: ISP failures and site multihoming [Re: Enforcing unreachability of site local addresses]



There is no technical reason why a single service provider network can
do better than a similar network that consists of several smaller
service provider networks.

See Abha and Craig's paper on convergence of BGP. Personally I would go for a large provider with multiple connections.

Last fall I was invited to a conference in Sweden to debate multihoming and the enterprise. Before me was this enterprise IT manager who showed how much more resilient his network was with two BGP sessions. While he talked I checked his announcements, only to find that one of the providers bought transit from the other. You can't buy clue.

Sure, BGP as-is doesn't provide the seamless
failover some people would like. It annoys me to no end that Cisco uses
a 180 second default hold time for BGP, twice the already too
conservative value that is suggested in the RFC. This means that when
a circuit goes down BGP takes two or three minutes to notice this. I
always recommend configuring a hold time of 15 seconds, but it seems
some vendors have designed their stuff in such a way that sessions can
fail with this value when the box is busy. But IGPs have the same
fundamental problem (although the details may differ). OSPF for instance
takes 40 seconds to detect a dead circuit.
There was a fix proposed in San Diego (although for IS-IS), but that was voted down. There were pros and cons.
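
To put rough numbers on the timers quoted above, here is a minimal sketch of how long a silent failure can go unnoticed. It assumes the keepalive interval is one third of the hold time (the usual convention, and what the 60/180 second defaults amount to) and that the circuit fails without the interface going down, so only the hold timer catches it. The function name detection_window is made up for illustration.

```python
# Rough sketch, not vendor behavior: how long a silently dead BGP neighbor
# can go unnoticed for a given hold time.
# Assumption: keepalive interval = hold time / 3.

def detection_window(hold_time_s):
    """Return (best_case, worst_case) seconds until the hold timer expires.

    The hold timer restarts whenever a message arrives from the neighbor.
    Worst case: the circuit dies right after a keepalive arrived, so the
    full hold time has to run out.
    Best case: it dies just before the next keepalive was due, so one
    keepalive interval has already elapsed on the timer.
    """
    keepalive_s = hold_time_s // 3
    return (hold_time_s - keepalive_s, hold_time_s)


if __name__ == "__main__":
    # Hold times mentioned in the quoted text, in seconds.
    for label, hold in [("Cisco default", 180),
                        ("RFC suggestion", 90),
                        ("recommended", 15)]:
        best, worst = detection_window(hold)
        print(f"{label}: hold {hold} s, failure noticed after {best} to {worst} s")
```

With a 180 second hold time this gives a window of 120 to 180 seconds, which is the "two or three minutes" mentioned above; with 15 seconds it shrinks to 10 to 15 seconds.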

We are _very_ far from a situation where even the best ISP provides a
service level that is better than the one you get from multihoming even
if you consider failover delays.
I would like to see numbers for this. My experience says otherwise. Perhaps CAIDA have something on this? Or the RIPE RIS boxes could give some insight.

Also, these approaches aren't mutually exclusive. ISPs should get
better. Multihoming should get better.
Yes, but not for the same reasons.

At the same time, we should recognize that it is simply impossible to
have the same failover delays at layer 3 as at layer 1.
From my experience you can get very close, but I am getting very close to the limits of an NDA I have signed.


My experience with SLAs is that they are a marketing tool and job
security for bureaucrats. They don't make the worst case any better,
they only make the worst case slightly less frequent.
To some extent I agree. There was inflation in SLAs some years ago. This was bad for the entire industry.

- kurtis -