
Re: ISP failures and site multihoming [Re: Enforcing unreachability of site local addresses]



There is no technical reason why a single service provider network can
do better than a similar network that consists of several smaller networks.

See Abha and Craig's paper on convergence of BGP. Personally, I would go
for a large provider with multiple connections.
Based on this paper? What I see is rarely as bad as what they describe.
However, a while back I had the chance to experiment a little with
revoking a longer prefix and then seeing how soon the shorter prefix
would "catch" the traffic, and this was certainly interesting: the state
goes back and forth between "working" and "not working" several times
over the course of two minutes. But simple failover is usually pretty fast.
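That back-and-forth is easy to quantify if you log reachability probes during the withdrawal. A minimal sketch (the sample format and function name are my own, not from any tool mentioned here):

```python
def summarize_flaps(samples):
    """samples: list of (timestamp_seconds, reachable_bool) probe results,
    in time order. Returns (state_transitions, total_unreachable_seconds):
    how many times reachability flipped, and how long it was down overall."""
    transitions = 0
    down_time = 0.0
    for (t0, ok0), (t1, ok1) in zip(samples, samples[1:]):
        if ok0 != ok1:
            transitions += 1          # "working" <-> "not working" flip
        if not ok0:
            down_time += t1 - t0      # interval spent unreachable
    return transitions, down_time
```

Feed it probes taken every few seconds while the longer prefix is revoked; several transitions within a couple of minutes is exactly the pattern described above.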
My experience is different, and I believe many others share it. But this will always differ from network to network.

Last fall I was invited to a conference in Sweden to debate multihoming
and the enterprise. Before me was an enterprise IT manager who showed
how much more resilient his network was with two BGP sessions. While he
talked I checked his announcements, only to find that one of the
providers bought transit from the other. You can't buy clue.
You can buy a good book that explains it all.  :-)

Did you check to see if the second ISP also had additional upstreams?
Yes they did. And they bought transit from the first provider.
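That kind of hidden dependency can be spotted from the announcements themselves: if every AS path seen through one upstream also traverses the other upstream's ASN, the two sessions share a single point of failure. A rough sketch (the ASNs and the path representation are made-up examples):

```python
def find_transit_overlap(paths_by_upstream):
    """paths_by_upstream: {upstream_asn: [AS path as list of ASNs, ...]}.
    Returns the set of (dependent, via) pairs where every path learned
    through 'dependent' also contains 'via' -- a strong hint that
    'dependent' buys transit from 'via'."""
    overlaps = set()
    for a, paths in paths_by_upstream.items():
        for b in paths_by_upstream:
            if a != b and paths and all(b in p for p in paths):
                overlaps.add((a, b))
    return overlaps
```

Run against the paths collected from a looking glass, an empty result is what a genuinely independent pair of upstreams should produce.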


But IGPs have the same
fundamental problem (although the details may differ). OSPF, for
instance, takes 40 seconds by default to detect a dead circuit.

There was a fix proposed in San Diego (although for IS-IS), but it was
voted down. There were pros and cons.
Just type:

 ip ospf hello-interval 1
 ip ospf dead-interval 3

But do it on ALL your boxes in the subnet or you'll live to regret it.
This I thought was more or less standard. I was talking about sub-100 ms convergence.
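The warning about changing all boxes on the subnet holds because OSPF routers refuse to form an adjacency when their hello and dead intervals disagree, so one unchanged box quietly partitions itself off. A small sanity-check sketch (the inventory format is an assumption, not any real tool's):

```python
from collections import Counter

def check_ospf_timers(interfaces):
    """interfaces: {router_name: (hello_interval_s, dead_interval_s)}
    for the interfaces on one subnet. Returns the routers whose timers
    disagree with the majority -- the ones that will drop out of the
    adjacency once the rest are tuned."""
    common, _ = Counter(interfaces.values()).most_common(1)[0]
    return sorted(r for r, t in interfaces.items() if t != common)
```

With the 1 s hello / 3 s dead timers above, failure detection drops from the 40 s default to roughly 3 s, but only once every router agrees.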

- kurtis -