
Re: ISP failures and site multihoming [Re: Enforcing unreachability of site local addresses]




To talk about resilience is a snare for the unwary until you specify
what you are providing resilience against.

The large enterprises I work with have a simple policy, set by the
non-technicians who run the organisation: no single point of failure.
That doesn't mean no single point of failure inside the ISP - that is
up to the management of the ISP - but no single point of failure in
the parts the enterprise management is accountable for, such as the
links to the ISP.  So if one ISP piggybacks on another, that is a
choice made by the ISP's management and not the enterprise's
responsibility.

In the non-IP world, that invariably means buying bandwidth - basic
carrier services - from two different telcos, leaving the site from
different points, terminating at different exchanges, perhaps over
different media (copper and fibre, or fibre and wireless), and not
sharing a power supply (eg the local electricity supplier).  If,
somewhere in the middle of the network, those feeds share a fibre,
that is the telco's concern, not the enterprise's.  The enterprise
should ensure that the telcos do not share a local exchange or the
fibre to the site, but no more than that.

This is a system that has evolved over decades, with much pain, but it
works well, because the bulk of failures - some surveys put the figure
as high as 95% - occur in the 'last mile', and that is exactly what
this arrangement gives resilience against.

IP? Same difference.  It is the links to the ISPs' points of presence
(obviously not co-located) that are the primary concern of the
enterprise management.  If ISPs share a resource, which in a sense
every ISP does because there is a single world-wide BGP RIB, then that
is the ISPs' responsibility, not the enterprise's.

So multi-homing is a must.
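
To make that concrete, here is a minimal sketch of the routing side of
such a setup, in the same Cisco-style notation as the OSPF commands
quoted below.  The AS numbers and addresses are illustrative
documentation values, not anyone's real allocation; the point is simply
one eBGP session to each ISP's point of presence, announcing the
enterprise's own prefix over both.

  ! illustrative sketch only: private/documentation ASNs, RFC 5737 prefixes
  router bgp 64512
   ! eBGP session to ISP A's point of presence
   neighbor 192.0.2.1 remote-as 64496
   neighbor 192.0.2.1 description link to ISP A POP
   ! eBGP session to ISP B's point of presence (separate exchange, separate media)
   neighbor 198.51.100.1 remote-as 64497
   neighbor 198.51.100.1 description link to ISP B POP
   ! announce the enterprise's own prefix to both upstreams
   network 203.0.113.0 mask 255.255.255.0

Whether either of those providers in turn buys transit from the other
is invisible at this level, which is exactly the division of
responsibility argued for above.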

Tom Petch, Network Consultant
nwnetworks@dial.pipex.com

-----Original Message-----
From: Kurt Erik Lindqvist <kurtis@kurtis.pp.se>
To: Iljitsch van Beijnum <iljitsch@muada.com>
Cc: Pekka Savola <pekkas@netcore.fi>; Alan E. Beard
<aeb1@aebeard.com>; Tim Chown <tjc@ecs.soton.ac.uk>;
ipng@sunroof.eng.sun.com <ipng@sunroof.eng.sun.com>;
multi6@ops.ietf.org <multi6@ops.ietf.org>
Date: 24 February 2003 08:29
Subject: Re: ISP failures and site multihoming [Re: Enforcing
unreachability of site local addresses]


>>>> There is no technical reason why a single service provider network
>>>> can do better than a similar network that consists of several smaller
>>
>>> See Abha and Craig's paper on convergence of BGP. Personally I would
>>> go for a large provider with multiple connections.
>>
>> Based on this paper? What I see is rarely as bad as what they describe.
>> However, I had the chance to experiment a little with revoking a longer
>> prefix and then see how soon the shorter prefix would "catch" the
>> traffic a while back, and this was certainly interesting: the state
>> goes back and forth between "working" and "not working" several times
>> over the course of two minutes. But simple failover is usually pretty
>> fast.
>
>My experience is different, and I believe many others share that. But
>this will always be different for everyone.
>
>>> Last fall I was invited to a conference in Sweden to debate
>>> multihoming and the enterprise. Before me was this enterprise IT
>>> manager who showed how much more resilient his network was with two
>>> BGP sessions. While he talked I checked his announcements just to
>>> find that one of the providers bought transit from the other. You
>>> can't buy clue.
>>
>> You can buy a good book that explains it all.  :-)
>>
>> Did you check to see if the second ISP also had additional upstreams?
>
>Yes they did. And they bought transit from the first provider.
>
>>
>>>> But IGPs have the same
>>>> fundamental problem (although the details may differ). OSPF for
>>>> instance takes 40 seconds to detect a dead circuit.
>>
>>> There was a fix proposed in San Diego (although for IS-IS) but that
>>> was voted down. There were pros and cons.
>>
>> Just type:
>>
>>  ip ospf hello-interval 1
>>  ip ospf dead-interval 3
>>
>> But do it on ALL your boxes in the subnet or you'll live to regret it.
>
>This I thought was more or less standard. I was talking about less than
>100ms convergence.
>
>- kurtis -
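
A footnote on the OSPF timer commands quoted above: they are
per-interface settings, and OSPF routers on the same segment must agree
on both values or the adjacency will not form, which is what the 'do it
on ALL your boxes' warning is about.  A minimal sketch, with an
illustrative interface name:

  interface FastEthernet0/0
   ! send hellos every 1 second, declare the neighbour dead after 3 seconds
   ! these values must match on every OSPF router attached to this subnet
   ip ospf hello-interval 1
   ip ospf dead-interval 3

Even so, a three-second dead interval is a long way from the sub-100ms
convergence Kurtis is asking about at the end of the quoted exchange.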