
Re: ISP failures and site multihoming [Re: Enforcing unreachability of site local addresses]



On Thu, 20 Feb 2003, Iljitsch van Beijnum wrote:
> On Thu, 20 Feb 2003, Pekka Savola wrote:
> 
> > > Your comment may be true, but my clients are nonetheless unwilling to risk
> > > the possibility of an extended network outage on a single ISP (while not
> > > frequent, these events are far from unprecedented) rendering their online
> > > customer-support environment unavailable for several hours, much less for
> > > a day.  Shorter outages (on the order of minutes in the single digits) are
> > > tolerated, provided that such outages are infrequent.
> 
> > This is a very problematic approach IMO.
> 
> > Need more resiliency?  Network outages unacceptable?
> 
> > The right place to fix this is the network service provider, period.
> > Nothing else seems like a scalable approach.
> 
> There is no technical reason why a single service provider network can
> do better than a similar network that consists of several smaller
> service provider networks.  [...]

Of course not -- but it's *much* easier.

> > Consider a case where many companies' _phone_ services have been
> > moved to VoIP.  IP would then be a critical service.  Do the enterprises
> > protect against failures by getting more ISPs?  Unscalable.  No, the
> > ISPs _must_ get better.  Choose them carefully.
> 
> We are _very_ far from a situation where even the best ISP provides a
> service level that is better than the one you get from multihoming, even
> if you consider failover delays.

In some cases, a single ISP may be better; in others, not.

IMO, it isn't necessary to be significantly better, only "roughly equal".
 
> > When ISPs have SLAs and a lot of customers for whom continued service
> > is of utmost importance, the networks *will* work.  There is just no
> > other choice.  If the mobile phone of the CTO, CEO or whoever rings
> > after (1)5 minutes of network outage, things _will_ happen.
> 
> My experience with SLAs is that they are a marketing tool and job
> security for bureaucrats. They don't make the worst case any better,
> they only make the worst case slightly less frequent.
> 
> (What makes you think this mobile phone will ring anyway? Speaking of
> unreliable networks...)

Some means of contact will always exist, or the ISPs' monitoring systems
will have developed so far that problems in the IP service are extremely
rare.

> And the single service provider thing doesn't scale anyway: the end
> result would have to be a single global ISP.

It does scale, pretty well actually.  I'm not talking about your average
neighborhood ISPs with 100 customers, though.  Currently, there are only
about 3500 (ONLY!) AS numbers in the DFZ that transit at least one other
AS number.

Even multiplying that by 10, we would not have a problem.
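The "transits at least one other AS" criterion can be made concrete: an AS provides transit if it appears anywhere before the origin (last) AS in some AS path. A minimal sketch, assuming the AS paths have already been extracted from a routing table dump into lists of integers; the sample paths and the helper names `collapse_prepends` and `transit_ases` are hypothetical illustrations, not real DFZ data or an existing tool.

```python
from typing import Iterable, List, Set


def collapse_prepends(path: List[int]) -> List[int]:
    """Remove consecutive duplicate ASNs caused by AS-path prepending."""
    collapsed: List[int] = []
    for asn in path:
        if not collapsed or collapsed[-1] != asn:
            collapsed.append(asn)
    return collapsed


def transit_ases(paths: Iterable[List[int]]) -> Set[int]:
    """Return ASNs that appear before the origin AS in at least one path.

    The last element of an AS path is the origin; every AS to its left
    carried the route on behalf of another AS, i.e. provided transit.
    """
    transit: Set[int] = set()
    for path in paths:
        hops = collapse_prepends(path)
        transit.update(hops[:-1])
    return transit


# Hypothetical sample paths (private-use ASNs, not a real table dump).
sample_paths = [
    [64500, 64510, 64520],
    [64500, 64511, 64521, 64521, 64521],  # origin 64521 prepends twice
    [64501, 64520],
]
print(sorted(transit_ases(sample_paths)))  # [64500, 64501, 64510, 64511]

# The scaling argument from the message: ten times the quoted count.
current_transit_count = 3500  # figure quoted in the message
print(current_transit_count * 10)  # 35000
```

Collapsing prepends first matters: without it, an origin that pads its own path would wrongly be counted as a transit AS.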
 
> > It just seems the mentality in some networks is that network outages are
> > OK, networks don't have to be designed with multiple connections, etc.
> 
> > That must change if we want to build a mission-critical IP infrastructure,
> > instead of making every site try to deal with the problems itself.
> 
> Has the end-to-end principle failed to teach us anything? Reliability
> begins and ends in the end hosts. If each host is connected over two
> service providers there are four possible paths the hosts can switch
> between on a per-packet basis. Then the only problem becomes detecting
> failure. The end hosts are in an excellent position to do this without
> having to generate keepalive messages; a well designed protocol could
> switch to an alternate path within a few round trip times when a path
> failure occurs.
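The end-host mechanism described above can be sketched as follows. With one address per provider on each side, the candidate paths are the cross product of local and remote addresses (2 x 2 = 4), and a sender can move to the next pair when no acknowledgement arrives within a few round-trip times. A minimal sketch, assuming IPv6 documentation-prefix addresses, a hypothetical `PathSelector` class, and an arbitrary threshold of three RTTs; it illustrates the idea and is not any specific Multi6 protocol.

```python
import itertools
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Path = Tuple[str, str]  # (local address, remote address)


@dataclass
class PathSelector:
    """Hypothetical per-peer path selector driven by end-host observations."""
    local_addrs: List[str]
    remote_addrs: List[str]
    rtt: float                 # smoothed round-trip time estimate, seconds
    rtt_multiplier: int = 3    # declare failure after this many RTTs of silence
    paths: List[Path] = field(init=False)
    current: int = field(init=False, default=0)
    last_ack: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        # Every local/remote pairing is a distinct path: 2 x 2 = 4 here.
        self.paths = list(itertools.product(self.local_addrs, self.remote_addrs))

    def active_path(self) -> Path:
        return self.paths[self.current]

    def on_ack(self, now: float) -> None:
        # Acks for normal traffic double as the liveness signal; no keepalives.
        self.last_ack = now

    def check(self, now: float, awaiting_ack: bool) -> Optional[Path]:
        """Switch to the next path if an ack is overdue; return the new path."""
        if awaiting_ack and now - self.last_ack > self.rtt_multiplier * self.rtt:
            self.current = (self.current + 1) % len(self.paths)
            self.last_ack = now  # give the new path a fresh timeout window
            return self.active_path()
        return None


sel = PathSelector(["2001:db8:a::1", "2001:db8:b::1"],
                   ["2001:db8:c::2", "2001:db8:d::2"], rtt=0.05)
print(len(sel.paths))                          # 4
sel.on_ack(now=1.00)
print(sel.check(now=1.10, awaiting_ack=True))  # None (0.10 s < 3 x 0.05 s)
print(sel.check(now=1.20, awaiting_ack=True))  # ('2001:db8:a::1', '2001:db8:d::2')
```

Cycling through the pairs in a fixed order is the simplest policy; a real design would more likely probe candidate paths and prefer ones known to work.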

Compare this to a solution where the site has two connections to the same
ISP: the remaining failure modes are a major ISP backbone failure or an
upstream failure (and do any relevant ISPs have only one upstream?).

That doesn't sound too difficult to me, particularly as these problems
affect the whole ISP (or most of it) and are hence fixed quickly.

A solution without multi-connecting, i.e. with only one L1 connection to
one ISP, is naturally out of the question.
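The failure-mode comparison can be checked mechanically by removing one component at a time from a small reachability graph and testing whether the site can still reach the rest of the Internet. A minimal sketch; the node names and the three topologies (one link to one ISP, two links to one ISP with a single upstream, one link each to two ISPs) are hypothetical models of the cases discussed, not measurements. Links are modelled as nodes so that a link failure is just another node removal.

```python
from collections import deque
from typing import Dict, List, Set

Graph = Dict[str, List[str]]


def reachable(graph: Graph, src: str, dst: str, failed: str) -> bool:
    """Breadth-first search from src to dst, skipping the failed node."""
    seen: Set[str] = {src}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in graph.get(node, []):
            if nxt != failed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def single_points_of_failure(graph: Graph, src: str = "site",
                             dst: str = "internet") -> List[str]:
    """Components whose individual failure cuts src off from dst."""
    nodes = set(graph) | {n for nbrs in graph.values() for n in nbrs}
    candidates = sorted(nodes - {src, dst})
    return [n for n in candidates if not reachable(graph, src, dst, n)]


# Hypothetical topologies for the cases in the message.
single_link = {"site": ["link_a"], "link_a": ["isp1"],
               "isp1": ["upstream1"], "upstream1": ["internet"]}
dual_link_one_isp = {"site": ["link_a", "link_b"], "link_a": ["isp1"],
                     "link_b": ["isp1"], "isp1": ["upstream1"],
                     "upstream1": ["internet"]}
two_isps = {"site": ["link_a", "link_b"], "link_a": ["isp1"],
            "link_b": ["isp2"], "isp1": ["upstream1"], "isp2": ["upstream2"],
            "upstream1": ["internet"], "upstream2": ["internet"]}

print(single_points_of_failure(single_link))        # ['isp1', 'link_a', 'upstream1']
print(single_points_of_failure(dual_link_one_isp))  # ['isp1', 'upstream1']
print(single_points_of_failure(two_isps))           # []
```

In this model, the dual-connected site is left with exactly the ISP backbone and the single upstream as single points of failure, which is the comparison made in the message; giving the ISP a second upstream would remove the latter.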
 
> Multi6 has been gravitating towards multi-address multihoming solutions
> for a while now, but unfortunately it seems impossible to move forward.

Multi-address solutions solve certain problems well, but leave some
unsolved.  By coupling multi-addressing with multi-connecting, you get a
very comprehensive solution, IMO.

-- 
Pekka Savola                 "You each name yourselves king, yet the
Netcore Oy                    kingdom bleeds."
Systems. Networks. Security. -- George R.R. Martin: A Clash of Kings