Requirements for IP Multihoming Architectures

1. Introduction

   Multihoming is an essential component of service for autonomous
   systems connected to the Internet.  The existing multihoming
   architecture is based on CIDR [1], which is predicated upon a
   hierarchy of service providers.

   However, it appears that this hierarchy is being supplanted by a
   dense mesh of interconnections [5].  Additionally, there has been an
   enormous growth in the number of multihomed organizations. For
   purposes of redundancy and load sharing, multihomed customer blocks,
   which are almost always longer prefixes taken from a provider
   aggregate, are announced alongside the larger aggregate announced by
   the provider. This results in a rapidly increasing global routing
   table size. This growth places significant stress on both the
   routing protocols and the routing hardware.

   Migration to IPv6 with its greatly enlarged address space is likely
   to exacerbate the weaknesses in the existing system.

2. Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [2].

3. Multihoming Requirements

   Multihoming is the connection of one autonomous system to multiple
   other autonomous systems.  This is done for reasons of service
   redundancy and reliability, and also for load sharing.

3.1 Redundancy

   By obtaining transit through more than one provider, a network can
   insulate itself from certain failure modes of one or more providers,
   as well as failures within layer 1 and layer 2 infrastructure.

   Any solution for site multihoming in IP networks must protect
   against the following failure modes:

   o  Physical link failure, such as a fiber cut or router failure,

   o  Logical link failure, such as a misbehaving router interface,

   o  Routing protocol failure, such as a BGP peer reset,

   o  Service provider failure, such as a backbone-wide IGP failure, and

   o  Exchange failure, such as a BGP reset on an inter-provider
      peering.

   Additionally, during the failure events described above, multihoming
   solutions must provide re-routing transparency for applications;
   i.e., exchange of data between devices on the multihomed network and
   devices elsewhere on the Internet may proceed with no greater
   interruption than transient packet loss during the re-routing event.
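
   As an illustration only, the failure modes listed above can be
   modelled as independent conditions on each transit path.  The
   following Python sketch assumes a hypothetical TransitPath record
   (the name and its fields are not taken from any protocol) and shows
   that a multihomed site remains reachable while at least one path
   has no active failure.

```python
from dataclasses import dataclass


@dataclass
class TransitPath:
    """Hypothetical model of one transit connection and its failure states."""
    provider: str
    link_up: bool = True           # False on a fiber cut or interface fault
    bgp_established: bool = True   # False after a BGP peer reset
    provider_healthy: bool = True  # False on a backbone-wide IGP failure

    def usable(self) -> bool:
        # A path carries traffic only if no failure mode is active on it.
        return self.link_up and self.bgp_established and self.provider_healthy


def usable_providers(paths):
    """Return the providers through which the site can still be reached."""
    return [p.provider for p in paths if p.usable()]


paths = [TransitPath("A"), TransitPath("B")]
paths[0].bgp_established = False   # simulate a BGP reset towards provider A
print(usable_providers(paths))     # provider B still offers transit
```

   The sketch says nothing about re-routing transparency; it only
   captures which providers remain available after a failure.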

3.2 Load Sharing

   Load sharing is distributing traffic across multiple links. Reasons
   for load sharing include but are not limited to: 

   o  Performance,

   o  Cost, and

   o  Availability of Infrastructure.
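
   One possible way to distribute outbound traffic across links is to
   hash each flow onto a link, so that packets of one flow stay on one
   path.  This document does not mandate any particular technique; the
   Python sketch below is an assumption made for illustration, and the
   link names and the pick_link function are hypothetical.

```python
import hashlib


def pick_link(flow, links):
    """Map a flow onto one link using a stable hash of its 5-tuple."""
    key = "|".join(str(field) for field in flow).encode("ascii")
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    index = int.from_bytes(digest[:8], "big") % len(links)
    return links[index]


links = ["transit-A", "transit-B"]  # hypothetical link names
# (source, destination, protocol, source port, destination port)
flow = ("192.0.2.10", "198.51.100.20", 6, 40000, 80)
print(pick_link(flow, links))
```

   A cryptographic hash is used here only because it is stable across
   runs; a real forwarding plane would use a much cheaper function.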

3.3 Performance

   One of the reasons for multihoming with two providers is poor
   connectivity between them. For example, customer C is buying transit
   from ISP A, and there is long-term congestion between ISP A and ISP
   B. To improve connectivity to ISP B, customer C buys transit from
   ISP B, thus bypassing the congestion and improving the performance
   between C and ISP B.

   Some operators have requirements to provide different grades of
   connectivity to customers based on network reach, and achieve this
   objective by grouping customers of similar service grades together,
   and advertising their networks over separate transit circuits. The
   result is multihoming for the purpose of transit capacity

3.4 Cost

   A customer may choose to multihome for financial reasons.  For
   example, customer C homed to ISP A may wish to shift traffic of a
   certain class or application (NNTP, for example) to ISP B because
   ISP B charges less for traffic.  Any future multihoming proposals
   must provide support for multihoming for financial reasons.
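
   The NNTP example can be sketched as a per-class provider
   preference.  The policy table and function names below are
   hypothetical and serve only to illustrate the idea; classification
   by destination port is a simplifying assumption, not a mechanism
   defined in this document.

```python
NNTP_PORT = 119  # well-known TCP port for NNTP

# Hypothetical policy: traffic class -> providers in order of preference.
POLICY = {
    "nntp": ["B", "A"],     # bulk news traffic prefers the cheaper ISP B
    "default": ["A", "B"],  # everything else stays on ISP A
}


def classify(dst_port):
    """Assign a traffic class from the destination port alone."""
    return "nntp" if dst_port == NNTP_PORT else "default"


def select_provider(dst_port, available):
    """Pick the most preferred provider that is currently available."""
    for provider in POLICY[classify(dst_port)]:
        if provider in available:
            return provider
    return None  # no transit available at all


print(select_provider(119, {"A", "B"}))  # NNTP traffic
print(select_provider(80, {"A", "B"}))   # other traffic
```

   Listing the other provider as a fallback keeps the cost policy from
   undermining the redundancy requirement of Section 3.1.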

3.5 Availability of Infrastructure

   Sometimes it is not possible to increase transit capacity to a
   single provider, because that provider does not have sufficient
   spare capacity to sell. In this case a solution is to acquire
   additional transit capacity through a different provider. This
   scenario is common in bandwidth-starved stubs of the network where,
   for example, transit demand outpaces under-sea cable deployment.

3.6 Simplicity and Scalability

   As any proposed multihoming solution must be deployed in real
   networks with real customers, simplicity is paramount. The current
   multihoming solution, despite its drawbacks, is quite
   straightforward to deploy and maintain.

   Unlike the current solution, however, new solutions must scale to a
   number of multihomed sites many orders of magnitude larger than is
   currently deployed. The scalability limitations of the existing
   solution and current protocols are outlined in Section 4.

4. Overview of the Current Architecture

4.1 Motivations for CIDR-Based Site Multihoming

   The CIDR-based solution currently deployed meets most of the
   requirements defined in Section 3, and also provides the following
   advantages:

   o  Conceptually simple

   o  Fine-grained policy control for multihomed sites

4.2 Drawbacks of CIDR-Based Site Multihoming

   When a site multihomes, one or more additional prefixes are
   introduced into the global BGP table.  If the site uses
   provider-aggregatable addresses, then upstreams may need to
   advertise both the aggregate and the more specific route, resulting
   in super-linear growth of the routing table in the default-free
   zone.
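
   The effect on table size can be sketched with the ipaddress module
   from the Python standard library.  The prefixes below are
   hypothetical values taken from a documentation range, and the toy
   table is an assumption made for illustration; it is not a model of
   BGP itself.  The sketch shows that a provider-aggregatable customer
   block adds an entry alongside the aggregate that covers it, and
   that longest-match forwarding then prefers the more specific route.

```python
import ipaddress

# Hypothetical prefixes from the TEST-NET-1 documentation range.
AGGREGATE = ipaddress.ip_network("192.0.2.0/24")    # provider A's aggregate
CUSTOMER = ipaddress.ip_network("192.0.2.128/25")   # multihomed site's block


def global_table(customer_multihomed):
    """Return a toy table mapping each prefix to the providers announcing it."""
    table = {AGGREGATE: {"A"}}
    if customer_multihomed:
        # The more specific route is announced through both providers.
        table[CUSTOMER] = {"A", "B"}
    return table


def longest_match(table, address):
    """Return the most specific prefix in the table covering the address."""
    addr = ipaddress.ip_address(address)
    candidates = [prefix for prefix in table if addr in prefix]
    if not candidates:
        return None
    return max(candidates, key=lambda prefix: prefix.prefixlen)


print(len(global_table(False)), len(global_table(True)))
print(longest_match(global_table(True), "192.0.2.200"))
```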

   Concern over prefix-table growth in the default-free zone is leading
   at least one large provider to filter advertisements received from
   peers on the basis of allocation boundaries, such that long-prefix
   provider-aggregatable prefixes are denied transit across that
   provider's network. If this approach becomes more widespread, the
   ability to multihome effectively will become restricted to those
   networks that have sufficient addressing requirements to justify a
   provider-independent allocation.
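
   A filter of this kind can be sketched as a per-block limit on
   prefix length.  The block and the limit in the following Python
   example are hypothetical and do not describe any provider's actual
   policy; rejecting prefixes outside every listed block is likewise
   an assumption of the sketch.

```python
import ipaddress

# Hypothetical policy: longest prefix accepted within each address block.
MAX_PREFIXLEN_BY_BLOCK = {
    ipaddress.ip_network("10.0.0.0/8"): 20,
}


def accept(prefix):
    """Return True if an advertised prefix passes the boundary filter."""
    net = ipaddress.ip_network(prefix)
    for block, max_len in MAX_PREFIXLEN_BY_BLOCK.items():
        # subnet_of() requires both networks to be the same IP version.
        if net.version == block.version and net.subnet_of(block):
            return net.prefixlen <= max_len
    return False  # outside every listed block: rejected in this sketch


print(accept("10.1.0.0/16"))   # at or above the boundary
print(accept("10.1.2.0/24"))   # longer than the boundary
```

   Under such a filter, a multihomed site holding only a long prefix
   from its provider's aggregate is reachable solely through that
   aggregate from networks behind the filtering provider.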

   Furthermore, an increasing number of multihomed sites accelerates
   the growth in the number of distinct paths for a given prefix in
   the default-free zone.  This causes a significant increase in the
   convergence time for BGP after network changes.  This issue is
   discussed in great detail in [5].

   Although conceptually simple, this approach does not lend itself
   well to troubleshooting of inter-AS network pathologies.  The
   opacity of policy interaction between ASes in the network can hide
   numerous, unpredictable path selection behaviors.

References

   [1]  Fuller, V., Li, T., Yu, J. and K. Varadhan, "Classless
        Inter-Domain Routing (CIDR): an Address Assignment and
        Aggregation Strategy", RFC 1519, September 1993.

   [2]  Bradner, S., "Key words for use in RFCs to Indicate Requirement
        Levels", RFC 2119, March 1997.

   [3]  Hinden, R. and S. Deering, "IP Version 6 Addressing
        Architecture", RFC 2373, July 1998.

   [4]  Hinden, R., O'Dell, M. and S. Deering, "An IPv6 Aggregatable
        Global Unicast Address Format", RFC 2374, July 1998.

   [5]  Huston, G., "Analyzing the Internet's BGP Routing Table",
        January 2001.

