
Re: Failover for a multihomed site with unreachable ISP



Hi Christian,

Wouldn't this be somewhat similar to RFC 2260?

Regards, marcelo

On Wed, 2003-03-26 at 13:39, Christian Schild wrote:
> Hi ho,
> 
> inspired by the 'dual homing experiment' document of Christian Huitema, 
> we (JOINers) thought about a solution that offers robustness for a site
> that is multihomed and multiaddressed, which we would like to discuss here.
> External connectivity of such a site is affected by failures of
> 
> - the site's border router
> - the site's uplink
> - the ISP's infrastructure (esp. the router with the link to the customer)
> - the ISP's global border router
> - the ISP's global uplink
> 
> To recover from such a failure in the first three cases, the site could
> communicate a set of possible prefixes to the ISP, and connectivity could
> be reestablished via a tunnel technology. We have already thought about
> this, but the solution is too complex to explain briefly, so it is not
> discussed here.
> 
> The approach I try to explain here is a solution for the last two cases,
> where the ISP is no longer reachable and a tunneled solution is not
> possible.
> 
> 
> First, we consider a failure a rare and abnormal event. Only if a direct
> connection fails does the network (or the ISP) have to take some failover
> action. This means that - if you think of the size of the global routing
> table - by default the table is small (only /32 prefixes), and only in
> case of a failure is a more specific prefix (/48 or shorter) necessary.
> 
>                      +----------------------------+
>                      |  'Global Internet'/'DFZ'   |
>                      +--+-----------------------+-+
>                         |                       |
>                         |                       |
>             +-----------+--+                 +--+-----------+
>             |    ISP A     |                 |     ISP B    |
>             | Prefix PA/32 |                 | Prefix PB/32 |
>             +-----------+--+                 +--+-----------+
>                         |                       |
>                         |                       |
>                        ++-----------------------++
>                        |       Customer C        |
>                        |      Prefix PAC/48      |
>                        |      Prefix PBC/48      |
>                        +-------------------------+
> 
> In this scenario customer C gets a /48 from every ISP (PAC/48 from ISP A
> and PBC/48 from ISP B) and communicates the existence of these prefixes to
> every provider (as mentioned earlier, how this is done is not explained
> here). Thus, e.g. ISP A knows that it has a multihomed route to
> customer C with prefix PBC/48.
> 
> Usually, all traffic from the outside to PBC/48 will go through ISP B.
> If ISP B detaches from the DFZ now, ISP A has to announce to the DFZ
> somehow, that it has a valid route to PBC/48. This can be done in two
> different ways within (e)BGP:
> 
> 
> Approach 1:
> When ISP B detaches, its BGP announcement of PB/32 will vanish from the
> global routing table. ISP A's border router could use this as a trigger to
> announce PBC/48 to the DFZ. It will remove this announcement when PB/32
> reappears.
> 
> The advantage here is that the /48 will only be present in the DFZ when
> the usual routes fail. The disadvantages are slow convergence and possibly
> multiple routes. If ISP A and ISP B are very distant, it may take some
> time until the /48 is known everywhere. If ISP B detaches from the DFZ
> only for a split second - or even flaps - both prefixes will be visible
> in the DFZ for some time.
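
The trigger in Approach 1 could be sketched roughly as below. This is only an illustration of the decision ISP A's border router would make, not a BGP implementation. The concrete prefixes are assumptions taken from the IPv6 documentation range, and the function name `backup_announcements` is hypothetical.

```python
import ipaddress

# Hypothetical stand-ins for PB/32 and PBC/48 (documentation range, assumed).
PB = ipaddress.ip_network("2001:db8::/32")     # ISP B's aggregate
PBC = ipaddress.ip_network("2001:db8:c::/48")  # customer C's prefix from ISP B


def backup_announcements(dfz_prefixes, aggregate=PB, backup=PBC):
    """Return the backup prefixes ISP A should announce right now.

    dfz_prefixes is the set of prefixes ISP A currently sees in the DFZ.
    While the aggregate is visible, nothing extra is announced. Once it
    vanishes, the more specific backup prefix is announced, and it is
    withdrawn again as soon as the aggregate reappears.
    """
    if aggregate in dfz_prefixes:
        return set()
    return {backup}


# ISP B reachable: no backup route. ISP B gone: announce PBC/48.
normal = backup_announcements({PB})
failed = backup_announcements(set())
```

Because the decision depends only on the current view of the DFZ, a flapping PB/32 would make the backup announcement flap as well, which is the convergence problem described above.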
> 
> 
> Approach 2:
> The second approach is more drastic and requires a change in BGP's default
> routing behaviour.
> 
> Usually BGP chooses the route for a packet based on the longest prefix
> match calculation. The suggestion here is that (e)BGP - despite this
> rule - chooses the _shortest_ prefix in the DFZ. This behaviour has to
> be restricted to prefixes between /32 and /48. This restriction is
> necessary because we only want this to happen in the routing area.
> Within a site, longest prefix match should still be possible. Also,
> shortest match needs to be prevented for prefixes shorter than /32, or else
> anyone could create a _real_ black hole by announcing a prefix shorter
> than /32 to the DFZ.
> 
> Assuming this behaviour of (e)BGP, in the above scenario ISP A could simply
> announce PBC/48 to the DFZ. It will get announced to everyone, but it
> should not be used in the forwarding table, because the (now) better prefix
> PB/32 exists. If ISP B now detaches and the prefix PB/32 is withdrawn from
> the routing table, every node in the DFZ will add PBC/48 to the
> forwarding table and the new route to the customer is established.
> 
> The advantage here is faster convergence. A disadvantage is the major
> change in BGP's behaviour. This 'shortest prefix' criterion may be critical,
> because calculation of the best route might get complex and expensive.
> Also, the routing table may again grow large, but in return the
> forwarding table will stay small. It is not known what other impact a
> 'shortest prefix' selection will have.
> 
> So long,
>      Christian (Schild :-)
-- 
marcelo bagnulo <marcelo@it.uc3m.es>
uc3m