
Re: [RRG] Geoff Huston's article on BGP stability, update statistics and damping



Robin,

Geoff Huston has a new article "Damping BGP":

  http://www.potaroo.net/ispcol/2007-06/dampbgp.html

He proposes methods of identifying updates which are usually
associated with path hunting.  Typically this is one or multiple
announcements of a longer path for the prefix, followed by the
withdrawal of the prefix.  I understand from this that he
supports "path length damping" as described by Tony Li:

   http://tools.ietf.org/html/draft-li-bgp-stability
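
As a rough illustration of the pattern described above, the following
sketch flags an update sequence for a single prefix that consists of one
or more announcements of increasingly long AS paths followed by a
withdrawal. The function name, the tuple encoding of updates and the
"strictly longer" test are assumptions made for this example; they are
not taken from the article or the draft.

```python
def looks_like_path_hunting(updates, initial_len):
    """Return True if `updates` matches: longer path(s), then a withdrawal.

    updates     -- list of ("A", as_path_len) or ("W", None) for ONE prefix
    initial_len -- AS path length of the route in use before the event
    (Hypothetical encoding, chosen only for this sketch.)
    """
    # Need at least one announcement plus the final withdrawal.
    if len(updates) < 2 or updates[-1][0] != "W":
        return False
    last_len = initial_len
    for kind, path_len in updates[:-1]:
        # Every intermediate update must announce a strictly longer path.
        if kind != "A" or path_len <= last_len:
            return False
        last_len = path_len
    return True


hunting = [("A", 4), ("A", 5), ("W", None)]
ordinary = [("A", 2), ("W", None)]
```

With a previous path length of 3, `hunting` matches the pattern and
`ordinary` does not, since its announcement carries a shorter path.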

These two documents show that there is still some tuning to be done with
BGP to reduce the churn. The proposed approach is mainly to change
the existing implementations slightly to reduce the number of BGP
messages advertised after a change. This is a valid way of incrementally
improving BGP. In the past, the measurement studies of Craig Labovitz
and his colleagues showed, for example, that a router should only send
withdraw messages for prefixes that it had previously announced. This
was not the case for most implementations at that time, and
implementations have since evolved.
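
The rule about withdraw messages can be sketched as follows: a router
remembers, per peer, which prefixes it has announced, and generates a
withdraw only for those. The class and method names are hypothetical
and the message tuples are simplified placeholders, not a real BGP
encoding.

```python
from collections import defaultdict


class PeerAdvertisements:
    """Per-peer record of announced prefixes (hypothetical helper)."""

    def __init__(self):
        # peer -> set of prefixes currently announced to that peer
        self._announced = defaultdict(set)

    def announce(self, peer, prefix):
        self._announced[peer].add(prefix)
        return ("UPDATE", peer, prefix)

    def withdraw(self, peer, prefix):
        # Send a withdraw only if this peer was told about the prefix.
        if prefix in self._announced[peer]:
            self._announced[peer].remove(prefix)
            return ("WITHDRAW", peer, prefix)
        return None  # nothing was announced, so no message is generated


adv = PeerAdvertisements()
spurious = adv.withdraw("peerA", "192.0.2.0/24")   # never announced
adv.announce("peerA", "192.0.2.0/24")
legitimate = adv.withdraw("peerA", "192.0.2.0/24")
```

Here `spurious` is `None` because the prefix was never announced to
peerA, while `legitimate` is an actual withdraw message.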

I'm sure that some tuning can be done to the BGP damping mechanism, but
I think that the discussion of these issues belongs in the IETF (i.e.
the IDR mailing list) and not the IRTF. I hope that the IRTF can work on
longer term issues.

For example, when considering link failures, a much more ambitious goal
could be to develop new interdomain routing techniques that avoid most
packet losses due to link failures. Inside ASes, several
techniques, such as those relying on MPLS, have been deployed by large
network operators, and others, relying on pure IP, are being developed
within the IETF. Similar techniques could be used to protect BGP peering
links from failure. A possible approach was described in

O. Bonaventure, C. Filsfils, P. Francois, Achieving Sub-50 Milliseconds
Recovery Upon BGP Peering Link Failures, CoNEXT 2005,
http://inl.info.ucl.ac.be/publications/achieving-sub-50-milliseconds-recovery-upon-bgp-peering-link-failures


This approach covers sudden failures, but these are not the only problem
that must be taken into account. Several operators report that link
failures are often caused by management activities. This implies that
there are two types of link failures:
- the sudden ones that occur due to lower layer problems
- the planned ones that occur due to maintenance operations

The first type of failure must typically be addressed quickly, by using
either a fast reroute technique or a fast convergence technique (fast
convergence can have the disadvantage of causing more churn than fast
reroute). The second type of failure does not need to be handled
quickly. Instead, it should be possible to recover from the planned
failure without causing any disruption.

When a link fails, packets can be lost for three reasons:
- they are sent on the failed link by a router that has not yet noticed
the failure
- they are received by a router that no longer knows a route to
reach the destination and is forced to drop them
- they loop between two or more routers whose routing tables are
transiently inconsistent, and their TTL expires after a few loops
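
The three cases above can be reproduced with a toy hop-by-hop
forwarding model. The sketch below is only an illustration: the
topology (A and B reach destination D, with C as an alternate path),
the function name and the small initial TTL are all assumptions made
for the example.

```python
def forward(fibs, dead_links, start, dest, ttl=8):
    """Follow next hops for one packet; return (outcome, hops_taken)."""
    node, hops = start, 0
    while node != dest:
        if ttl == 0:
            return ("dropped: TTL expired in loop", hops)
        next_hop = fibs.get(node, {}).get(dest)
        if next_hop is None:
            return ("dropped: no route", hops)
        if (node, next_hop) in dead_links:
            return ("dropped: sent on failed link", hops)
        node = next_hop
        ttl -= 1
        hops += 1
    return ("delivered", hops)


dead = {("B", "D")}  # the B-D link has failed

# 1. B has not yet noticed the failure and still forwards on it.
unnoticed = forward({"A": {"D": "B"}, "B": {"D": "D"}}, dead, "A", "D")
# 2. B removed its route and has no alternate yet.
no_route = forward({"A": {"D": "B"}, "B": {}}, dead, "A", "D")
# 3. B now points at A, but A still points at B: a transient loop.
loop = forward({"A": {"D": "B"}, "B": {"D": "A"}}, dead, "A", "D")
# After convergence, A uses the alternate path through C.
converged = forward(
    {"A": {"D": "C"}, "B": {"D": "A"}, "C": {"D": "D"}}, dead, "A", "D"
)
```

Each of the first three calls ends in a different kind of drop, and
only the converged tables deliver the packet.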

For the first reason, fast-reroute techniques have been proposed with
MPLS, pure IP (loop-free alternate, notvia addresses, ...) and also eBGP
peering links (see the CoNEXT paper mentioned above).

For the second reason, it should be possible, in the case of protected
link failures or planned maintenance, to indicate that a route should
be less preferred instead of removing it with a withdraw. BGP graceful
restart could also be of some use here.
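
A small sketch of why de-preferring helps: if the route is withdrawn
before an alternate is known, the router has nothing to forward on,
whereas a de-preferred route remains usable until a better one arrives.
How the "less preferred" indication would be carried in BGP is left
open here; the sketch simply lowers a local preference value, and all
names and values are hypothetical.

```python
def best_route(rib):
    """Pick the route with the highest local preference, or None."""
    return max(rib, key=lambda r: r["local_pref"], default=None)


def withdraw(rib, via):
    """Remove the route through `via` entirely."""
    return [r for r in rib if r["via"] != via]


def depref(rib, via, new_pref=0):
    """Keep the route through `via`, but make it the least preferred."""
    return [dict(r, local_pref=new_pref) if r["via"] == via else r
            for r in rib]


primary = {"via": "link-under-maintenance", "local_pref": 100}
backup = {"via": "alternate-link", "local_pref": 90}

rib = [primary]  # the alternate has not been learned yet

after_withdraw = best_route(withdraw(rib, "link-under-maintenance"))
kept = depref(rib, "link-under-maintenance")
after_depref = best_route(kept)
after_backup = best_route(kept + [backup])
```

After the withdraw there is no route at all, after the de-preference
the old route is still selected, and once the backup is learned it
takes over without a gap.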

For the third reason, I would like to point out that there are several
solutions being discussed within the RTGWG to handle this problem with
OSPF and IS-IS, see e.g
http://tools.ietf.org/html/draft-francois-ordered-fib-02
http://www.tools.ietf.org/html/draft-bonaventure-isis-ordered-00

I think that a solution to improve the stability of BGP should aim at
solving these three problems as well, because we need both to address the
stability of BGP and to avoid losing packets when an alternate path exists.

From this viewpoint, it might be interesting in the paper to refer
to the SIGCOMM 2006 paper by Wang et al. that shows, based on
measurements with a BGP beacon, that packets are lost during a failover
of a dual-homed stub:
http://www.sigcomm.org/sigcomm2006/discussion/showpaper.php?paper_id=40

Having an interdomain routing protocol whose main objective is to avoid
losing packets (i.e. keep packets happy, as Randy Bush would say) would
be an interesting long term objective for the IRTF RRG.

Best regards,



Olivier
