
Re: [RRG] Routers in DFZ

To: Peter Sherbin <pesherb@yahoo.com>
Subject: Re: [RRG] Routers in DFZ
From: "John G. Scudder" <jgs@bgp.nu>
Date: Fri, 10 Aug 2007 17:46:03 -0400
Cc: Routing Research Group list <rrg@psg.com>, bmanning@vacation.karoshi.com, ppml@arin.net, nanog@nanog.org, Robin Whittle <rw@firstpr.com.au>, "Ricardo V. Oliveira" <rveloso@CS.UCLA.EDU>, John Scudder <jgs@juniper.net>
In-reply-to: <487856.32056.qm@web58701.mail.re1.yahoo.com>
References: <487856.32056.qm@web58701.mail.re1.yahoo.com>

Peter,

I have a few observations on this "stake in the ground".

First and foremost, the notion that linear extrapolation can reveal when "the CPU would run out of cycles to compute [BGP] convergence" is most likely incorrect. There are several reasons for this, but the principal one is the fact that BGP operates over a flow-controlled transport, combined with the "state compression" property of BGP. The former ensures that BGP will only transmit updates as fast as its peer can consume them, so any proper BGP implementation won't be overrun even if slow -- it'll just apply backpressure to its peer by allowing the TCP window to close. (This is just a natural consequence of any application operating over TCP.) The latter provides that when a BGP implementation wishing to send to a peer becomes flow-blocked, it'll stop generating update messages (modulo what may already have been written to the TCP socket). When it becomes unblocked, it begins generating update messages reflecting the current state of its routing table. To see how this works, consider the following pair of examples:

Two fast routers conversing:

- Fast router F talks to fast router G. The connection to G flows freely.
- F receives an update for prefix P from some other peer. It propagates the update to G immediately.
- F receives another update for prefix P from yet another peer. It propagates it immediately.
- F receives yet another such update from still another peer. It propagates it immediately.

Net effect, G has received three updates for P, and has converged to P's final state. CPU consumption on G to converge to P's final state is 3 x (cost to process one update).

A fast router conversing with a slow one:

- Fast router F talks to slow router S.
- At some point, S flow-blocks F.
- While flow-blocked, F receives the same set of three updates for P as in the example above. It doesn't propagate them to S since the connection is flow-blocked.
- At some later point, S unblocks F's connection.
- At that point, F propagates only the final state of P.

Net effect, S has received only a single update for P, and has also converged to P's final state. CPU consumption on S to converge to P's final state is 1 x (cost to process one update).
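
The two examples above can be sketched as a small simulation. This is only an illustration of the state-compression idea, not how any particular BGP implementation is written. The function name `updates_seen_by_peer`, the event tuples, and the "via A/B/C" route labels are all invented for this sketch.

```python
def updates_seen_by_peer(events):
    """Return the list of (prefix, state) updates the receiving peer processes.

    events is a sequence of tuples (a hypothetical format for this sketch):
      ("update", prefix, state)  the sender learns a new state for a prefix
      ("block",)                 the peer's TCP window closes
      ("unblock",)               the peer's TCP window reopens
    """
    blocked = False
    pending = {}      # prefix -> latest state learned while flow-blocked
    delivered = []    # updates actually sent to (and processed by) the peer

    for event in events:
        kind = event[0]
        if kind == "block":
            blocked = True
        elif kind == "unblock":
            blocked = False
            # Only the current state of each changed prefix is sent.
            delivered.extend(pending.items())
            pending.clear()
        elif kind == "update":
            _, prefix, state = event
            if blocked:
                # Overwrite any earlier pending state: this is the compression.
                pending[prefix] = state
            else:
                delivered.append((prefix, state))
    return delivered


# The same three updates for prefix P, as in both examples.
three_updates = [
    ("update", "P", "via A"),
    ("update", "P", "via B"),
    ("update", "P", "via C"),
]

fast_peer = updates_seen_by_peer(three_updates)
slow_peer = updates_seen_by_peer([("block",)] + three_updates + [("unblock",)])

print(len(fast_peer), fast_peer[-1])   # 3 updates processed by G
print(len(slow_peer), slow_peer[-1])   # 1 update processed by S
```

Both peers end with the same final state for P, but the slow peer pays for one update instead of three, which is why the work done by a slow router does not grow linearly with the update rate.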

(I discussed this property during my Routing Area presentation at the Prague IETF, and I think Geoff Huston has touched on it in some of his recent articles.)

The example is somewhat simplified, and the dynamics that emerge from this are non-obvious, but I think you'll agree that a simple linear extrapolation doesn't work. I think the most we can say is that the "stake in the ground" provides a lower limit on expectations, not a tight bound as the previous author suggests.

A second observation is that the analysis given assumes today's (yesterday's, actually) control plane CPUs but extrapolates out to years from now. To paraphrase one of the other (quoted, anonymous) authors, although we may not know particulars of future upgrades, it's not reasonable to assume that control plane CPUs will never be upgraded.

None of this should be construed as an opinion that BGP can't be improved or that it will scale infinitely. But "BGP will stop working ... can't converge" is overly pessimistic.

In closing I'll just say that whatever the quoted (2.5M) number may be, it's certainly nothing like a "theoretical limit". It's at best a guess, and, as I've argued above, probably not a very accurate one.

Regards,

--John

On Aug 10, 2007, at 10:29 AM, Peter Sherbin wrote:

Here is a good comment on the recent RRG discussion about routers in DFZ and relationship between number of prefixes and the processing power. Details are below and here is the essence:

	so, one might presume that w/o a change in algorithm, and unlimited
	memory, that the CPU would run out of cycles to compute convergence
	at ~ 10x the current size of the routing table (abt 250,000 prefixes).

	so putting a stake in the ground, BGP will stop working @ around
	2,500,000 routes - can't converge...  regardless of IPv4 or IPv6.
	unless the CPU's change or the convergence algorithm changes.

In particular it provides a theoretical limit that can be added to the Problem Statement draft-narten-radir-problem-statement-00.txt

Thanks,

Peter


--- bmanning@vacation.karoshi.com wrote:

 I asked this question to a couple of folks:

	"at the current churn rate/ratio, at what size does the FIB need to
         be before it will not converge?"

 and got these answers:

--------- jabber log ---------
a fine question, has been asked many times, and afaik no one has
provided any empirically grounded answer.

a few realities hinder our ability to answer this question.

(1) there are technology factors we can't predict, e.g.,
        moore's law effects on hardware development
(2) there are economics and policy and social factors we
        can't predict, e.g., how much convergence-capable
        hardware will providers/vendors be able to afford,
        how those costs will affect consumer prices,
        how that will affect consumer uptake, network
        growth, and industry dynamics, how regulation affects
        all of the above
(3) We Don't Have Any Data from providers on the dynamics of BGP
        and IGP interactions, much less network wide convergence,
        so the research community can't provide any empirically
        grounded input into an answer

{elided}
-------------------------------
&
------ Forwarded Message ------

Date: Tue, 07 Aug 2007
To: bmanning@karoshi.com
Subject: CPU Usage

Router            Upstream   Uptime     BGP CPU per 1 sec uptime
Cat6500/SUP720    1          >1yr       53ms/sec
C7200/NPE-G1      1          158days    15ms/sec
C7304/NSE100      4+2        177days    55ms/sec
C7200/NPE-G1      1+2        26days      8ms/sec
C7301             1          214days     7ms/sec
GR2000            0+1        101days     6ms/sec

Upstream: M+N, M is # of EBGP with full route feed, N is # of IBGP
with full route feed

Provided the CPU consumption is proportional to the routing table
size, the hard limit would be 10 times the current size, allowing
other tasks to obtain some CPU cycles.

----- End forwarded message -----
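
The arithmetic behind the "10 times" figure can be written out as a rough sketch. It takes the BGP CPU numbers from the table in the forwarded message and applies the forwarded message's own assumption that CPU consumption is proportional to routing table size, which is exactly the assumption the reply at the top of this thread disputes. The function name `projected_utilization` and the labels used to tell the two C7200/NPE-G1 rows apart are made up for this example.

```python
# BGP CPU time per second of uptime, in milliseconds, from the table above.
bgp_ms_per_sec = {
    "Cat6500/SUP720": 53,
    "C7200/NPE-G1 (upstream 1)": 15,
    "C7304/NSE100": 55,
    "C7200/NPE-G1 (upstream 1+2)": 8,
    "C7301": 7,
    "GR2000": 6,
}


def projected_utilization(ms_per_sec, scale):
    """Fraction of one CPU spent on BGP if load scales linearly with table size.

    The linear scaling is the assumption under discussion, not a measured fact.
    """
    return ms_per_sec * scale / 1000.0


for router, ms in bgp_ms_per_sec.items():
    now = projected_utilization(ms, 1)
    at_10x = projected_utilization(ms, 10)
    print(f"{router}: {now:.1%} now, {at_10x:.1%} at 10x table size")
```

Under that linear assumption the busiest routers in the table (53 and 55 ms/sec) would spend a bit over half of each second on BGP at 10x the table size, which matches the remark about leaving some CPU cycles for other tasks.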

	so, one might presume that w/o a change in algorithm, and unlimited
	memory, that the CPU would run out of cycles to compute convergence
	at ~ 10x the current size of the routing table (abt 250,000 prefixes).

	so putting a stake in the ground, BGP will stop working @ around
	2,500,000 routes - can't converge...  regardless of IPv4 or IPv6.
	unless the CPU's change or the convergence algorithm changes.

--bill


