
Re: [RRG] Routers in DFZ



Peter,

I have a few observations on this "stake in the ground".

First and foremost, the notion that linear extrapolation can reveal when "the CPU would run out of cycles to compute [BGP] convergence" is most likely incorrect. There are several reasons for this, but the principal one is the fact that BGP operates over a flow-controlled transport, combined with the "state compression" property of BGP.

The former ensures that BGP will only transmit updates as fast as its peer can consume them, so any proper BGP implementation won't be overrun even if slow -- it'll just apply backpressure to its peer by allowing the TCP window to close. (This is just a natural consequence of any application operating over TCP.)

The latter provides that when a BGP implementation wishing to send to a peer becomes flow-blocked, it'll stop generating update messages (modulo what may already have been written to the TCP socket). When it becomes unblocked, it begins generating update messages reflecting the current state of its routing table. To see how this works, consider the following pair of examples:

Two fast routers conversing:
- Fast router F talks to fast router G. The connection to G flows freely.
- F receives an update for prefix P from some other peer. It propagates the update to G immediately.
- F receives another update for prefix P from yet another peer. It propagates it immediately.
- F receives yet another such update from still another peer. It propagates it immediately.
Net effect, G has received three updates for P, and has converged to P's final state. CPU consumption on G to converge to P's final state is 3 x (cost to process one update).

A fast router conversing with a slow one:
- Fast router F talks to slow router S.
- At some point, S flow-blocks F.
- While flow-blocked, F receives the same set of three updates for P as in the example above. It doesn't propagate them to S since the connection is flow-blocked.
- At some later point, S unblocks F's connection.
- At that point, F propagates only the final state of P.
Net effect, S has received only a single update for P, and has also converged to P's final state. CPU consumption on S to converge to P's final state is 1 x (cost to process one update).
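
The two examples can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not how any real BGP implementation is structured: the names Peer, Sender, on_update and run are hypothetical, flow-blocking is modeled as a simple flag rather than a closed TCP window, and the path labels "via A", "via B", "via C" stand in for the three updates for P.

```python
class Peer:
    """Receiving router (G or S); counts the updates it must process."""

    def __init__(self):
        self.rib = {}
        self.updates_processed = 0

    def receive(self, prefix, state):
        self.rib[prefix] = state
        self.updates_processed += 1


class Sender:
    """Fast router F, sending to a single peer."""

    def __init__(self, peer):
        self.peer = peer
        self.blocked = False
        self.pending = {}  # prefix -> latest state not yet sent

    def on_update(self, prefix, state):
        if self.blocked:
            # State compression: a newer state overwrites the older one,
            # so only the latest state per prefix is remembered.
            self.pending[prefix] = state
        else:
            self.peer.receive(prefix, state)

    def block(self):
        self.blocked = True

    def unblock(self):
        # On unblocking, send only the current state of each changed prefix.
        self.blocked = False
        for prefix, state in self.pending.items():
            self.peer.receive(prefix, state)
        self.pending.clear()


def run(flow_blocked):
    """Deliver three updates for P; return (updates processed, final state)."""
    peer = Peer()
    f = Sender(peer)
    if flow_blocked:
        f.block()
    for state in ("via A", "via B", "via C"):
        f.on_update("P", state)
    if flow_blocked:
        f.unblock()
    return peer.updates_processed, peer.rib["P"]


print(run(flow_blocked=False))  # fast peer G: (3, 'via C')
print(run(flow_blocked=True))   # slow peer S: (1, 'via C')
```

Both peers end up with the same final state for P, but the slow one does a third of the work to get there, which is the point of the examples.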

(I discussed this property during my Routing Area presentation at the Prague IETF, and I think Geoff Huston has touched on it in some of his recent articles.)

The example is somewhat simplified, and the dynamics that emerge from this are non-obvious, but I think you'll agree that a simple linear extrapolation doesn't work. I think the most we can say is that the "stake in the ground" provides a lower limit on expectations, not a tight bound as the previous author suggests.

A second observation is that the analysis given assumes today's (yesterday's, actually) control plane CPUs but extrapolates out to years from now. To paraphrase one of the other (quoted, anonymous) authors, although we may not know particulars of future upgrades, it's not reasonable to assume that control plane CPUs will never be upgraded.

None of this should be construed as an opinion that BGP can't be improved or that it will scale infinitely. But "BGP will stop working ... can't converge" is overly pessimistic.

In closing, I'll just say that whatever the quoted (2.5M) number may be, it's certainly nothing like a "theoretical limit". It's at best a guess and, as I've argued above, probably not a very accurate one.

Regards,

--John

On Aug 10, 2007, at 10:29 AM, Peter Sherbin wrote:

Here is a good comment on the recent RRG discussion about routers in the DFZ and the relationship between the number of prefixes and processing power. Details are below,
and here is the essence:

	so, one might presume that w/o a change in algorithm, and unlimited
	memory, that the CPU would run out of cycles to compute convergence
	at ~ 10x the current size of the routing table (abt 250,000 prefixes).

	so putting a stake in the ground, BGP will stop working @ around
	2,500,000 routes - can't converge...  regardless of IPv4 or IPv6.
	unless the CPU's change or the convergence algorithm changes.

In particular it provides a theoretical limit that can be added to the Problem
Statement draft-narten-radir-problem-statement-00.txt

Thanks,

Peter


--- bmanning@vacation.karoshi.com wrote:

 I asked this question to a couple of folks:

	"at the current churn rate/ratio, at what size does the FIB need to
         be before it will not converge?"

 and got these answers:

--------- jabber log ---------
a fine question, has been asked many times, and afaik no one has
provided any empirically grounded answer.

a few realities hinder our ability to answer this question.

(1) there are technology factors we can't predict, e.g.,
        moore's law effects on hardware development
(2) there are economics and policy and social factors we
        can't predict, e.g., how much convergence-capable
        hardware will providers/vendors be able to afford,
        how those costs will affect consumer prices,
        how that will affect consumer uptake, network
        growth, and industry dynamics, how regulation affects
        all of the above
(3) We Don't Have Any Data from providers on the dynamics of BGP
        and IGP interactions, much less network wide convergence,
        so the research community can't provide any empirically
        grounded input into an answer

{elided}
-------------------------------
&
------ Forwarded Message ------

Date: Tue, 07 Aug 2007
To: bmanning@karoshi.com
Subject: CPU Usage

Router            Upstream   Uptime     BGP CPU per 1 sec of uptime
Cat6500/SUP720    1          >1yr       53ms/sec
C7200/NPE-G1      1          158days    15ms/sec
C7304/NSE100      4+2        177days    55ms/sec
C7200/NPE-G1      1+2        26days      8ms/sec
C7301             1          214days     7ms/sec
GR2000            0+1        101days     6ms/sec

Upstream: M+N, where M is the number of EBGP peers with a full route
feed and N is the number of IBGP peers with a full route feed

Provided that the CPU consumption is proportional to the routing table
size, the hard limit would be 10 times the current size, allowing
other tasks to obtain some CPU cycles.

----- End forwarded message -----
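
As a rough check on that arithmetic, here is a minimal Python sketch. It assumes, as the forwarded message does, that BGP CPU time scales linearly with routing table size, and that assumption is itself open to question. The figures are the ms/sec values from the table; the function name projected_utilization and its growth_factor parameter are hypothetical, introduced only for illustration.

```python
# BGP CPU time per second of uptime, in milliseconds, from the table above.
bgp_ms_per_sec = {
    "Cat6500/SUP720": 53,
    "C7200/NPE-G1 (1 upstream)": 15,
    "C7304/NSE100": 55,
    "C7200/NPE-G1 (1+2 upstream)": 8,
    "C7301": 7,
    "GR2000": 6,
}


def projected_utilization(ms_per_sec, growth_factor):
    """Fraction of one CPU used by BGP if load grows linearly (assumed)."""
    return ms_per_sec * growth_factor / 1000.0


for router, ms in bgp_ms_per_sec.items():
    now = projected_utilization(ms, 1)
    at_10x = projected_utilization(ms, 10)
    print(f"{router:30s} now {now:6.1%}   at 10x {at_10x:6.1%}")
```

Under that linear assumption, the two busiest routers in the table would spend a little over half of each second on BGP at 10x, which is where the "allowing other tasks to obtain some CPU cycles" margin comes from.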

	so, one might presume that w/o a change in algorithm, and unlimited
	memory, that the CPU would run out of cycles to compute convergence
	at ~ 10x the current size of the routing table (abt 250,000 prefixes).

	so putting a stake in the ground, BGP will stop working @ around
	2,500,000 routes - can't converge...  regardless of IPv4 or IPv6.
	unless the CPU's change or the convergence algorithm changes.

--bill	

