
[RRG] FIB challenges, RIB, BGP, DRAM limitations



Hi Tony,

In "Re: [RRG] Opportunistic Topological Aggregation in the RIB->FIB
Calculation?", quoting Bill Herrin, you wrote:

> | The RIB cost grows with the width of the system times the number
> | of entries. The system width is staying relatively static (and
> | that's expected to continue), so the growth cost is close to
> | linear with the number of entries. We already know that a $5k
> | ($2k if you're on a budget) Linux server running Quagga can keep
> | up with 10 times the entries and churn. With PC hardware's
> | price/performance doubling every 3 years, we haven't long to
> | wait before it can easily handle 100 times the entries and churn
> | that we have today.
>
> In fact, the PC hardware doesn't actually do that.  What we really
> see is that DRAM memory speed grows at about 1.2X every two years
> and that our growth rate is at least 1.3X every two years.
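As a back-of-envelope check on the figures quoted above, here is a
minimal sketch.  It assumes (my simplification, not something stated
in either quoted message) that convergence time scales as the number
of entries divided by DRAM speed.  The function names are made up
for illustration.

```python
import math

# Figures taken from the quoted text; both are per two-year period.
DRAM_SPEEDUP_PER_2Y = 1.2   # DRAM speed growth
TABLE_GROWTH_PER_2Y = 1.3   # routing table growth ("at least")


def relative_convergence_time(years):
    """Convergence time relative to today.

    Assumption: time is proportional to entries / DRAM speed.
    """
    periods = years / 2.0
    return (TABLE_GROWTH_PER_2Y / DRAM_SPEEDUP_PER_2Y) ** periods


def years_to_multiply(factor, rate_per_period, period_years):
    """Years needed to grow by `factor` at `rate_per_period` per period."""
    return period_years * math.log(factor) / math.log(rate_per_period)


if __name__ == "__main__":
    # Slowdown after one two-year period and after a decade.
    print(round(relative_convergence_time(2), 3))
    print(round(relative_convergence_time(10), 3))
    # Time for "doubling every 3 years" to reach a factor of 100.
    print(round(years_to_multiply(100, 2.0, 3.0), 1))
```

Under that assumption, convergence slows by roughly 8% every two
years, and a doubling every 3 years takes about 20 years to reach a
factor of 100.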


This slow growth in DRAM speed is not surprising, since it is partly
determined by signal propagation in PCB traces, input/output
buffers, voltage translation between the core of chips and their I/O
pins, etc.  In addition, for DRAM, there is the time required to
precharge, all at once, the capacitive sense lines of millions of
sense amplifiers after each read or write operation.

My impression is that commodity DRAM chips are optimised for reading
and writing large sequences of memory locations at high speed, to
fill (or empty, for a write) a block inside the large SRAM caches on
CPU chips.

I think the latency of mass-produced DRAMs - the time between
presenting them with an arbitrary address and getting the data
stored at that address - has not fallen significantly.

> Since BGP can't live in a cache (at a sane price point), BGP's
> performance (or any other protocol that would carry the same
> number of prefixes) is bounded to converge at the rate
> that we can actually perform DRAM writes at.  As we're growing
> faster than DRAM gets quicker, BGP necessarily converges a little
> more slowly every year.
>
> See:

http://www.iab.org/about/workshops/routingandaddressing/Router_Scalability.pdf

I agree that we need to keep a lid on the number of BGP routes DFZ
routers need to handle.

Map-encap schemes and Six/One Router will do this.

How much of a change to the BGP protocol or to the BGP
implementation in routers would be required to implement the sort of
aggregation you are thinking of in geo-aggregation?  (I still think
geo-aggregation is at odds with business needs, and puts the
cart before the horse.  The routing system should adapt to business
needs, not the other way round.)


> | All of this means one simple thing: We'll hit the wall on the
> | FIB's capability long before we hit the wall on the BGP RIB's
> | capability.

The average rate of BGP updates is probably not a problem for a PC
implementation, and so the same should hold for a high-end router
which uses the latest PC-style RAM and CPUs.  (Though routers are
likely to lag several years behind PCs, since they are not such
mass-market devices, and involve a great deal of integration of many
subsystems which must all work nicely together.)

The limitation would be how fast a PC-style BGP implementation - or
any other implementation which depends on DRAM - could respond to
its incoming messages.  SRAM in the quantities required for BGP is
unworkable (and even then there would still be inter-chip delays,
and signals on PCB tracks cannot run at CPU internal clock
frequencies), so BGP will always have to run with DRAM and a
CPU SRAM cache.

The speed limitation of BGP implementations affects the total
network convergence times.  If all links were exceedingly fast and
so were the BGP implementations, convergence might take seconds,
rather than minutes.  Links have speed and packet-loss limitations.
Together with the inevitable CPU and DRAM limitations, this means we
certainly need to limit the number of BGP routes in the DFZ in order
to keep the system reasonably responsive to outages and
reconfigurations.  (Alternatively, replace BGP with something
else, but I don't favour this.)


> | Absent a RIB+FIB solution that the operators are enthusiastic
> | about, it's worth the effort to consider approaches that only
> | reduce the size of the FIB.
>
> Those matters, since they are implementation details that actually
> have private solutions known to some in the industry, are really
> not our bailiwick.  Again, our charter is to address the routing
> and addressing architectural issues that cause a lack of
> aggregation in the first place.

As traffic volumes keep increasing, I think the FIB becomes more
and more of a nightmare.  TCAM is expensive, inflexible and
power-hungry - millions of sense lines being thrashed every cycle.

The current state of the art is to use massively parallel CPUs with
a shared (reduced latency) DRAM resource:

  http://www.firstpr.com.au/ip/sram-ip-forwarding/#FIB_techniques

The CPUs have to chew their way through the address of each incoming
packet some number of bits at a time.

http://www.firstpr.com.au/ip/sram-ip-forwarding/index.html#Vargese_CRS_1

I can't see how this is going to work for gigabits of packets where
the FIB has to implement IPv6 /48s.
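To make "some number of bits at a time" concrete, here is a minimal
sketch of a fixed-stride multibit trie with prefix expansion.  This
is a generic textbook technique, not a description of how any
particular router does it.  The class name, the 8-bit stride and the
example prefixes are all my own choices for illustration.  The point
is that each stride of the address costs one dependent memory read,
so longer addresses and longer prefixes mean more reads per packet.

```python
class StrideTrie:
    """Fixed-stride multibit trie (illustrative sketch only)."""

    def __init__(self, stride=8, width=32):
        assert width % stride == 0
        self.stride = stride
        self.width = width
        self.mask = (1 << stride) - 1
        self.root = self._new_node()

    def _new_node(self):
        # One slot per stride value: [prefix_len, next_hop, child_node]
        return [[-1, None, None] for _ in range(1 << self.stride)]

    def insert(self, prefix, plen, next_hop):
        assert 0 <= plen <= self.width
        node, consumed = self.root, 0
        # Descend while more than one stride of prefix bits remains.
        while plen - consumed > self.stride:
            shift = self.width - consumed - self.stride
            slot = node[(prefix >> shift) & self.mask]
            if slot[2] is None:
                slot[2] = self._new_node()
            node = slot[2]
            consumed += self.stride
        # Expand the remaining 0..stride bits over every slot they cover.
        rem = plen - consumed
        shift = self.width - consumed - self.stride
        chunk = (prefix >> shift) & self.mask
        span = 1 << (self.stride - rem)
        base = chunk & ~(span - 1)
        for i in range(base, base + span):
            if node[i][0] <= plen:  # a longer prefix already here wins
                node[i][0] = plen
                node[i][1] = next_hop

    def lookup(self, addr):
        """Return (next_hop, memory_reads) for the longest match."""
        node, best, reads = self.root, None, 0
        shift = self.width - self.stride
        while node is not None:
            slot = node[(addr >> shift) & self.mask]
            reads += 1
            if slot[0] >= 0:
                best = slot[1]  # deeper levels only hold longer prefixes
            node = slot[2]
            shift -= self.stride
        return best, reads


if __name__ == "__main__":
    v4 = StrideTrie(stride=8, width=32)
    v4.insert(0x0A000000, 8, "A")    # 10.0.0.0/8
    v4.insert(0x0A010000, 16, "B")   # 10.1.0.0/16
    v4.insert(0x0A010200, 23, "C")   # 10.1.2.0/23
    print(v4.lookup(0x0A010305))     # 10.1.3.5

    v6 = StrideTrie(stride=8, width=128)
    p48 = 0x20010DB80001 << 80       # 2001:db8:1::/48
    v6.insert(p48, 48, "X")
    print(v6.lookup(p48 | 1))
```

With an 8-bit stride, matching an IPv6 /48 in this sketch takes six
dependent reads, against three for the IPv4 /23 above.  A wider
stride cuts the number of reads but makes each node larger.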

So I think the FIB in DFZ routers is another major limiting factor.

By using map-encap or a translation scheme (e.g. Six/One Router), we
can leave the RIB and FIB of DFZ routers pretty much as they are.

However, I think there are some real challenges in the FIB of any
router or server which performs ITR functions.  The map-encap or
translation scheme may involve hundreds of millions or billions of
separate divisions in the whole address space.  These challenges
definitely need to be considered when we are discussing the costs
and scaling performance of any such scheme.

 - Robin

