
Re: [RRG] Opportunistic Topological Aggregation in the RIB->FIB Calculation?



On 23 Jul 2008, at 4:15, William Herrin wrote:

> C. Even if it isn't practical to build such a tunnel, you can
> generally pick an alternate link that offers a high probability of
> reachability for the impacted routes. On link failure, cut over to
> the alternate while rebuilding the RIB and then the FIB.

Hm, if you know your outage (= detection + convergence time) is short enough, say under 30 seconds, and certainly under 10 seconds, it doesn't really matter where the packets go; you could even drop them, and _most_ applications will carry on without too much trouble.

If it's going to take a minute or more to restore reachability, you can't play fast and loose and risk loops, because applications will fail.

> None of this is perfect, but put together it enables a system that
> isn't on the brink of collapse at 100 times the current number of
> entries.

This is all highly optimistic. Let's assume you can get your 8-way parallelization: 100 times the entries spread over 8 CPUs still leaves each CPU with roughly 12 times today's load. So try the current 250k table on a system that's 12x slower than what's on the market now, say a 133 MHz Pentium.
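
For concreteness, the arithmetic behind that 12x figure, using only the numbers from this thread:

    # Back-of-the-envelope: to preview 100x today's table with 8-way
    # parallelism, run today's table on proportionally slower hardware.
    target_growth = 100    # 100 times the current number of entries
    parallelism = 8        # assumed 8-way parallel RIB computation
    print(target_growth / parallelism)   # 12.5 -> "a system that's 12x slower"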

>> In fact, the PC hardware doesn't actually do that. What we really see
>> is that DRAM memory speed grows at about 1.2X every two years and
>> that our growth rate is at least 1.3X every two years.

> That figure sounds fishy. As I recall, we were using a 100 MHz memory
> bus in 1998 and moving to a 133 MHz memory bus. Today we're using a
> 1300 MHz memory bus and moving to a 1600 MHz memory bus.

These buses are optimized for serial reading/writing of large cache lines. If you only need a single byte at a random address, you still pay the full latency.
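
And even taking the growth figures quoted above at face value, the gap compounds. A quick sketch, pure arithmetic on the 1.2X and 1.3X numbers and nothing else:

    # DRAM speed: ~1.2x per two years; table size: at least 1.3x per
    # two years (the figures quoted above). Their ratio is how far
    # memory falls behind the table over time.
    dram, table = 1.2, 1.3
    for years in (2, 10, 20):
        gap = (table / dram) ** (years / 2)
        print(f"after {years:2} years, the table outgrows DRAM by {gap:.2f}x")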

> AMD Opteron processors embed the memory controller in the CPU. Each
> CPU manages its own bank of memory with a dedicated memory bus. They
> share with each other via HyperTransport. If the portion of the RIB
> associated with part of the address space is intentionally placed in
> memory managed by the CPU which will calculate that portion of the
> RIB, then the computation can proceed in parallel without contention
> on the memory bus.

The problem is that all your BGP updates come in over a single TCP session, so they must be fanned out to the right CPU. It would be better if we could arrange things so that you'd have 8 sessions that each carry the right updates to the right CPU.
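
A rough sketch of that fan-out (the names and the hash-based sharding rule are mine, purely illustrative):

    # Sketch: fan incoming BGP updates out to per-CPU RIB shards.
    # One reader drains the single BGP TCP session; each worker owns
    # the slice of prefixes that hashes to it, so its RIB partition can
    # live in memory local to that CPU (the Opteron layout above).
    import zlib
    from multiprocessing import Process, Queue

    NUM_WORKERS = 8

    def shard_for(prefix: str) -> int:
        # Stable mapping from prefix to worker. A real design would
        # partition by address range to preserve locality; this just
        # hashes the prefix string.
        return zlib.crc32(prefix.encode()) % NUM_WORKERS

    def worker(wid: int, q: Queue) -> None:
        rib = {}                          # this worker's RIB partition
        while (update := q.get()) is not None:
            prefix, path = update
            rib[prefix] = path            # apply update, recompute best path...
        print(f"worker {wid}: {len(rib)} prefixes")

    if __name__ == "__main__":
        queues = [Queue() for _ in range(NUM_WORKERS)]
        procs = [Process(target=worker, args=(i, q))
                 for i, q in enumerate(queues)]
        for p in procs:
            p.start()
        for update in [("192.0.2.0/24", "AS65001"),
                       ("198.51.100.0/24", "AS65002")]:
            queues[shard_for(update[0])].put(update)
        for q in queues:
            q.put(None)                   # shut the workers down
        for p in procs:
            p.join()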

>> 100 times the entries * 100 times the churn = 10000 times the
>> processing. I'm afraid that your DRAM isn't going to keep up with
>> that in a traditional design.

> You have the combinatorics wrong. Each entry has some probability of
> churning each second. So if you have 100 entries, you're 100 times as
> likely to see a single-entry churn event. You are NOT 100 times as
> likely to see a 100-entry churn event. In fact, you're no more likely
> to see a full-table churn event than you were when there was only 1
> entry, and each such full-table churn consumes only 100 times the
> processing.

The problem is that the single "100 times" figure is ambiguous: it can mean either what I said or what you said. We know that both the number of entries and the number of updates per entry are growing, so while it's not going to be as bad as I said, it's also not going to be as good as you said.
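
To make the ambiguity concrete, a trivial sketch; the intermediate churn figure is only illustrative:

    # The bare "100 times" conflates two growth dimensions. If only the
    # entry count grows 100x and per-entry churn stays flat, load grows
    # 100x (your reading); if per-entry churn also grows 100x, it's
    # 10000x (my reading). Reality will land somewhere in between.
    entry_growth = 100
    for churn_growth in (1, 10, 100):
        print(f"entries x{entry_growth}, churn/entry x{churn_growth}: "
              f"{entry_growth * churn_growth}x processing")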

Obviously the best way to get rid of volatility in the updating is not to carry the volatile prefixes in the first place. But we already did flap damping 10 years ago; I see no reason why we can't do something similar that works a bit better.
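
For reference, the mechanism I mean, as a minimal sketch in the spirit of RFC 2439; the constants are common vendor defaults, not something this list has agreed on:

    # Minimal route flap damping sketch: each flap adds a fixed penalty,
    # the penalty decays exponentially with a half-life, and the route
    # is suppressed until the penalty decays below the reuse threshold.
    PENALTY_PER_FLAP = 1000.0
    SUPPRESS_LIMIT = 2000.0      # start suppressing above this
    REUSE_LIMIT = 750.0          # re-advertise once decayed below this
    HALF_LIFE = 900.0            # seconds

    class DampedRoute:
        def __init__(self) -> None:
            self.penalty = 0.0
            self.stamp = 0.0         # time of the last penalty update
            self.suppressed = False

        def _decay(self, now: float) -> None:
            self.penalty *= 0.5 ** ((now - self.stamp) / HALF_LIFE)
            self.stamp = now

        def flap(self, now: float) -> None:
            self._decay(now)
            self.penalty += PENALTY_PER_FLAP
            if self.penalty >= SUPPRESS_LIMIT:
                self.suppressed = True

        def usable(self, now: float) -> bool:
            self._decay(now)
            if self.suppressed and self.penalty < REUSE_LIMIT:
                self.suppressed = False
            return not self.suppressed

    r = DampedRoute()
    for t in (0, 10, 20):                    # three flaps in 20 seconds
        r.flap(t)
    print(r.usable(21))                      # False: suppressed
    print(r.usable(21 + 3 * HALF_LIFE))      # True: penalty has decayed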

Of course all of this assumes that we keep the basic BGP paradigms intact...

