[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[RRG] RAM lookup FIB & prefixes > /24
Hi Chris, in the RAM mailing list, you wrote:
Re: [RAM] Number of DFZ routers
regarding what I wrote about my proposal:
http://www.firstpr.com.au/ip/sram-ip-forwarding/
you wrote:
> keep in mind that /24 is 'globally visible' but there are
> cases inside an ISP where /32's may need to be routed in
> the same way as /16's (same speed at least).
I understand from this that a /32 is used for large volumes
of traffic. I had assumed, perhaps wrongly, that prefixes
longer than /24 in a transit router were not carrying any
"Internet payload traffic" (meaning the end-to-end, high
volume traffic of end-users) but were only so the routers
each have their own IP addresses, with mainly BGP protocol
messages flowing over those longer prefixes.
In a border router or a router within an ISP's network, I
can imagine longer prefixes than /24 being used for high
volume traffic.
Maybe operators of transit networks do the same thing, or
carry other high volumes of data, such as private network
data, in a tunnel via a single IP address or at least
addressed to a prefix longer than /24.
> So, if you mean by /24 "16 million routes in the FIB
> flat routed" then 'ok'... if you mean "flat routing stops
> at /24 anything more specific needs another stage of
> lookups" then, probably that needs to be re-thought :(
This is what I had in mind. The RAM lookup would have 14.5
million entries to handle the top 24 bits of IPv4 destination
address in the range to 223.255.255.255, and would return a
word of FEC (Forwarding Equivalent Class) data.
In my original, probably oversimplified, model I figured a
router with 14 interfaces would need 4 bits of FEC:
0 = Drop the packet.
1 = Handle with some other FIB mechanism which deals with
prefixes longer than /24.
2 = Forward to interface 0.
3 = Forward to interface 1.
...
15 = Forward to interface 13.
I now understand that routers generally require many more
bits of FEC than this, perhaps quite a few bytes, to
specify internal paths, which queue at the output interface
to use etc.
This would handle all the main Internet payload traffic in
a single RAM lookup cycle, which is the fastest, simplest
possible method, assuming you have enough high speed RAM.
The above can be done for 4 bits with a single USD$70
QDR-II SRAM, which does 200 million reads or writes a
second.
I think some recent high-end routers probably already have
enough RAM:
http://www.firstpr.com.au/ip/sram-ip-forwarding/#FIB_techniques
Assuming a /24 limit, then if the number of prefixes in
the FIB grows beyond a certain number (wild guess: 500,000)
in the FIB, the RAM lookup approach (first suggested by Nick
McKeown in 1998) would use less RAM than the Tree-Bitmap
algorithm, which for instance is used in the CRS-1. Direct
24 bit RAM lookup would be much faster than too.
TCAM can be nearly as fast as RAM lookup - at least with
a single TCAM chip - with more TCAMs, I think it gets
slower. TCAM is much more expensive, complex and
power-hungry than SRAM. TCAM updates are complex and
potentially quite slow. SRAM lookup table updates are
simple and fast, except for very short prefixes, which
require a million or more writes, but which are rarely
updated.
I wasn't suggesting that a new generation of routers only
be able to handle up to /24 in hardware, with the CPU
handling packets for longer prefixes. I intended that RAM
lookup be used to solve the FIB cost and speed problems
which would otherwise occur with TCAM or tree-based
algorithms as the number of prefixes grows past a million
and towards 2 or 5 million.
If BGP could be radically souped up, I thought many
problems could be solved by ensuring the next generation
of routers could instantly forward packets to the /24
limit, together with a long-term agreement to maintain
the /24 BGP advertisement limit - at least for a decade
or so when RAM becomes cheaper and routers can handle
/25, /26 etc. in RAM too.
The high end routers M120, MX960 and CRS-1 all use tree-
based recursive lookup algorithms in SRAM or fast DRAM,
rather than TCAM.
I guess their flexible processors should be able to
handle all packets addressed to prefixes /24 or shorter
in a single memory cycle, with an additional cycle or
two for anything to /32.
A simple approach would be to have as many 8 bit address,
256 FEC words long, tables as there were /24 prefixes
which contained at least one longer prefix. Then a "1"
result in the initial 24 bit lookup causes a second,
equally quick, lookup of the FEC in that second table.
With the 4 bit FEC model above, this requires a
non-trivial algorithm to process the 24 bits into a
memory address for that 8 bit table. If the system
stored 32 bits for the FEC data, then a single bit of
that, when set, could indicate the equivalent of a
"1", with the other 31 bits containing the starting
address of the 8 bit table.
Tree-Bitmap requires multiple cycles for a /24 anyway:
http://portal.acm.org/citation.cfm?id=997160
or for an older version:
http://www.cs.ucsd.edu/~varghese/PAPERS/willpaper.pdf
The most significant 8 bits are handled with on-chip RAM
and then three or four bits at a time are worked on, each
involving a very wide, but single address, read cycle of
external RAM.
I guess the median length prefix for Internet traffic,
based on volume, is probably /12 to /16 or so. So that
involves 2, 3 or 4 external RAM lookups anyway.
Although there are lots of /24s advertised, they only
make up ~0.23% of advertised space. I guess only a
few percent of packets are addressed to prefixes with
lengths /20 to /24.
However the router normally does its FIB functions,
I assume the router could be designed to resort to its
normal FIB process, either TCAM or tree-based recursive
lookup, for the "1" FEC value in my table above.
This would still result in faster processing than with
Tree-Bitmap alone. Unless there were ~50,000 or more
prefixes longer than /24, I guess TCAM could handle them.
- Robin
--
to unsubscribe send a message to rrg-request@psg.com with the
word 'unsubscribe' in a single line as the message text body.
archive: <http://psg.com/lists/rrg/> & ftp://psg.com/pub/lists/rrg