Re: [RRG] Opportunistic Topological Aggregation in the RIB->FIB Calculation?
- To: William Herrin <bill@herrin.us>, Iljitsch van Beijnum <iljitsch@muada.com>
- Subject: Re: [RRG] Opportunistic Topological Aggregation in the RIB->FIB Calculation?
- From: Peter Sherbin <pesherb@yahoo.com>
- Date: Wed, 23 Jul 2008 06:39:19 -0700 (PDT)
- Cc: tony.li@tony.li, Routing Research Group <rrg@psg.com>
- In-reply-to: <31D258E3-976F-444B-AA01-C61FDAEEF07D@muada.com>
- Reply-to: pesherb@yahoo.com
Of course all of this assumes that we keep the basic BGP paradigms
intact...
RFC 4271: "BGP Identifier A 4-octet unsigned integer that indicates the BGP Identifier of the sender... A given BGP speaker sets the value of its BGP Identifier to an IP address assigned to that BGP speaker..."
Is the above going to change in IPv6?
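A note on the wire format: whatever the policy answer turns out to be, in the RFC 4271 OPEN message the BGP Identifier is a fixed 4-octet field, so it cannot simply widen to hold an IPv6 address; at most the rule about where the 32-bit value comes from could change. A minimal sketch of packing the OPEN body, assuming arbitrary example values (AS 64512, hold time 90 s, identifier 192.0.2.1):

    # Hedged sketch: pack a BGP OPEN message body per RFC 4271, section 4.2.
    # The AS number, hold time and identifier below are arbitrary examples.
    import socket
    import struct

    def build_open_body(my_as, hold_time, bgp_identifier_v4, opt_params=b""):
        # The BGP Identifier is a fixed 4-octet field, so it stays 32 bits
        # regardless of whether the session runs over IPv4 or IPv6.
        ident = struct.unpack("!I", socket.inet_aton(bgp_identifier_v4))[0]
        return struct.pack(
            "!BHHIB",
            4,                # Version
            my_as,            # My Autonomous System (the 2-octet field)
            hold_time,        # Hold Time, seconds
            ident,            # BGP Identifier: 4 octets, no room to grow
            len(opt_params),  # Opt Parm Len
        ) + opt_params

    body = build_open_body(64512, 90, "192.0.2.1")
    assert len(body) == 10  # fixed-size portion of the OPEN body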
Thanks,
Peter
--- On Wed, 7/23/08, Iljitsch van Beijnum <iljitsch@muada.com> wrote:
> From: Iljitsch van Beijnum <iljitsch@muada.com>
> Subject: Re: [RRG] Opportunistic Topological Aggregation in the RIB->FIB Calculation?
> To: "William Herrin" <bill@herrin.us>
> Cc: tony.li@tony.li, "Routing Research Group" <rrg@psg.com>
> Date: Wednesday, July 23, 2008, 4:23 AM
> On 23 jul 2008, at 4:15, William Herrin wrote:
>
> > C. Even if it isn't practical to build such a tunnel, you can
> > generally pick an alternate link that offers a high probability of
> > reachability for the impacted routes. On link failure, cut over to
> > the alternate while rebuilding the RIB and then FIB.
>
> Hm, if you know your outage (= detection + convergence times) is
> short enough, say, less than 30 seconds, and certainly if less than
> 10 seconds, it doesn't really matter where the packets go; you could
> even drop them and _most_ applications will continue without too
> much trouble.
>
> If it's going to take a minute or more to restore reachability, you
> can't play fast and loose and risk loops, because applications will
> fail.
>
> > None of this is perfect, but put together it enables a system that
> > isn't on the brink of collapse at 100 times the current number of
> > entries.
>
> This is all highly optimistic. Let's assume you can get your 8 times
> parallelization, so try the current 250k table on a system that's
> 12x slower than what's on the market now (100 times the entries
> spread over 8 cores is roughly 12 times the work per core). Let's
> say a 133 MHz Pentium.
>
> >> In fact, the PC hardware doesn't actually do that. What we really
> >> see is that DRAM memory speed grows at about 1.2X every two years
> >> and that our growth rate is at least 1.3X every two years.
>
> > That figure sounds fishy. As I recall, we were using a 100 MHz
> > memory bus in 1998 and moving to a 133 MHz memory bus. Today we're
> > using a 1300 MHz memory bus and moving to a 1600 MHz memory bus.
>
> These buses are optimized for serial reading/writing of large cache
> lines. If you need a single byte, it's still slow.
>
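For a rough sense of what those two growth figures imply if they simply compound, here is a small arithmetic sketch; the 20-year horizon is just an illustrative choice, not anything claimed in the thread:

    # Sketch: compound ~1.2x DRAM-speed growth against ~1.3x table growth,
    # both per two years, and watch the per-entry headroom shrink.
    dram_growth = 1.2   # per two years
    table_growth = 1.3  # per two years

    for years in range(0, 21, 4):
        periods = years / 2
        table = table_growth ** periods
        dram = dram_growth ** periods
        print(f"after {years:2d} years: table x{table:5.2f}, DRAM x{dram:5.2f}, "
              f"per-entry headroom x{dram / table:4.2f}")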
> > AMD Opteron processors embed the memory controller in the CPU. Each
> > CPU manages its own bank of memory with a dedicated memory bus. They
> > share with each other via a "HyperTransport" link. If the portion of
> > the RIB associated with part of the address space is intentionally
> > placed in memory managed by the CPU which will calculate that portion
> > of the RIB, then the computation can proceed in parallel without
> > contention on the memory bus.
>
> The problem is that all your BGP updates come in over a single TCP
> session, so they must be fanned out to the right CPU. It would be
> better if we could make it such that you'd have 8 sessions that each
> carry the right updates to the right CPU.
>
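A minimal sketch of the partition-and-fan-out idea being discussed, with made-up names (Update, shard_for) and Python threads standing in for Opteron cores: a single dispatcher, playing the role of the one TCP session, hands each update to the worker that owns that slice of the address space, so only that worker ever touches its shard of the RIB.

    # Hedged sketch: fan out updates from one feed to per-core RIB shards,
    # keyed on the high-order bits of the prefix. In a real router each
    # shard would live in that core's locally attached memory bank.
    import queue
    import threading
    from dataclasses import dataclass

    NUM_WORKERS = 8  # the "8 times parallelization" assumed in the thread

    @dataclass
    class Update:
        prefix: int      # first 32 bits of the prefix, as an integer
        length: int
        next_hop: str    # simplified: one attribute instead of a full path

    def shard_for(update):
        # Top 3 bits of the prefix, so contiguous space maps to one worker.
        return (update.prefix >> 29) % NUM_WORKERS

    queues = [queue.Queue() for _ in range(NUM_WORKERS)]
    shards = [dict() for _ in range(NUM_WORKERS)]  # per-core slice of the RIB

    def worker(i):
        while True:
            upd = queues[i].get()
            if upd is None:
                break
            # Only worker i ever writes shards[i]: no cross-core contention.
            shards[i][(upd.prefix, upd.length)] = upd.next_hop

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(NUM_WORKERS)]
    for t in threads:
        t.start()

    # The dispatcher plays the role of the single BGP/TCP session.
    for upd in [Update(0xC0000200, 24, "peer-a"), Update(0x0A000000, 8, "peer-b")]:
        queues[shard_for(upd)].put(upd)

    for q in queues:
        q.put(None)
    for t in threads:
        t.join()

Keying on the high-order bits keeps contiguous address space on one core, which is what would let each shard sit in that core's local memory bank.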
> >> 100 times the entries * 100 times the churn = 10000 times the
> >> processing. I'm afraid that your DRAM isn't going to keep up with
> >> that in a traditional design.
>
> > You have the combinatorics wrong. Each entry has some probability of
> > churning each second. So if you have 100 entries, you're 100 times
> > as likely to see a single-entry churn event. You are NOT 100 times
> > as likely to see a 100-entry churn event. In fact, you're no more
> > likely to see a full-table churn event than you were when there was
> > only 1 entry, and each such full-table churn consumes only 100 times
> > the processing.
>
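To spell the combinatorics out with numbers: if each entry flaps independently with probability p in a given second, the expected number of single-entry updates grows linearly with the table, while the chance of everything flapping at once is p to the power of the table size. A small sketch, with p = 1e-5 chosen purely for illustration:

    # Sketch: expected churn scales linearly with table size, but the
    # probability of a full-table event is p**n, which vanishes
    # (it underflows to 0.0 for larger n, which is rather the point).
    p = 1e-5  # arbitrary per-entry, per-second flap probability

    for n in (1, 100, 10_000):
        expected_updates = n * p  # grows linearly with the table
        all_at_once = p ** n      # probability of a full-table event
        print(f"{n:6d} entries: {expected_updates:.2e} expected updates/s, "
              f"P(full-table churn) = {all_at_once:.1e}")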
> The problem is that the single "100 times" figure is rather
> meaningless, as it can mean both what I said and what you said. We
> know that both the number of entries and the number of updates per
> entry are growing, so although it's not going to be as bad as I said,
> it's also not going to be as good as you said.
>
> Obviously the best way to get rid of volatility in the updating is
> not to carry the prefixes in the first place, but we did flap damping
> 10 years ago; I see no reason why we can't do something similar that
> works a bit better.
>
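For reference, classic route flap damping in the spirit of RFC 2439 works roughly like the sketch below: each flap adds a fixed penalty, the penalty decays exponentially with a half-life, and the route is suppressed while the penalty sits above a threshold. The numeric values are the commonly quoted vendor defaults, used here only as an example and not anything proposed in this thread.

    # Sketch of route flap damping: penalty per flap, exponential decay,
    # suppress above one threshold, reuse below another.
    import math

    PENALTY_PER_FLAP = 1000
    SUPPRESS_LIMIT = 2000
    REUSE_LIMIT = 750
    HALF_LIFE_S = 15 * 60

    class DampedRoute:
        def __init__(self):
            self.penalty = 0.0
            self.last_update = 0.0
            self.suppressed = False

        def decay(self, now):
            elapsed = now - self.last_update
            self.penalty *= math.exp(-math.log(2) * elapsed / HALF_LIFE_S)
            self.last_update = now

        def flap(self, now):
            self.decay(now)
            self.penalty += PENALTY_PER_FLAP
            if self.penalty >= SUPPRESS_LIMIT:
                self.suppressed = True

        def usable(self, now):
            self.decay(now)
            if self.suppressed and self.penalty < REUSE_LIMIT:
                self.suppressed = False
            return not self.suppressed

    r = DampedRoute()
    for t in (0, 10, 20):        # three flaps in 20 seconds
        r.flap(t)
    print(r.usable(30))          # False: suppressed shortly after flapping
    print(r.usable(30 + 3600))   # True again once the penalty has decayed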
> Of course all of this assumes that we keep the basic BGP paradigms
> intact...
>
--
to unsubscribe send a message to rrg-request@psg.com with the
word 'unsubscribe' in a single line as the message text body.
archive: <http://psg.com/lists/rrg/> & ftp://psg.com/pub/lists/rrg