
Re: Shim6 failure recovery after garbage collection



Hi Igor,

On 18/04/2006, at 4:23, Igor Gashinsky wrote:

:: One of the more comprehensible objections to shim6 that was raised at
:: NANOG 35 was from large content providers who currently serve many
:: thousands of simultaneous clients through load balancers or other
:: content-aggregation devices (the kind of devices which switch
:: connections to origin servers without having to store any locally).
::
:: I don't remember the precise number of simultaneous sessions the
:: devices were intended to be capable of serving, but it was a lot.
::
:: The observation was that with the amount of (server, client) state
:: being held on those devices, adding what might be an average of (say)
:: 2x128 bits + misc overhead per session might present scaling
:: difficulties.

A single WSM-6 Foundry SI450 can handle 15M sessions in its state
machine. Assuming an overhead of, say, 320 bits per session * 15M
sessions, we come up with approx 600MB of extra RAM added to those
devices (and that's on the low side). Multiply that by the *hundreds*
of these devices a large content provider would have, and it's not a
small cost (depending on the vendor, that memory is not general-purpose
DRAM, and could be *very* expensive). And that is only the extra memory
needed to do nothing but hold the other locators.
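
As a rough sanity check of that figure (a back-of-the-envelope sketch
only; the 320 bits per session is the assumption quoted above):

    # back-of-the-envelope: extra SLB memory just to hold spare locators
    sessions = 15_000_000           # sessions the box can track
    per_session_bits = 320          # e.g. one extra locator pair + misc overhead
    extra_bytes = sessions * per_session_bits // 8
    print(extra_bytes / 1e6, "MB")  # -> 600.0 MB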


I am not very familiar with this type of device, but why do they need to hold shim6 state at all?... I mean, the shim state is required only at the end points of the communication, not in any middlebox, AFAICT...

On the web server side, it's not unheard of for a single web server to
handle 10-20k active, concurrent connections, with another 20k or so
in various *_WAIT states. Adding an extra 40 bytes of per-session
overhead per server is really not that bad (800KB of RAM per server),
although I have no idea what that overhead does to the kernel queues...


But the shim6 context is not per connection but per peer (more precisely, per ULID pair)... hence the question is how many different ULID pairs are involved in those connections. I mean, AFAIK, each client generally establishes quite a few more than one TCP connection to the web server just to download a page...
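
To illustrate the difference (a toy sketch only, not the real shim6
implementation; the names are made up): the shim layer keys its state
on the ULID pair, so all the parallel TCP connections from one client
collapse into a single context:

    # toy model: shim6 state is keyed by (local ULID, peer ULID),
    # not by the TCP 5-tuple
    contexts = {}   # (local_ulid, peer_ulid) -> shim state (e.g. spare locators)

    def open_connection(local_ulid, peer_ulid, peer_locators):
        key = (local_ulid, peer_ulid)
        if key not in contexts:   # only the first connection creates shim state
            contexts[key] = {"peer_locators": list(peer_locators)}
        return key

    # one client fetching a page over 4 parallel TCP connections
    for _ in range(4):
        open_connection("2001:db8:a::1", "2001:db8:b::2", ["2001:db8:c::2"])
    print(len(contexts))   # -> 1 context, not 4 per-connection entries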

The thing is, those numbers only take into account holding *locators*,
and when you start talking about holding onto other things (like, say,
reachability state, or performance (RTT) state) the memory utilization
starts to increase slightly more, although it's still manageable for
the servers (but it rapidly gets more and more expensive on the SLBs).
When you then start talking about also holding some sort of TE state
(because TE is a requirement), and you need to add the routing table
into the equation, *now* it gets downright nasty. 10-20MB per server of
shim6 overhead is minor, but add in 200+MB of routing state, and it's a
non-starter.

But you don't need to have the full routing table in the server...

I mean, the server needs to know its preferences about which locator pairs it prefers.

I guess you can have a very fine-grained policy, but I don't think you will need 200,000 preferences set in the server.

Now, perhaps what you need is to use the routing table information to decide which locator you prefer, is that it?

In this case, I agree that having the full BGP table may be useful to select which path to use (hence which source address to use to reach a certain destination). Moreover, it is likely that you may need multiple BGP feeds, one per available ISP (so that you can select the path through the ISP that is best).

But for this, it is possible to offload the BGP processing to a separate box, like the NAROS solution proposed a while back.
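
Roughly like this (a purely illustrative sketch; the table and function
names are invented, this is not the actual NAROS protocol): the box
holding the BGP feeds answers "which source locator should I prefer
towards this destination", and the hosts keep no routing state at all:

    # toy NAROS-style lookup: the separate box holds the BGP feeds and
    # the TE policy; the host just asks and caches the answer
    import ipaddress

    # hypothetical policy table on the separate box, built from its BGP feeds
    POLICY = {
        ipaddress.ip_network("2001:db8:1000::/36"): "2001:db8:a::1",  # prefer ISP A
        ipaddress.ip_network("2001:db8:2000::/36"): "2001:db8:b::1",  # prefer ISP B
    }

    def preferred_source(destination, default="2001:db8:a::1"):
        dst = ipaddress.ip_address(destination)
        for prefix, src in POLICY.items():
            if dst in prefix:
                return src
        return default

    print(preferred_source("2001:db8:2000::80"))   # -> 2001:db8:b::1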



Also, all of this conversation is only talking about memory overhead;
what about other overhead? Would the server have to do any sort of
failure detection, and how many cycles would that consume?

That would depend on the type of traffic...
If the traffic is bidirectional (which in the TCP case it usually is) and the timers are tuned, then there is probably no need to send keepalives, so no added traffic.
But again, it depends on the traffic pattern.
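
In other words, something along these lines (a simplified sketch of the
idea, not the actual shim6 reachability state machine, and the timeout
value is just an example):

    import time

    KEEPALIVE_TIMEOUT = 10.0   # example value; real timers are tuned/negotiated

    class ShimContextTimers:
        """Toy rule: a keepalive is only needed when we keep receiving
        from the peer but have sent nothing ourselves for a while."""
        def __init__(self):
            self.last_sent = time.monotonic()
            self.last_received = time.monotonic()

        def on_payload_sent(self):
            self.last_sent = time.monotonic()

        def on_payload_received(self):
            self.last_received = time.monotonic()

        def keepalive_needed(self):
            idle_tx = time.monotonic() - self.last_sent
            return self.last_received > self.last_sent and idle_tx > KEEPALIVE_TIMEOUT

    # with bidirectional traffic (data + ACKs) last_sent keeps getting
    # refreshed, so keepalive_needed() stays False and nothing extra is sent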

 Would the server have
to do any sort of path optimization,

Not sure what you mean by this...

 and how many cycles would that
consume? How do I get TE state to all of my 100k+ hosts?

We have already exchanged some emails about this, about DHCP and so on. But I guess the point is:
- this is not addressed today
- however, we can try to design a mechanism that addresses this need and fulfills the requirements that you have; whether this is something like a central server (or servers) that downloads the policy to the hosts, or something like what Erik's draft proposes (letting the routers select the exit path and rewrite the source addresses), is up for discussion (one possible shape is sketched below).
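
For example (just one possible shape of such a downloaded policy,
entirely made up for illustration), the central server could hand each
host something as small as:

    # hypothetical format: hosts periodically pull a small TE policy
    # table instead of holding any BGP state themselves
    import json

    policy_blob = json.dumps({
        "version": 42,
        "prefer": [
            {"dst_prefix": "2001:db8:1000::/36", "src_locator": "2001:db8:a::1", "weight": 80},
            {"dst_prefix": "2001:db8:1000::/36", "src_locator": "2001:db8:b::1", "weight": 20},
        ],
    })

    policy = json.loads(policy_blob)
    print(policy["version"], len(policy["prefer"]), "rules")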

 How many
cycles would all the hosts need to consume if one of my peers
bounces, and now, instead of 10-20 routers processing that, all 100k
of my hosts have to be updated with that information?

But would all 100k hosts be actively talking to the same peer? Is that an expected scenario? I would say that, in general, if a peer bounces, then only those hosts that have active communications with this peer will need to rehome their communications.
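
That is, the work is naturally scoped to the affected contexts (a toy
illustration with made-up addresses):

    # only hosts/contexts that actually talk to the bounced peer do any
    # work; everyone else never even notices
    contexts = {
        ("2001:db8:a::1", "2001:db8:b::2"): "established",
        ("2001:db8:a::1", "2001:db8:d::9"): "established",
    }
    bounced_peer = "2001:db8:b::2"
    affected = [k for k in contexts if k[1] == bounced_peer]
    print(len(affected), "of", len(contexts), "contexts need to rehome")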


Regards, marcelo


 etc...

-igor