
Re: [RRG] thoughts on the design space 3: caching: FIBs, local query servers



Short version:

  The nature of caches, the functionality of ITRs, the location of
  the nearest full database mapping query servers and the methods
  of pushing mapping to those query servers are all absolutely
  crucial to the performance of a map-encap system.

  These should all be regarded as important aspects of the
  "architectural" discussion, and their importance can only be
  fully understood by discussing the proposals in detail.

  The two hybrid push-pull (local query server) systems - APT and
  Ivip - have very different characteristics to (and many benefits
  over) the pure pull (global query system) proposals: LISP-ALT and
  TRRP.


Hi Jari,

In various earlier messages in this thread, you wrote:

> - caching is responsible for a number of problems people have with
>   the solutions

The problem is not caching in the ITR.  It is a caching ITR relying
on a global query system to get mapping information.  This is a
problem for LISP-ALT and TRRP.

APT and Ivip use a local query server, which has the full database
pushed to it.  This is a fast and reliable way for each caching ITR
to get its mapping information.  Initial packets would be delayed to
a very small extent - to a degree which I think is insignificant.

The idea is that the local full database query server (in APT, the
"Default Mapper") is in the same provider or end-user network as the
ITR which queries it.  There would probably be multiple such local
full database query servers.  Ivip also allows caching query servers
between the caching ITRs (ITRCs) and the full database query servers
(QSDs).

Ivip's OITRDs (Open ITRs in the DFZ, like LISP Proxy Tunnel Routers)
would have their own local query server, or would integrate the
query server and so be full database ITRs.


> - there does not seem to be a compelling reason that the
>   caching-or-not design should affect the architecture

I completely disagree.

Caching ITRs are necessary in order that ITRs can be cheap and
plentiful.  In some instances (where the host is not behind NAT), a
caching ITR function could be integrated into a sending host for
little or no cost.  This removes the need for physical ITRs to handle
the outgoing packets of that sending host.

With all ITRs being full database (LISP-NERD) there would be high
costs of storage, computation and getting the mapping data securely
and reliably to every ITR.  This would mean that ITRs would be
expensive to install and run.  This would mean that ITRs would be
less numerous, and so located further from sending hosts.  This
would also mean that each ITR would need to handle more traffic,
making the system more sensitive to peak traffic volumes than one
with a larger number of ITRs.  This is all bad news.  The answer is
to have most or all ITRs use a cache.

The hybrid push-pull schemes - APT and Ivip - can also have full
database ITRs, but most work is done by caching ITRs.

Questions of cost and complexity of ITRs, and of the difficulties
getting mapping data to them are absolutely central to the success
or failure of any map-encap proposal.  If "architecture" is so
"elevated" as to be removed from these concerns, then I think it is
far too removed from reality to be useful.


> - FIB caching is something that you could do in today's routers if
>   you wanted to

Assuming a map-encap system is required to handle tens of millions
to billions of micronets (Ivip terminology; "EID prefixes" in LISP,
APT and TRRP), then the FIB is surely going to be a caching system.

The FIB needs to be able to return one of the following for every
incoming packet, according to its destination address, using the FIB
hardware - whether that be hardwired logic and/or massively parallel
CPUs:

 1 - The packet is addressed to an ordinary BGP-managed address
     (RLOC space in LISP terminology).  So forward it
     conventionally.  This includes packets addressed to BGP-
     managed space for which there is no specific route, so the
     packet is sent to the default route.

 2 - The packet is addressed to a micronet (EID prefix) which the
     FIB currently has mapping information for.  So encapsulate
     it and tunnel it accordingly.

 3 - The packet is addressed to a section of the address space which
     is known to be covered by the map-encap scheme, but the FIB
     has no mapping information for this particular address.
     Therefore, hold the packet and query the routing processor to
     ask for the mapping information.  Later, when this arrives,
     the packet will be encapsulated and tunnelled accordingly.
     Subsequent packets matching the micronet (EID prefix) which
     was specified in the mapping reply will be handled by process
     2 above.

 4 - No match: drop the packet or handle it by some special
     mechanism.
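
As a rough illustration of the four outcomes above, here is a minimal
software sketch.  It is not how FIB hardware would do it.  The names
(CachingFib, Action, install, classify) are hypothetical, the prefixes
are documentation-only examples, and the order in which the tables are
consulted (cached micronets, then map-encap space, then BGP-managed
space) is my assumption for an ITR, not something any proposal
specifies.

```python
import ipaddress
from enum import Enum


class Action(Enum):
    FORWARD = 1      # case 1: ordinary BGP-managed (RLOC) address
    ENCAPSULATE = 2  # case 2: micronet with cached mapping
    QUERY = 3        # case 3: map-encap space, no cached mapping yet
    DROP = 4         # case 4: no match


class CachingFib:
    """Toy model of a caching FIB (hypothetical, linear search only)."""

    def __init__(self, bgp_prefixes, mapped_space, default_route=False):
        self.bgp = [ipaddress.ip_network(p) for p in bgp_prefixes]
        self.mapped = [ipaddress.ip_network(p) for p in mapped_space]
        self.cache = {}  # micronet (ip_network) -> ETR address
        self.default_route = default_route

    def install(self, micronet, etr):
        """Store a mapping reply received from the route processor."""
        self.cache[ipaddress.ip_network(micronet)] = ipaddress.ip_address(etr)

    def classify(self, dst):
        """Return (Action, ETR address or None) for a destination."""
        addr = ipaddress.ip_address(dst)
        # Case 2: longest matching cached micronet, if any.
        hits = [net for net in self.cache if addr in net]
        if hits:
            best = max(hits, key=lambda net: net.prefixlen)
            return Action.ENCAPSULATE, self.cache[best]
        # Case 3: covered by the map-encap scheme, but nothing cached.
        if any(addr in net for net in self.mapped):
            return Action.QUERY, None
        # Case 1: BGP-managed space, including the default route.
        if self.default_route or any(addr in net for net in self.bgp):
            return Action.FORWARD, None
        # Case 4: no match at all.
        return Action.DROP, None
```

A packet to an uncached micronet is classified as QUERY.  Once
install() has stored the mapping reply, later packets to the same
micronet are classified as ENCAPSULATE, which is the transition from
process 3 to process 2 described above.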


Overall, for every ITR, there will be an arrangement like this:

                   query
   Caching FIB  --------> Route Processor
               <--------
                response

The questions are whether the Route Processor has the full database
or not, and how it gets mapping information - full push for a full
database or by asking for it from a query server if it is a caching
ITR.  With LISP-NERD, every ITR has the full database, but I believe
LISP-NERD can't scale well and would result in too few, overly
expensive, ITRs.

In APT, the Default Mappers are routers with the full database, so
their Route Processors answer the FIB's queries immediately.

Ivip has provision for full database ITRs too, but the more I think
about it, the more likely it seems that these will really be caching
ITRs (ITRCs) directly connected, such as by an Ethernet patchcord,
to a co-located full database query server (QSD).  This way,
multiple ITRCs could share a single QSD.  Also, if this QSD failed,
the ITRCs could use another QSD which is more distant, but still
within the same network.
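
The arrangement in the paragraph above, with ITRCs sharing a nearby
QSD and falling back to a more distant one, can be sketched as
follows.  This is only an illustration: the class and method names
are hypothetical, the "query" is a plain method call rather than any
real protocol, and I assume micronets do not overlap so that a linear
scan finds the single matching entry.

```python
import ipaddress


class QueryServerDown(Exception):
    """Raised when a query server cannot be reached."""


class Qsd:
    """Toy full database query server (hypothetical interface)."""

    def __init__(self, name, mappings, up=True):
        self.name = name
        self.up = up
        self.queries = 0
        self.mappings = {
            ipaddress.ip_network(m): etr for m, etr in mappings.items()
        }

    def query(self, dst):
        """Return (micronet, ETR) covering dst, or None if unmapped."""
        if not self.up:
            raise QueryServerDown(self.name)
        self.queries += 1
        addr = ipaddress.ip_address(dst)
        for micronet, etr in self.mappings.items():
            if addr in micronet:
                return micronet, etr
        return None


class Itrc:
    """Toy caching ITR which tries its QSDs in preference order."""

    def __init__(self, qsds):
        self.qsds = list(qsds)  # nearest (co-located) first
        self.cache = {}         # micronet -> ETR address

    def lookup(self, dst):
        addr = ipaddress.ip_address(dst)
        for micronet, etr in self.cache.items():
            if addr in micronet:
                return etr      # cache hit, no query needed
        for qsd in self.qsds:
            try:
                reply = qsd.query(dst)
            except QueryServerDown:
                continue        # try the next, more distant, QSD
            if reply is None:
                return None     # address is not in a micronet
            micronet, etr = reply
            self.cache[micronet] = etr
            return etr
        raise QueryServerDown("no QSD reachable")
```

With a failed co-located QSD listed first and a working distant QSD
second, the first lookup is answered by the distant QSD and later
lookups for the same micronet are answered from the ITRC's own cache.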

You alluded to such an arrangement in your initial message in this
thread:

  http://psg.com/lists/rrg/2008/msg01944.html

   > Second, even if you believe that we DO need a cache, there's
   > really no reason why the cache design has to be cast into the
   > rest of the architecture. Even if your forwarding ASIC can't
   > employ enough memory for all the entries, I have a hard time
   > convincing myself that we can't have a general purpose computer
   > sitting next to it that holds the full table. I'm writing this
   > on a laptop that has a 160G disk. That would be enough for
   > several mapping tables containing EVERY IPV4 ADDRESS.

An Ivip IPv4 mapping entry contains 12 bytes:

   4 bytes Micronet starting address
   4 bytes (could be less in many cases) Micronet length
   4 bytes ETR address

so your hard drive could, in principle, hold mapping for 13 billion
micronets.  (IPv6 mapping is much more verbose: 16 + ~16 + 16 = 48
bytes.  Restricting IPv6 mapping granularity to /64, which I favour,
reduces this to 8 + ~8 + 16 = 32.)
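
The IPv4 arithmetic can be checked with a short sketch.  The field
order and sizes follow the list above, but the network byte order,
the treatment of the length as an unsigned 32-bit integer, the helper
names and the reading of "160G" as 160 x 10^9 bytes are my
assumptions for illustration, not a defined wire format.

```python
import socket
import struct

# 4-byte start address, 4-byte length, 4-byte ETR address.
# "!" selects network byte order with no padding (an assumption).
ENTRY = struct.Struct("!4sI4s")


def pack_entry(start, length, etr):
    """Pack one IPv4 mapping entry into 12 bytes."""
    return ENTRY.pack(socket.inet_aton(start), length, socket.inet_aton(etr))


def unpack_entry(blob):
    """Reverse of pack_entry: returns (start, length, etr)."""
    start, length, etr = ENTRY.unpack(blob)
    return socket.inet_ntoa(start), length, socket.inet_ntoa(etr)


DISK_BYTES = 160 * 10**9            # assuming "160G" means 160e9 bytes
capacity = DISK_BYTES // ENTRY.size  # whole entries that fit on the disk

print(ENTRY.size)   # 12
print(capacity)     # 13333333333
```

This ignores any filesystem or indexing overhead, so it is an upper
bound "in principle" only.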

   > My take is that if you cannot build an ITR/ETR that holds the
   > entire mapping table in memory, maybe it would be better to
   > attach it to a general purpose computer that can (slowly)
   > handle the cache misses and hand the packets back to the
   > router.

Yes.  If we place the query server not in the same rack, but
elsewhere in the same provider or end-user network, and give each ITR
access to several of them for redundancy and load sharing, then this
is the core principle behind APT's and Ivip's "hybrid push-pull -
local query server" architecture.


Most ITRs in APT and Ivip are caching ITRs.

Only with LISP-ALT and TRRP are the following true:

  All ITRs are caching ITRs.

  All ITRs depend on a global query server network for their
  mapping information.

This is where the problems with caching appear - long delays and
unreliable communications for gaining the mapping information they
need to forward packets which are addressed to EID prefixes and for
which they currently have no cached mapping information.


> Second, I'm not necessarily worried about the DNS lookup + SYN
> case, and I don't believe the impact on that is very significant.
> As was stated earlier, DNS already introduces some delay.

The "initial packet delay" problem is serious, I am sure.

ALT typically involves very long paths, both for mapping resolution
and for carrying these initial packets towards the ETR while the
mapping information is on its way.  (The best approach is to send
the packet over the ALT network as the mapping request.)

See "ALT's strong aggregation often leads to *very* long paths" at:

  http://www.firstpr.com.au/ip/ivip/lisp-links/

TRRP has similar problems, although in RRG discussions with Bill
Herrin, it emerged that the globally distributed DNS-like query
server network could be collapsed to a series of more local query
server sites.

  http://psg.com/lists/rrg/2008/msg00488.html

It looks costly and messy to me, and involves pushing mapping to all
those sites.  The result somewhat resembles APT or Ivip, and the
TRRP query server system could in principle be changed progressively
from the global (slow, unreliable) system to the collapsed, more
local, system - without changing ITRs.


> I'm not necessarily worried about the DNS lookup + SYN case, and I
> don't believe the impact on that is very significant.

The significantly delayed "initial packet" problem can result in the
sending host concluding that the packet is lost, so it will generate
new packets, perhaps to other micronets (EID prefixes) for which
there is also no mapping in the ITR.

The "initial packet" delay problem can slow, stall or derail
exchanges including:

  One or more stages of recursive DNS lookups - many or most DNS
  servers may be in map-encap managed address space, since this is
  where we want the great majority of end-user networks.

     Firstly the outgoing DNS request.

     Secondly, the DNS server sending the response to the querying
     host, if it is on map-encap managed space.

  The establishment of a TCP session.

     Firstly the SYN packet going to the remote host.

     Secondly, the SYN + ACK from that host to the original sending
     host.

These all add up, and I think the problem is likely to be serious,
especially if ALT has these long paths, often crossing the globe:

  ALT's strong aggregation often leads to *very* long paths
  http://psg.com/lists/rrg/2008/msg00229.html

  http://www.antd.nist.gov/~ksriram/strong_aggregation.png

  http://www.firstpr.com.au/ip/ivip/lisp-links/#long_paths
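
To show how the stages listed above add up, here is a deliberately
crude sketch.  It charges one mapping lookup delay to each stage whose
destination micronet is not yet cached at the relevant ITR.  The
function name and the stage labels are hypothetical, and the delay
figures in the usage lines are placeholders chosen only to show the
arithmetic, not measurements of any system.  It does not model host
retransmission timers, which could make matters worse.

```python
# The four stages named in the text above; recursive DNS could add more.
STAGES = ("DNS request", "DNS response", "TCP SYN", "TCP SYN+ACK")


def added_delay_ms(lookup_ms, stages=STAGES, cached=()):
    """Total extra delay: one lookup per stage with no cached mapping."""
    return sum(lookup_ms for stage in stages if stage not in cached)


# Placeholder figures only: a slow global lookup versus a fast local one.
print(added_delay_ms(300))  # 1200
print(added_delay_ms(5))    # 20
```

The point is only that the penalty scales with the per-lookup delay
multiplied by the number of uncached stages, so shortening the lookup
path helps at every stage at once.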

Even if the delay problem were of infrequent and/or marginal impact
to end-users, there would still be a marketing problem for any
company which used the new form of address space.  It could be
portrayed as second-rate, and so most companies would want to avoid it:

  http://psg.com/lists/rrg/2008/msg00272.html

We need end-user networks of all sizes to *want* to adopt the new
system, because only then can we get the great majority of end-user
networks using scalable addressing and routing.


Dino wrote:

>> Well, one thing this list keeps doing is trying to draw the line
>> where architecture should stop leaving it for some other process,
>> say engineering, to make the really hard decisions. If the
>> architecture is too high-level and doesn't "draw a line in the
>> sand", then the practical architecture will happen after this
>> step in the process.
>>
>> So you can't punt very many things or you become irrelevant.

I agree completely.

I have written several times about my concern that the RRG's
avoidance of detail will make it impossible for it to discuss this
field properly and to produce a really substantial final
recommendation in March 2009:

  RRG scope avoids practical concerns
  http://psg.com/lists/rrg/2008/msg01863.html  (2008-07-20)

Discussion of these concerns was immediately ruled out of scope:

  http://psg.com/lists/rrg/2008/msg01864.html

   > How about we stop discussing what we're allowed to discuss?
   >
   > Tony

However, since you and Dino are discussing the level of detail in
RRG discussions:

> This is true, and I'm very eager to make the decisions about
> details that matter.

I am discussing it again.

I am really perplexed that you can make broad, sweeping statements
such as:

> - there does not seem to be a compelling reason that the
>   caching-or-not design should affect the architecture

over a year after APT and Ivip have been described in sufficient
detail for most people to recognise the importance of caching and
the close proximity of query servers to provide mapping information
quickly and reliably.

Likewise:

> For instance, encapsulation, how the mapping function works, etc.
> However, based on what I said in my initial mail, I'm not
> sure caching is one of those details.

Questions of where the caches are - and how far the mapping requests
and responses travel - are absolutely crucial to the performance of
the entire system.

Over a year after APT and Ivip were described in detail in Internet
Drafts, we really shouldn't be stuck in discussions which seem to
ignore these proposals and the benefits of local query servers.

Likewise, it isn't realistic to ignore APT, Ivip or TRRP by
discussing map-encap primarily in terms of LISP.


 - Robin

