Re: [RRG] cache issues in LISP and CONS



The issue regarding packet drops with LISP and LISP-CONS has been
brought up a few times on the list.

I know, everyone tends to focus on corner cases. But please remember that this happens only for the first source in the source-site sending to the first destination in the destination-site.

We have lived with this problem with ARP and ND for years. I know it's a bit different and localized to a LAN, but server-based switch networks are really large, with tens of thousands of end-hosts attached, so this problem occurs more often than people would think. We have lived with it because, IMO, it isn't much of an issue.

If we become too pedantic about solving this, it could lead to a very complicated design which will prohibit deployment as well.

So we have to carefully choose our poison.

Basically, packets are dropped for every ITR cache miss.  Since CONS
mapping requests may take a long time to be satisfied, this may result
in unacceptable service.

If people think this is a show-stopper, we could recommend that implementations queue a small number of packets. But that causes unnecessary resource utilization in the implementation. Most (if not all) IPv4/ARP implementations drop packets, but the IPv6-ND test suites required queuing exactly one packet (and different Tahi tests expected either the first one or the last one to be queued). So how does queuing exactly one packet actually solve the problem?

Therefore, we have to watch what we ask for from a design.
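
To make the queuing trade-off above concrete, here is a minimal sketch of an ITR map-cache that holds a bounded number of packets per destination while a mapping request is outstanding. The class name MapCache, its methods, and the exact-match lookup are all hypothetical simplifications for illustration; this is not taken from any LISP implementation or from the drafts.

```python
class MapCache:
    """Toy ITR map-cache with a small per-destination hold queue (illustrative only)."""

    def __init__(self, queue_limit=1, keep="first"):
        self.mappings = {}              # destination EID -> RLOC (exact match for simplicity)
        self.pending = {}               # destination EID -> packets held during the cache miss
        self.queue_limit = queue_limit  # 0 means "always drop on a miss"
        self.keep = keep                # "first": keep oldest packets, "last": keep newest
        self.dropped = 0

    def send(self, dest_eid, packet):
        """Return 'encapsulated', 'queued', or 'dropped' for this packet."""
        if dest_eid in self.mappings:
            return "encapsulated"
        queue = self.pending.setdefault(dest_eid, [])
        if len(queue) < self.queue_limit:
            queue.append(packet)
            return "queued"
        if self.keep == "last" and queue:
            queue.pop(0)                # evict the oldest held packet to make room
            queue.append(packet)
            self.dropped += 1
            return "queued"
        self.dropped += 1
        return "dropped"

    def map_reply(self, dest_eid, rloc):
        """Install the mapping and return the held packets as (rloc, packet) pairs."""
        self.mappings[dest_eid] = rloc
        return [(rloc, p) for p in self.pending.pop(dest_eid, [])]


def simulate(keep, packets, queue_limit=1):
    """Send all packets before the mapping arrives, then flush on the reply."""
    cache = MapCache(queue_limit=queue_limit, keep=keep)
    for p in packets:
        cache.send("10.0.0.1", p)
    delivered = cache.map_reply("10.0.0.1", "192.0.2.1")
    return delivered, cache.dropped
```

If three packets arrive before the mapping does, a one-packet queue delivers exactly one of them and loses the other two under either policy; only which packet survives changes. A larger queue_limit saves more packets at the cost of more buffer memory per unresolved destination.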

Suggestions have been made to route packets on the old topology in the
event of ITR cache misses. However, this leads to a major incremental
deployment issue -- since LISP adopters will still need to maintain
their routes in the old topology, there would be no reduction in the
size of the global routing table.

The routes in the old topology are aggregatable RLOCs that map to topology. The routes (EID-prefixes) used in the new topology are highly aggregatable because they are based on allocation hierarchy. So the PI prefixes that were in the old topology can become a smaller set of routes in the new topology.
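
As a rough illustration of why prefixes assigned along an allocation hierarchy aggregate well, the sketch below uses Python's standard ipaddress module to collapse two sets of prefixes. The prefixes themselves are made-up examples, not real EID or PI allocations, and the helper name aggregate is hypothetical.

```python
import ipaddress


def aggregate(prefixes):
    """Collapse a list of prefix strings into the smallest covering set of routes."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(nets)]


# Four sites numbered out of one contiguous allocation (hypothetical values).
hierarchical = ["10.1.0.0/24", "10.1.1.0/24", "10.1.2.0/24", "10.1.3.0/24"]

# Four sites holding unrelated PI-style prefixes (hypothetical values).
scattered = ["10.1.0.0/24", "10.7.5.0/24", "172.16.9.0/24", "192.168.40.0/24"]

print(aggregate(hierarchical))  # one covering route
print(aggregate(scattered))     # no aggregation is possible
```

The hierarchical set collapses to a single /22, while the scattered set still needs four separate routes.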

We are very close to testing this alternate topology idea, because it requires essentially no new code development to make it happen.

I have not seen any other suggestions on how to handle this issue.
Could this be a fundamental problem with the design, or are there other
solutions?

The problem is not in the design. It is just a hard problem to solve.

Dino

--
to unsubscribe send a message to rrg-request@psg.com with the
word 'unsubscribe' in a single line as the message text body.
archive: <http://psg.com/lists/rrg/> & ftp://psg.com/pub/lists/rrg