RE: [RRG] cache issues in LISP and CONS - it's bad . . .

To: 'Robin Whittle' <rw@firstpr.com.au>, 'Routing Research Group list' <rrg@psg.com>
Subject: RE: [RRG] cache issues in LISP and CONS - it's bad . . .
From: Sheng Jiang <shengjiang@huawei.com>
Date: Fri, 19 Oct 2007 09:30:48 +0800
Cc: 'Noel Chiappa' <jnc@mercury.lcs.mit.edu>, 'Dan Jen' <jenster@CS.UCLA.EDU>
In-reply-to: <4716F829.3050208@firstpr.com.au>
Well, there may be another branch that seems to have been forgotten by all of
us: the future Internet architecture, of course, needs a NEW mapping system.
An efficient, robust mapping system can reduce the cache issues a lot, perhaps
to an acceptable level.

The DNS-style mapping system is more than 20 years old and was mainly designed
for the fixed network. It cannot meet the new requirements. If the future
Internet architecture were built on such outdated technology, a dead end
could be expected.

A new mapping system should at least support: fast updating, synchronization
if distributed, real-time handover, choosing between multiple items, etc.
Furthermore, it must have a business model that supports it. One more
consideration is that it might support different ID/locator separation
proposals if we design it carefully.

Best regards,

Dr. Sheng JIANG

IP Research Department, Networking Research Department, Network Product
Line, Huawei Technologies Co. Ltd.
*-----Original Message-----
*From: owner-rrg@psg.com [mailto:owner-rrg@psg.com] On Behalf Of Robin Whittle
*Sent: Thursday, October 18, 2007 2:08 PM
*To: Routing Research Group list
*Cc: Noel Chiappa; Dan Jen
*Subject: Re: [RRG] cache issues in LISP and CONS - it's bad . . .
*
*Hi Dan and Noel,
*
*Dan wrote:
*
*> The issue regarding packet drops with LISP and LISP-CONS has been
*> brought up a few times on the list.
*>
*> Basically, packets are dropped for every ITR cache miss.  Since
*> CONS mapping requests may take a long time to be satisfied, this
*> may result in unacceptable service.
*
*LISP-CONS, LISP-NERD and TRRP use ITRs which cache mapping data and
*rely on a query system which spans the Internet to get fresh mapping
*data.
*
*I think that dropping packets - or delaying them while queries and
*responses cross the planet - is not acceptable for the future
*architecture of the Internet.
*
*Each such exchange typically involves multiple packets being sent
*and received, with delays and potential losses.  Each mapping
*request could take a few seconds to resolve under difficult
*circumstances, and half to one second under common circumstances.
*Folks in the US and Europe need to remember that other folks have longer
*packet paths - such as 350ms round trip delays from Melbourne
*Australia to the Netherlands under close to ideal circumstances.
*
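As a rough illustration of the delay figures above, here is a minimal sketch that treats a mapping lookup as a number of sequential query/response round trips. The 350 ms round-trip time is the Melbourne to Netherlands figure from the message; the round-trip counts are assumptions chosen only to show the scale, and `lookup_delay_ms` is a hypothetical helper name.

```python
# Rough model: a mapping lookup costs some number of sequential
# query/response round trips. The round-trip counts are assumptions.
RTT_MS = 350  # Melbourne <-> Netherlands under close to ideal conditions


def lookup_delay_ms(round_trips: int, rtt_ms: int = RTT_MS) -> int:
    """Delay in milliseconds for sequential query/response exchanges."""
    return round_trips * rtt_ms


for n in (1, 2, 3):
    print(f"{n} round trip(s): {lookup_delay_ms(n)} ms")
```

Under this model, two sequential round trips on such a path already cost 700 ms, which falls in the "half to one second" range quoted above, before any packet loss or retry is counted.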
*DNS-style mapping lookups (LISP-NERD or TRRP) will often involve
*multiple queries to find the right nameserver.  This is especially
*so for IPv6, with the likelihood of having to row through 48 bits or
*more, 4 bits at a time with TRRP or I think LISP-NERD.  There's no
*way an ITR could cache all those intermediate nameservers, even for
*IPv4.
*
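The "48 bits, 4 bits at a time" arithmetic can be sketched as follows. This assumes one delegation level per hexadecimal nibble, in the manner of the ip6.arpa reverse-DNS tree; the actual label scheme used by TRRP is not given in this message, and `nibble_labels` is a hypothetical helper.

```python
import ipaddress
from typing import List


def nibble_labels(prefix: str) -> List[str]:
    """Return the hex nibbles covering the prefix bits, most significant first."""
    net = ipaddress.ip_network(prefix)
    if net.prefixlen % 4:
        raise ValueError("prefix length must be a multiple of 4 bits")
    total_nibbles = net.max_prefixlen // 4
    # Zero-padded hex form of the whole address (32 digits for IPv6, 8 for IPv4).
    hex_digits = f"{int(net.network_address):0{total_nibbles}x}"
    return list(hex_digits[: net.prefixlen // 4])


labels = nibble_labels("2001:db8:1234::/48")
print(len(labels), labels)
```

A /48 yields 12 nibbles, so in the worst case an ITR with nothing cached would have up to 12 delegation levels to walk, each potentially a separate query.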
*An ITR might be required to drop a packet and query the global
*system in these circumstances:
*
*  1 - A uses DNS to find IP address of B, and B's domain name
*      requires the query go to a nameserver which uses a
*      LISP-etc. mapped address.
*
*  2 - A is on a mapped address too, so for each request to a
*      nameserver in the above query, the nameserver's ITR needs
*      to do a LISP/TRRP CAR-CDR/DNS-like lookup to find the ETR
*      by which A can be reached.
*
*  3 - A's ITR needs to do another mapping lookup before A can
*      send a packet to B.
*
*  4 - B's response requires another lookup, since A is on a
*      mapped address too.
*
*Points 1 and 2 can occur in pairs for as many depths of nameserver
*recursion as are required to get an answer.  For instance five or so
*times to find the IP address of "aa.bb.cc.ee.ff.gg".
*
*At every such query, the LISP or TRRP ITR drops the packet and the
*higher layers have to time out and retry.  Here is an example of
*establishing a TCP link to a host which is identified by a four
*level domain www.xxx.com.au.
*
*The client host A is on a LISP-mapped address, as are the
*nameservers for .com.au and .xxx.com.au.
*
*We assume the client host A has cached the address of the .au
*nameserver, (this would not be the case for the less used TLDs) but
*not of the nameserver for .com.au.  Likewise, we assume A's LISP-ITR
*initially has no mapping for the address of the .com.au nameserver.
*
*This is perhaps unrealistic, since we wouldn't expect the .com.au
*nameservers to be on LISP-mapped addresses.  However the following
*example is valid for a webserver of a department of a company or
*university, such as:
*
*   www.astro.someuni.ch
*   www.sales.wizmo.es
*
*where these organisations use mapped addresses for their networks.
*This is especially the case in the future when LISP/Ivip-etc.-mapped
*address space is likely to be very widely used.
*
*For instance maybe someuni.ch runs one of its nameservers on its own
*LISP-etc.-mapped network and a second one on another university's
*LISP-etc.-mapped network - but the astronomy department has a
*separate campus with one of its nameservers on another portion of
*LISP-etc.-mapped address space etc.
*
*
* A  -X-> ns1.com.au  LISP ITR drops the 1st packet.
*
*                     A times out.  How long does the first time-out
*                     take?
*
* A  ---> ns1.com.au  A sends a 2nd packet, which the ITR now has
*                     mapping for (ideally - but maybe the ITR is
*                     still awaiting a response from the global
*                     CAR-CDR or DNS-like query system) and so
*                     tunnels to the ETR which serves ns1.com.au.
*
* A <-X-  ns1.com.au  The LISP ITR near ns1.com.au drops the packet
*                     - it has no mapping for A's address.
*
*                     A times out again.  How long does the second
*                     time-out take?
*
* A  ---> ns1.com.au  A sends a third packet.
*
*                     Or has A given up on ns1 and tries to send
*                     a query to ns2.com.au?  This is on a totally
*                     different network, and so we REPEAT (not
*                     shown here) all the above steps.
*
*                     Now (ideally) the LISP ITR near A has mapping
*                     information for nsx.com.au - so finally, A
*                     gets its query to nsx.com.au.
*
* A <---  nsx.com.au  Ideally, the nameserver's ITR has mapping for
*                     A by now, so the packet is tunneled to A's
*                     ETR and reaches A.
*
*          Repeat the same stuff (as detailed below) so that A can
*          find out the address of the nameserver for xxx.com.au.
*
* A  -X-> ns1.xxx.com.au  A's 1st packet is dropped by the LISP ITR.
*
*                         A times out.
*
* A  ---> ns1.xxx.com.au  A sends a 2nd packet.
*
* A <-X-  ns1.xxx.com.au  LISP ITR near ns1.xxx.com.au drops packet.
*
*                         A times out again.
*
* A  ---> ns1.xxx.com.au  A sends a third packet.
*
* A <---  nsx.xxx.com.au  A receives the IP address of
*                         www.xxx.com.au.
*
*          Now a similar pattern of dropped packets and time-outs
*          so that A establishes a TCP session with the web server.
*
*          The web server isn't necessarily on the same LISP-mapped
*          address space as the nameservers. It could be at a hosting
*          company, which uses LISP-mapped space so it can move its
*          upstream connections to other ISPs without all its
*          customers having to change their DNS entries for their
*          web servers.
*
* A  -X-> www.xxx.com.au  A's 1st packet dropped by LISP ITR.
*
*                         A times out.
*
* A  ---> www.xxx.com.au  A sends a 2nd packet.  By now, ideally
*                         A's ITR has the mapping data.
*
* A <-X-  www.xxx.com.au  LISP ITR near www.xxx.com.au drops
*                         packet.
*
*                         A times out again.  www.xxx.com.au has
*                         a half-open TCP session dangling . . .
*
* A  ---> www.xxx.com.au  A sends a third packet.  The webserver
*                         half-opens a second TCP session.
*
* A <---  www.xxx.com.au  A gets its TCP acknowledgement and
*                         sends its response . . .
*
* A  ---> www.xxx.com.au  . . . which (ideally) will be tunneled
*                         immediately.
*
*This is arguably worse than some common circumstances, but it is
*easy to think of more difficult cases, with a greater recursion of
*nameservers.  Any packets dropped en-route make things still worse.
*
*This is:
*
*  16 packets sent (not counting flurry of ITR query traffic)
*
*   6 dropped packets
*
*   2 1st time-outs by A's name lookup code.
*
*   2 2nd time-outs by A's name lookup code.
*
*   1 1st time-out by A's TCP session establishment code.
*
*   1 2nd time-out by A's TCP session establishment code.
*
*
*With eFIT-APT or Ivip, there are no dropped packets and we have:
*
* A  ---> ns1.com.au
* A <---  nsx.com.au
*
* A  ---> ns1.xxx.com.au
* A <---  nsx.xxx.com.au
*
* A  ---> www.xxx.com.au
* A <---  www.xxx.com.au
* A  ---> www.xxx.com.au
*
*   7 packets sent
*
*   0 dropped packets
*
*   0 time-outs
*
*In this example, LISP or TRRP typically makes the user and higher
*level protocols wait for 6 time-out periods.  I think this is really
*unacceptable.
*
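The two tallies above can be reproduced with a small simulation. This is only a sketch under simplifying assumptions: every ITR starts with an empty cache, a dropped packet triggers a mapping lookup that completes before the retry, each server sits behind its own ITR, and A never switches to a second nameserver. The names `Tally`, `send`, `request_reply` and `scenario` are illustrative and do not come from any of the proposals.

```python
from dataclasses import dataclass


@dataclass
class Tally:
    sent: int = 0
    dropped: int = 0
    timeouts: int = 0


def send(cache: set, dest: str, tally: Tally, drop_on_miss: bool) -> bool:
    """Send one packet through an ITR; return True if it is tunnelled."""
    tally.sent += 1
    if dest in cache or not drop_on_miss:
        # Cache hit, or the packet is passed on to a full-database ITR.
        return True
    cache.add(dest)  # assumption: the mapping arrives before the retry
    tally.dropped += 1
    return False


def request_reply(a_cache: set, srv_cache: set, server: str,
                  tally: Tally, drop_on_miss: bool) -> None:
    """A retries until one request and its reply both get through."""
    while True:
        if (send(a_cache, server, tally, drop_on_miss)
                and send(srv_cache, "A", tally, drop_on_miss)):
            return
        tally.timeouts += 1  # A's resolver or TCP code times out and retries


def scenario(drop_on_miss: bool) -> Tally:
    tally = Tally()
    a_cache = set()  # A's ITR starts with no mappings
    for server in ("ns.com.au", "ns.xxx.com.au", "www.xxx.com.au"):
        # Each server is behind a different ITR with an empty cache.
        request_reply(a_cache, set(), server, tally, drop_on_miss)
    # Final packet from A to the web server after the TCP acknowledgement.
    send(a_cache, "www.xxx.com.au", tally, drop_on_miss)
    return tally


print(scenario(drop_on_miss=True))   # caching ITRs that drop on a miss
print(scenario(drop_on_miss=False))  # misses handed to a full-database ITR
```

With drop-on-miss the model gives 16 packets sent, 6 dropped and 6 time-outs; without it, 7 packets sent and none dropped, matching the counts in the message.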
*The delay would be less with only one level of DNS recursion, and if
*only one end of each exchange were using LISP-mapped address space.
*
*If something like LISP or TRRP was introduced, I think people would
*soon find that the address space it handles sucks - due to the
*common experience of excessive delays in establishing sessions of
*any kind.  It would be very much harder to convince people to adopt
*this space if it means a permanent degradation of their own
*experience and of every person who tries to communicate with them.
*
*
*> Suggestions have been made to route packets on the old topology in
*> the event of ITR cache misses. However, this leads to a major
*> incremental deployment issue -- since LISP adopters will still
*> need to maintain their routes in the old topology, there would be
*> no reduction in the size of the global routing table.
*
*I agree there would be no absolute reduction, but there could and
*should still be fewer BGP prefixes than if LISP etc. was not
*introduced.  I discuss this below in my response to Noel.
*
*> I have not seen any other suggestions on how to handle this issue.
*> Could this be a fundamental problem with the design, or are there
*> other solutions?
*
*It is a fundamental problem with having caching ITRs which can't let
*a packet go to another ITR which has the full database, and where
*the caching ITRs rely on a global-sized - and therefore slow and
*unreliable - query system.
*
*Eliot Lear proposed (RRG messages leading to 264 & 288) that
*LISP-NERD could have one or more ITRs well outside the sending
*host's ISP's network to catch those packets which are not caught by
*an ITR in that network.  I think this amounts to the same thing as
*Ivip's "anycast ITRs in the core".  I think this was meant as a
*means of making LISP-NERD incrementally deployable.  Perhaps, if the
*LISP-NERD ITRs were modified to pass the packets they couldn't
*tunnel, this would mean that the packets would be tunnelled
*immediately, rather than dropped.
*
*In that case, LISP-NERD would resemble eFIT-APT and Ivip in having
*some caching ITRs which could let packets go through to a full
*database ITR (Default Mapper for eFIT-APT, or ITRD for Ivip) when
*the caching ITR didn't have the mapping information.  Perhaps with
*Eliot's suggestion, those "full database" ITRs are only
*"full-database" for one BGP-prefix's range of destination addresses,
*and perhaps there is only one such ITR for each such prefix, rather
*than necessarily multiple of them using anycast.
*
*As it stands, here is the situation as I understand it.  (Please
*check later in this thread for corrections):
*
*                    LISP-CONS  LISP-NERD  eFIT-APT Ivip    TRRP
*
*Full DB ITRs?       No         No         Default  ITRD    No
*                                          Mappers
*
*Caching ITRs?       All        All        ITR      ITRC    All
*                                                   ITFH
*
*Local query         No         No         Default  QSD     No
*servers for                               Mappers  QSC
*caching ITRs?
*
*Caching ITR must    Yes        Yes        No       No      Yes
*drop packets it
*has no mapping for?
*
*Distribution of     Pull       Pull       Push     Push    Pull
*database to         CAR-CDR    DNS-like   slow     fast    DNS-like
*networks with       global     global     via      Repli-  global
*ITRs etc.?          network    network    BGP      cator   network
*                                                   system
*
*Incremental                    RRG                 Yes
*deployment via                 messages
*"anycast ITRs in               264 & 288
*the core"?
*
*
*TRRP also has a method by which some ITRs (depending on how directly
*they queried the authoritative DNS-like mapping information servers)
*get push notification of changed mapping from the authoritative
*mapping servers.  However I can't see how this would scale well.
*
*eFIT-APT and Ivip are the only two proposals in which all packets
*are tunneled without delay.
*
*
*Noel Chiappa wrote, in part:
*
*DJ> Suggestions have been made to route packets on the old topology in
*DJ> the event of ITR cache misses. However, this leads to a major
*DJ> incremental deployment issue -- since LISP adopters will still need
*DJ> to maintain their routes in the old topology, there would be no
*DJ> reduction in the size of the global routing table.
*>
*> Well, it's not just an incremental issue; either i) we always maintain
*> the backup routing (in which case we don't get the size reductions, as
*> you point out),
*
*I disagree with this prevalent notion that maintaining a prefix in
*BGP while that prefix is handled by an ITR-ETR mapping scheme will
*not result in reductions in the size of the global BGP routing table.
*
*This would only be true if every such prefix was used to serve a
*single end-user network.  The major benefit of all these ITR-ETR
*schemes is that they can slice and dice the address space much finer
*than BGP, which is limited (for all practical purposes in the
*foreseeable future) to 256 IP address chunks in IPv4.
*
*If end-user networks all needed more than 128 IP addresses, then it
*might be true that the ITR-ETR mapping systems can't help use IPv4
*address space more efficiently.
*
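To make the "slice and dice" argument concrete, here is a minimal sketch using Python's standard `ipaddress` module. The /24 is a documentation prefix and the 16-address size per end-user network is purely an assumption for illustration; `slice_prefix` is a hypothetical helper.

```python
import ipaddress


def slice_prefix(bgp_prefix: str, new_prefixlen: int):
    """Split one BGP-advertised prefix into smaller mapped ranges."""
    net = ipaddress.ip_network(bgp_prefix)
    return list(net.subnets(new_prefix=new_prefixlen))


# One /24 (256 addresses) stays in BGP; the mapping system divides it
# into /28 ranges of 16 addresses, one per end-user network (assumed size).
ranges = slice_prefix("192.0.2.0/24", 28)
print(len(ranges), "end-user networks behind 1 BGP prefix")
print(ranges[0], "...", ranges[-1])
```

In this sketch 16 end-user networks share a single BGP advertisement, where giving each its own portable space through BGP alone would need 16 separate /24 prefixes.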
*However I believe there are a large and growing number of end-user
*networks which require multihoming and/or address space which
*doesn't change when a new ISP is used.  ("Serial multihoming" as
*Iljitsch wrote - though call this "portable address space", with
*apologies to those whose teeth itch at the mention of this phrase.)
*
*On this list a few weeks ago, several other people supported my view
*that there is a significant number of end-user networks with smaller
*(than 128, I guess) IPv4 address requirements who would be well
*served by LISP/eFIT-APT/Ivip/TRRP-mapped space, and who would
*contribute to the growth of the global BGP routing table if such an
*ITR-ETR mapping system was not introduced.
*
*
*> or ii) we don't maintain them into the indefinite future, in which case
*> the speedup of using the backup routing would not be available in the
*> future.
*
*As long as Ivip or whatever achieves benefits by enabling lots more
*end-user networks to operate as they require without each one
*getting their own BGP-advertised prefix, then I think there's no
*need to worry about maintaining the larger (shorter) prefixes in BGP
*when they are mapped by LISP etc.
*
*
*> But since I think we can make the resolution adequately fast, I don't
*> think there's a problem here.
*
*You could probably create a complex combination of caching and
*genuine push of update messages to make the CAR-CDR network faster.
*
*The question is whether it would be easier to create a complex and
*souped up CAR-CDR system than to establish a clean - although
*ambitious - system to distribute mapping data globally, within a few
*seconds, as I am planning with Ivip.
*
*My current description of the Replicator system is broad, so it is
*hard to make comparisons of difficulty and cost now.  Over the next
*few months I hope to write up some more elegant, secure and concrete
*details.
*
* - Robin
*
*
*--
*to unsubscribe send a message to rrg-request@psg.com with the
*word 'unsubscribe' in a single line as the message text body.
*archive: <http://psg.com/lists/rrg/> & ftp://psg.com/pub/lists/rrg
