
Re: [RRG] cache issues in LISP and CONS - it's bad . . .



On 10/22/07, Robin Whittle <rw@firstpr.com.au> wrote:
> Thanks for your response.  Even if most of the TRRP map lookups take
> 200ms or so, and if the ITRs hold the data packets and send them
> when the mapping data arrives, the system will be slower than
> LISP-NERD, eFIT-APT and Ivip.

Hi Robin,

TRRP will be slower on the first packet of the day (and the first
packet after an ITR failover) than any system based on carrying a full
table of all possible maps. The trade-off is scalability: TRRP ITRs
need only hold maps for hosts actively talking to each other. This
allows it to accommodate a vastly larger total number of maps.
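
To make the trade-off concrete, here is a minimal sketch of a
pull-style map cache. It is an illustration only, not TRRP code: the
PullMapCache class, the lookup callback and its (ETR address, TTL)
return value are all assumptions made for the example. The point is
that the table grows with the set of destinations actually in use,
and that only a cache miss pays the lookup delay.

```python
import time
from typing import Callable, Dict, Tuple


class PullMapCache:
    """On-demand (pull) cache: stores only maps that traffic asked for."""

    def __init__(self, lookup: Callable[[str], Tuple[str, float]],
                 clock: Callable[[], float] = time.monotonic) -> None:
        # `lookup` is a hypothetical stand-in for a map query; it is
        # assumed to return (etr_address, ttl_seconds).
        self._lookup = lookup
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # prefix -> (etr, expiry)
        self.misses = 0

    def resolve(self, prefix: str) -> str:
        now = self._clock()
        entry = self._entries.get(prefix)
        if entry is not None and entry[1] > now:
            return entry[0]              # hit: no query, no added delay
        self.misses += 1                 # miss: this packet waits for the query
        etr, ttl = self._lookup(prefix)
        self._entries[prefix] = (etr, now + ttl)
        return etr

    def __len__(self) -> int:
        return len(self._entries)


def demo_lookup(prefix: str) -> Tuple[str, float]:
    # Hypothetical answer using documentation addresses.
    return ("192.0.2.1", 300.0)


cache = PullMapCache(demo_lookup)
cache.resolve("198.51.100.0/24")   # first packet: miss, triggers a lookup
cache.resolve("198.51.100.0/24")   # later packets: served from the cache
print(cache.misses, len(cache))    # prints: 1 1
```

A production cache would also evict expired or idle entries; that is
left out here to keep the sketch short.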

Imagine trying to build a DNS resolver so that every hostname in the
world was pre-cached on every resolver. Even if allowed to cache
rarely used names on disk instead of in RAM, resolvers would have to
be massive and incredibly expensive. And the data flow necessary to
keep all the resolvers up to date would be astonishing.

I propose that what makes no sense for name-mapping likely makes
little sense for address mapping either.

Back at the beginning of time, we had a hosts file and we had RIP. As
we outgrew these choices, we replaced them with DNS (a pull-based
mapping system) and BGP (a push-based mapping system). We're talking
about replacing/supplementing BGP in what is not the first table-size
crisis while DNS chugs along happy as a clam. This says something
about the relative merits of push versus pull mapping systems. That
entry-level DNS servers cost $500 while entry-level BGP routers cost
$30,000 says even more.


> What about the recursion of looking up such a depth of TRRP servers
> with IPv6?  Each four bits requires a new level of nameserver.

DNS is smarter than that. Each non-recursive query generates a
response with as many hierarchy components as possible. Let me
illustrate with a tcpdump of a query for the MX record for
web.listbounce.democrats.org:

07:55:52.424803 IP (tos 0x0, ttl  64, id 0, offset 0, flags [DF],
length: 85) 71.246.241.146.32825 > 199.19.56.1.53: [udp sum ok]  4637%
[1au] MX? web.listbounce.democrats.org. ar: . OPT UDPsize=2048 (57)

07:55:52.451070 IP (tos 0x0, ttl 247, id 49173, offset 0, flags [DF],
length: 153) 199.19.56.1.53 > 71.246.241.146.32825: [udp sum ok]
4637- q: MX? web.listbounce.democrats.org. 0/2/3 ns: democrats.org. NS
ns2.democrats.org., democrats.org. NS ns1.democrats.org. ar:
ns2.democrats.org. A 208.69.4.67, ns1.democrats.org. A 208.69.4.66, .
OPT UDPsize=4096 (125)

07:55:52.453286 IP (tos 0x0, ttl  64, id 0, offset 0, flags [DF],
length: 85) 71.246.241.146.32825 > 208.69.4.67.53: [udp sum ok]
40647% [1au] MX? web.listbounce.democrats.org. ar: . OPT UDPsize=2048
(57)

07:55:52.461228 IP (tos 0x0, ttl  57, id 0, offset 0, flags [DF],
length: 191) 208.69.4.67.53 > 71.246.241.146.32825: [udp sum ok]
40647*- q: MX? web.listbounce.democrats.org. 1/2/4
web.listbounce.democrats.org. MX mail1.democrats.org. 10 ns:
democrats.org. NS ns2.democrats.org., democrats.org. NS
ns1.democrats.org. ar: mail1.democrats.org. A 208.69.7.20,
ns1.democrats.org. A 208.69.4.66, ns2.democrats.org. A 208.69.4.67, .
OPT UDPsize=4096 (163)

Notice four things:

1. Each non-recursive request included the full query, not just a
component of it.

2. Only two queries were needed to get the result for a DNS name with
four levels of hierarchy.

3. The first query went to a0.org.afilias-nst.info (199.19.56.1 in
the trace), one of the authoritative name servers for .org. This is
because the DNS server already had the root's delegation for .org
cached and didn't need to query a root server to find .org.

4. When the authoritative server for "democrats.org" was found, it
only had to be queried once to find web.listbounce.democrats.org
because it had all the remaining levels of hierarchy.
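
The behaviour in the trace can be mimicked with a toy model. This is
a sketch under assumptions, not a DNS implementation: the Server
class and resolve function are invented for illustration, and the
nibble-per-label name borrows the ip6.arpa layout only as an analogy,
since the TRRP name format is not spelled out in this message. It
shows that the number of queries follows the number of zone cuts, not
the number of labels in the name.

```python
import ipaddress
from typing import Any, Dict, Optional, Tuple


class Server:
    """Toy authoritative server (hypothetical, not a real DNS library)."""

    def __init__(self, delegations: Dict[str, "Server"],
                 records: Dict[str, str]) -> None:
        self.delegations = delegations  # zone name -> server for that zone
        self.records = records          # full name -> answer data

    def query(self, name: str) -> Tuple[str, Any]:
        # The server always sees the FULL name, never one label at a time.
        if name in self.records:
            return ("answer", self.records[name])
        matches = [z for z in self.delegations
                   if name == z or name.endswith("." + z)]
        if not matches:
            return ("nxdomain", None)
        # Refer to the deepest zone cut this server knows about.
        return ("referral", self.delegations[max(matches, key=len)])


def resolve(start: Server, name: str) -> Tuple[Optional[str], int]:
    """Follow referrals from `start`; return (answer, queries sent)."""
    server, queries = start, 0
    while True:
        queries += 1
        kind, value = server.query(name)
        if kind != "referral":
            return value, queries
        server = value


# Four labels, one zone cut below .org: two queries, as in the trace.
democrats = Server({}, {"web.listbounce.democrats.org": "mail1.democrats.org"})
org = Server({"democrats.org": democrats}, {})
print(resolve(org, "web.listbounce.democrats.org"))

# A nibble-per-label IPv6 name: many labels, but still one zone cut here.
nibbles = ipaddress.ip_address("2001:db8::1").reverse_pointer
site = Server({}, {nibbles: "example-map"})
top = Server({"8.b.d.0.1.0.0.2.ip6.arpa": site}, {})
print(len(nibbles.split(".")), resolve(top, nibbles))
```

In the second case the name has 34 labels, yet the toy resolver still
sends only two queries, because a single server holds every level
below the delegation point.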



> Firstly.  Most (all?) of the requests the farm handles would come
> from the Australia-NZ region, which involves a relatively small
> subset of the total address space.  Therefore, for a given memory
> capacity and/or query traffic limit, the ITR of the farm can retain
> the mapping details for longer for these addresses than could an ITR
> of a single global (actually a few global) TLD nameserver.
>
> Secondly, by implementing nameservers for most or all TLDs in these
> anycast farms which share the one TRRP ITR, it is much more likely
> that the ITR will already have cached an Australian sending host's
> mapping information when the host is sent a response for the .ru
> nameserver there, because the same host would have been recently
> getting responses from the .au server in the same farm.

If I understand what you're getting at, that's more or less correct.
TRRP may be significantly more efficient if:

1. Cache misses from the ITR's resolver are referred to a regional
resolver farm, which performs the recursive query on its behalf,
rather than the ITR's resolver issuing the non-recursive queries
itself.

2. The nameserver addresses are found with glue inside the TRRP
hierarchy rather than stepping outside the hierarchy like in-addr.arpa
does.

3. The resolver (or resolver farm) emits queries from a globally
routed address instead of from a TRRP address.


There are several such situations in TRRP where operationally it will
make more sense to use a globally routed address than a TRRP address.
That's OK. TRRP is not intended to replace BGP; it's intended to
drastically reduce the number of prefixes that have to be carried via
BGP.
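
As a rough illustration of point 1 in the list above, the sketch
below chains two cache tiers: each ITR's resolver asks a shared
regional farm on a miss, and only the farm goes to the authoritative
servers. Everything here is assumed for the example (the Resolver
class, the authoritative function, the names used), and TTL handling
is left out for brevity.

```python
from typing import Callable, Dict


class Resolver:
    """Caching resolver that forwards misses to an upstream callable."""

    def __init__(self, upstream: Callable[[str], str]) -> None:
        self.upstream = upstream
        self.cache: Dict[str, str] = {}
        self.upstream_calls = 0

    def lookup(self, name: str) -> str:
        if name not in self.cache:
            self.upstream_calls += 1          # miss: ask the next tier
            self.cache[name] = self.upstream(name)
        return self.cache[name]


def authoritative(name: str) -> str:
    # Hypothetical stand-in for the full non-recursive lookup chain.
    return "map-for-" + name


farm = Resolver(authoritative)     # shared regional resolver farm
itr_a = Resolver(farm.lookup)      # two ITRs in the same region
itr_b = Resolver(farm.lookup)

itr_a.lookup("host.example")       # farm misses, walks the hierarchy once
itr_b.lookup("host.example")       # farm answers from its own cache
print(farm.upstream_calls)         # prints: 1
```

The second ITR still pays one round trip to the farm, but not the
full walk down the hierarchy.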


> I had never heard of SWAG.  I figure you meant Scientific Wild Assed
> Guess,

Yep. :)  I could have said, "educated guess," but SWAG is more amusing.

Regards,
Bill Herrin



-- 
William D. Herrin                  herrin@dirtside.com  bill@herrin.us
3005 Crane Dr.                        Web: <http://bill.herrin.us/>
Falls Church, VA 22042-3004
