
[RRG] Re: Delays inherent in TRRP's DNS-like lookup?



On Thu, Feb 21, 2008 at 10:19 AM, Robin Whittle <rw@firstpr.com.au> wrote:
>  In "Re: [RRG] Map-encap space for 'server' vs. 'client' end-users?",
>  you wrote about my critique of your TRRP proposal:
>
>   http://bill.herrin.us/network/trrp.html
>
>  that it could take a long time for the ITR to perform all the
>  DNS-like lookups to find the authoritative server for some IP
>  address.

Hi Robin,

This is correct. It is this weakness in TRRP that I seek to address
with optional waypoints in the (unfinished) document at
http://bill.herrin.us/network/trrp-aapip.html


>  This is particularly a problem with IPv6, where each level
>  of DNS-like hierarchy is defined by a smaller number of bits (4, I
>  recall) than with IPv4 (8 bits > 3 decimal characters) and where the
>  length of the EID to be looked up is generally much longer.

This is not correct. The lookup depth (the number of queries needed)
is not tied to the length of the address or to the number of labels
in the name being looked up.

>  I recognise that the EID isn't necessarily 64 bits long, but some of
>  them would be, I assume.  Otherwise you are stuck with something
>  like /48 granularity.  In this example the ITR wants the mapping for
>  the 128 bit IP address 2000 E00B 4F30 64A7 8901 56AC 4404 0109.
>  With a /64 bit limit on EID length, it takes the most significant 64
>  bits and creates a DNS query for:
>
>   7.A.4.6.0.3.F.4.B.0.0.E.0.0.0.2.v6.trrp.arpa

It looks like you're a step ahead of me. I was about to suggest that
you explain your understanding of how TRRP's DNS lookup works so I
could find the error. Good.

First off, the EID lookup isn't bound by any subnet masks, so the
lookup for 2000 E00B 4F30 64A7 8901 56AC 4404 0109 is the whole
enchilada:

9.0.1.0.4.0.4.4.C.A.6.5.1.0.9.8.7.A.4.6.0.3.F.4.B.0.0.E.0.0.0.2.v6.trrp.arpa
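
As a rough sketch of how that name can be built, assuming the
nibble-reversed layout shown in the example (one label per hex digit,
least significant first). The helper name trrp_v6_name is made up for
illustration, and uppercase digits are used only to match the example:

```python
import ipaddress


def trrp_v6_name(addr: str) -> str:
    """Build the full 32-label lookup name for an IPv6 address.

    Hypothetical helper: one label per 4-bit nibble, least significant
    nibble first, under the v6.trrp.arpa suffix used in this thread.
    """
    # .exploded yields all 32 hex digits with no "::" compression.
    digits = ipaddress.IPv6Address(addr).exploded.replace(":", "").upper()
    return ".".join(reversed(digits)) + ".v6.trrp.arpa"


print(trrp_v6_name("2000:E00B:4F30:64A7:8901:56AC:4404:0109"))
```

No subnet mask enters into it: all 128 bits go into the name.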


>  Let's say you had a different authoritative nameserver for each
>  hierarchical level, which seems a reasonable arrangement.

This is an error. A different authoritative nameserver for each
hierarchical level is neither a reasonable nor likely arrangement.
I'll show why below.

>  I will assume that the ETR has already cached the address of the
>  nameserver which is authoritative for the first 12 bits of address,
>  2000E0:
>
>                       0.E.0.0.0.2.v6.trrp.arpa

I concur: the first 12 bits (corresponding to the IANA allocations to
RIRs) will usually be cached. That's 0.0.2.v6.trrp.arpa BTW.
0.E.0.0.0.2.v6.trrp.arpa is 24 bits.
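
The bit counts follow from one hex label covering 4 bits. A small
sketch of that arithmetic (v6_prefix_bits is a hypothetical helper,
not something TRRP defines):

```python
def v6_prefix_bits(name: str, suffix: str = ".v6.trrp.arpa") -> int:
    """Return how many address bits a name under v6.trrp.arpa pins down."""
    if not name.endswith(suffix):
        raise ValueError("name is not under " + suffix.lstrip("."))
    labels = name[: -len(suffix)].split(".")
    return 4 * len(labels)  # each label is one hex nibble


print(v6_prefix_bits("0.0.2.v6.trrp.arpa"))        # 12
print(v6_prefix_bits("0.E.0.0.0.2.v6.trrp.arpa"))  # 24
```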


>  So the first lookup is to there:
>
>  Lookup 1:
>                       0.E.0.0.0.2.v6.trrp.arpa

No. The first lookup is for
9.0.1.0.4.0.4.4.C.A.6.5.1.0.9.8.7.A.4.6.0.3.F.4.B.0.0.E.0.0.0.2.v6.trrp.arpa,
sent to one of the servers which is authoritative for
0.0.2.v6.trrp.arpa. That server will be an RIR's server. Since the
address space will be PI assigned by that RIR, this lookup will
resolve to the /48 level, matching the address assignment they made.

>  and the result comes back:

0 "answers" but multiple "authority" records for the DNS servers which
know about the hierarchy under 0.3.F.4.B.0.0.E.0.0.0.2.v6.trrp.arpa
and "additional" records which provide the IP addresses of those DNS
servers. **

Each DNS response has four sections:

1. Questions. This is the lookup that was performed.
2. Answers. The final result of the lookup.
3. Authority. The servers which know the next step in the chain and
what that step is (the final answer or something before it).
4. Additional. Information that you'd otherwise likely ask for, if it
happens to be known to the server. The "Additional" section for most
lookups will try to contain the A and AAAA records that match the NS
records in the authority section.
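
To make the four sections concrete, here is a toy model in Python. It
is only a sketch: the class and function names are invented, it is not
a wire-format parser, and the server name and address in the example
are placeholders rather than real TRRP data.

```python
from dataclasses import dataclass, field


@dataclass
class DnsResponse:
    """Toy stand-in for the four sections of a DNS response."""
    questions: list = field(default_factory=list)
    answers: list = field(default_factory=list)
    authority: list = field(default_factory=list)   # NS records for the next step
    additional: list = field(default_factory=list)  # A/AAAA records for those NS names


def is_referral(resp: DnsResponse) -> bool:
    """True when there is no answer but the server points at other servers."""
    return not resp.answers and bool(resp.authority)


# Shape of the reply to the first query: 0 answers, a delegation, and
# (hypothetical) address data for the delegated server.
first = DnsResponse(
    questions=["<full lookup name> TXT"],
    authority=[("0.3.F.4.B.0.0.E.0.0.0.2.v6.trrp.arpa", "NS", "ns.example.net")],
    additional=[("ns.example.net", "AAAA", "2001:db8::53")],
)
print(is_referral(first))
```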


>  Lookup 7:
>         6.0.3.F.4.B.0.0.E.0.0.0.2.v6.trrp.arpa

No. Lookup #2 is also for
9.0.1.0.4.0.4.4.C.A.6.5.1.0.9.8.7.A.4.6.0.3.F.4.B.0.0.E.0.0.0.2.v6.trrp.arpa,
but it is sent to one of the servers which are authoritative for
0.3.F.4.B.0.0.E.0.0.0.2.v6.trrp.arpa.

The server for 0.3.F.4.B.0.0.E.0.0.0.2.v6.trrp.arpa is inside the
administrative domain for the end-user who controls that /48. If he's
a particularly large end user, he might delegate pieces of it. Usually
though, this second query will produce an authoritative answer.

9.0.1.0.4.0.4.4.C.A.6.5.1.0.9.8.7.A.4.6.0.3.F.4.B.0.0.E.0.0.0.2.v6.trrp.arpa
IN TXT "80,g6,2000:1234::1"

Which says: encapsulate packets for this destination in IPv6 GRE
tunnel packets and send those packets to 2000:1234::1.
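
A minimal sketch of splitting such a TXT string into its fields. It
assumes only what the examples in this message show: entries separated
by spaces, three comma-separated fields per entry. The meaning of the
leading field ("80") is not spelled out here, so the sketch keeps it
as an opaque string; TrrpEntry and parse_trrp_txt are invented names.

```python
from typing import List, NamedTuple


class TrrpEntry(NamedTuple):
    first: str   # leading field, e.g. "80"; kept opaque in this sketch
    kind: str    # e.g. "g6" (IPv6 GRE), "g4" (IPv4 GRE), "nm"
    value: str   # ETR address, or the mask length for an "nm" entry


def parse_trrp_txt(txt: str) -> List[TrrpEntry]:
    """Split a TXT payload into entries without interpreting them."""
    entries = []
    for chunk in txt.split():
        # maxsplit=2 is a defensive choice so that only the first two
        # commas separate fields and the value is kept intact.
        first, kind, value = chunk.split(",", 2)
        entries.append(TrrpEntry(first, kind, value))
    return entries


print(parse_trrp_txt("80,g6,2000:1234::1"))
```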


** There is a hitch with the "additional" section that comes back from
the first query. If you've named the DNS server directly under the
TRRP hierarchy then the response will also include "additional"
records which give the A or AAAA records that match the names in the
NS "authority" records. If you've followed normal practice for
in-addr.arpa server naming then those "additional" records won't be
available. Instead, you'll have to pursue another lookup tree to map
the NS server names to IP addresses.

At an operations best-practice level, I expect that the relevant DNS
server names -will- be placed in the TRRP hierarchy so that the next
DNS server's IP address always comes back as an "additional" record.
TRRP doesn't require this but it makes operational sense.
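
The practical difference can be checked mechanically: any NS name in
the authority section that has no matching address in the additional
section costs an extra lookup tree. A sketch, with hypothetical server
names and a documentation-range address:

```python
def ns_missing_addresses(authority_ns, additional):
    """Return NS names that came back without an A/AAAA in 'additional'.

    authority_ns: list of server names from the NS records.
    additional:   list of (name, address) pairs from the additional section.
    """
    have = {name.lower().rstrip(".") for name, _addr in additional}
    return [ns for ns in authority_ns if ns.lower().rstrip(".") not in have]


ns_names = [
    "ns.0.3.F.4.B.0.0.E.0.0.0.2.v6.trrp.arpa",  # named inside the TRRP hierarchy
    "ns1.example.net",                            # named elsewhere
]
extra = [("ns.0.3.F.4.B.0.0.E.0.0.0.2.v6.trrp.arpa", "2001:db8::53")]
print(ns_missing_addresses(ns_names, extra))
```

Only the server named outside the hierarchy is reported, which is the
case that needs the separate name-to-address lookup.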


>  so the answer comes back, as you describe (and I only partially
>  understand):
>
>   http://bill.herrin.us/network/trrp-nm.html

This is for situations where a machine will communicate with multiple
nearby addresses. For example, let's say that you're Google and all 253
IP addresses from 199.33.224.2 through 199.33.224.254 will communicate
with you.

When you see a packet for 199.33.224.2, you'll look up
"2.224.33.199.v4.trrp.arpa" and get "80,g4,198.4.5.227 ff,nm,24".

When you see a packet for 199.33.224.3, you'll look up
"3.224.33.199.v4.trrp.arpa" but at the same time you'll also look up
"24.224.nm.33.199.v4.trrp.arpa." because of the ff,nm,24 entry you got
with the lookup for 199.33.224.2. The answer to the second query will
be "80,g4,198.4.5.227". That's your netmask entry and it means that
198.4.5.227 is a valid IPv4 GRE ETR for all of 199.33.224.0/24.

When you see a packet for 199.33.224.4, you'll send it immediately to
198.4.5.227 because you'll already have cached the netmask entry for
"24.224.nm.33.199.v4.trrp.arpa.". You won't have to look up
"4.224.33.199.v4.trrp.arpa" at all because you already have the
answer.
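
The caching behaviour in that walk-through can be sketched as a small
prefix cache. This is an illustration under assumptions: NetmaskCache
is an invented name, and the construction of the "nm" query name is
deliberately left out because only one example of it appears here.

```python
import ipaddress


class NetmaskCache:
    """Cache of netmask entries: covered network -> ETR address."""

    def __init__(self):
        self._routes = {}

    def add(self, addr: str, masklen: int, etr: str) -> None:
        # strict=False lets us pass a host address plus a mask length.
        net = ipaddress.ip_network(f"{addr}/{masklen}", strict=False)
        self._routes[net] = etr

    def find(self, addr: str):
        """Return the ETR for the longest cached prefix covering addr."""
        ip = ipaddress.ip_address(addr)
        best = None
        for net, etr in self._routes.items():
            if ip.version == net.version and ip in net:
                if best is None or net.prefixlen > best[0].prefixlen:
                    best = (net, etr)
        return best[1] if best else None


cache = NetmaskCache()
# Learned from the "ff,nm,24" hint plus the follow-up netmask lookup.
cache.add("199.33.224.2", 24, "198.4.5.227")
print(cache.find("199.33.224.4"))   # covered: no per-address lookup needed
print(cache.find("199.33.225.4"))   # not covered: would need its own lookup
```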



>  Sure, but if a new kind of address space involves a new kind of
>  delay, and you have the choice between this and the original kind of
>  address space which has no such delays, then it is going to be hard
>  to get many or most people to adopt the new kind.  As Marshall
>  Eubanks wrote: http://psg.com/lists/rrg/2008/msg00381.html

Marshall wrote, 'the delays cannot be "significant," which [Marshall]
would regard as meaning "being noticeably larger than typical DNS
delays."'

The delay in TRRP is not noticeably larger than a typical DNS delay.
In fact, it is precisely a typical DNS delay.


>  I don't know as much as I would like about DNS, but I think you are
>  saying that the host sends a request to the nameserver which is
>  authoritative for dirtside.com not just for the address of the
>  nameservers which are authoritative for z.dirtside.com, but asks for
>  the address of the whole thing "0.1 ... .y.z.dirtside.com".

Yes, that's exactly what the DNS resolver does.


>  I don't understand how it could know to do that, unless it would
>  also ask for the whole thing from a nameserver which is
>  authoritative for .com as well.

That's exactly what the DNS resolver does.
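
A toy simulation of that behaviour, using the IPv6 example from
earlier in this message: the resolver sends the same full name to each
server in turn and follows whatever referral comes back. The zone
table and the resolve function are invented for illustration; a real
resolver works from NS records and addresses, not a Python dict.

```python
FULL_NAME = ("9.0.1.0.4.0.4.4.C.A.6.5.1.0.9.8.7.A.4.6."
             "0.3.F.4.B.0.0.E.0.0.0.2.v6.trrp.arpa")
RIR_ZONE = "0.0.2.v6.trrp.arpa"
USER_ZONE = "0.3.F.4.B.0.0.E.0.0.0.2.v6.trrp.arpa"

# Hypothetical data: what each zone's server knows.
ZONES = {
    RIR_ZONE: {"answers": {}, "delegations": [USER_ZONE]},
    USER_ZONE: {"answers": {FULL_NAME: "80,g6,2000:1234::1"}, "delegations": []},
}


def resolve(name, cached_zone, max_queries=10):
    """Ask for the full name at each step; return (answer, query count)."""
    zone = cached_zone
    for queries in range(1, max_queries + 1):
        server = ZONES[zone]
        if name in server["answers"]:
            return server["answers"][name], queries
        # Referral: follow the delegation whose zone contains the name.
        child = next((z for z in server["delegations"]
                      if name.endswith("." + z)), None)
        if child is None:
            return None, queries
        zone = child
    return None, max_queries


print(resolve(FULL_NAME, RIR_ZONE))
```

The number of queries depends on how many delegations exist along the
way, not on how many labels the name has.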



>  In your IPv4 scheme, there could be a different authoritative
>  nameserver for every /24, for instance:
>
>    101.168.192.v4.trrp.arpa
>
>  That is about (224 - 2) * 256 * 256 = 14,548,992 separate /24s.

Certainly. But the probability is that the nameserver for
192.v4.trrp.arpa knows all 65536 nameservers for the /24's under it.
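
For reference, the arithmetic behind both figures, as a quick check
(the variable names are only for illustration):

```python
SLASH24_PER_SLASH8 = 256 * 256      # /24s under one first octet
first_octets = 224 - 2              # count used in the quoted estimate
total_slash24 = first_octets * SLASH24_PER_SLASH8

print(SLASH24_PER_SLASH8)   # 65536
print(total_slash24)        # 14548992
```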

The typical case takes exactly two non-recursive lookups (given 256
already-cached top levels) to get all the way to the /32, even though
there are potentially 14M nameservers to choose from. That is why
TRRP scales wonderfully.

Regards,
Bill Herrin


-- 
William D. Herrin herrin@dirtside.com bill@herrin.us
3005 Crane Dr. Web: <http://bill.herrin.us/>
Falls Church, VA 22042-3004
