Obviously, this is a dead-end street, and I can only observe how this
community walks to its very end. There, however, the community might
remember that, according to Rekhter's law, there is also a second way.

Heiner
In an email of 18.10.2007 08:10:07 Western European Standard Time,
rw@firstpr.com.au writes:
Hi Dan and Noel,

Dan wrote:
> The issue regarding packet drops with LISP and LISP-CONS has been
> brought up a few times on the list.
>
> Basically, packets are dropped for every ITR cache miss. Since
> CONS mapping requests may take a long time to be satisfied, this
> may result in unacceptable service.
LISP-CONS, LISP-NERD and TRRP use ITRs which cache mapping data and
rely on a query system which spans the Internet to get fresh mapping
data.
I think that dropping packets - or delaying them while queries and
responses cross the planet - is not acceptable for the future
architecture of the Internet.

Each such exchange typically involves multiple packets being sent and
received, with delays and potential losses. Each mapping request could
take a few seconds to resolve under difficult circumstances, and half
a second to one second under common circumstances. Folks in the US and
Europe need to remember that other folks have longer packet paths -
such as 350 ms round-trip delays from Melbourne, Australia to the
Netherlands under close-to-ideal circumstances.
DNS-style mapping lookups (LISP-NERD or TRRP) will often involve
multiple queries to find the right nameserver. This is especially so
for IPv6, with the likelihood of having to work through 48 bits or
more, 4 bits at a time, with TRRP or, I think, LISP-NERD. There's no
way an ITR could cache all those intermediate nameservers, even for
IPv4.
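To put rough numbers on that, here is a toy calculation. The
one-query-per-4-bit-level model and the use of the 350 ms figure are
my own assumptions for illustration, not anything from the TRRP or
NERD drafts:

```python
# Toy model: a DNS-like mapping tree delegated 4 bits (one hex
# nibble) at a time, in the style of ip6.arpa reverse delegation.
# Illustrative assumptions only, not taken from any draft.

def worst_case_queries(prefix_bits, bits_per_level=4):
    """Delegation levels an ITR may have to walk when nothing
    below the root is cached."""
    return prefix_bits // bits_per_level

# Walking 48 bits of an IPv6 address, 4 bits at a time:
print(worst_case_queries(48))                    # -> 12 exchanges

# At ~350 ms per round trip (e.g. Melbourne to the Netherlands),
# a fully uncached resolution could take:
print(worst_case_queries(48) * 0.35, "seconds")  # -> 4.2 seconds
```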
An ITR might be required to drop a packet and query the global system
in these circumstances:

1 - A uses DNS to find the IP address of B, and B's domain name
    requires the query to go to a nameserver which uses a LISP-etc.
    mapped address.

2 - A is on a mapped address too, so for each request to a nameserver
    in the above query, the nameserver's ITR needs to do a LISP/TRRP
    CAR-CDR/DNS-like lookup to find the ETR by which A can be reached.

3 - A's ITR needs to do another mapping lookup before A can send a
    packet to B.

4 - B's response requires another lookup, since A is on a mapped
    address too.
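The drop-on-miss behaviour underlying all four cases can be sketched
in a few lines. The class and method names here are my own
illustration, not from any of the drafts:

```python
# Minimal sketch of a caching ITR that drops packets on a cache
# miss while it queries the global mapping system (the LISP-CONS /
# LISP-NERD / TRRP behaviour discussed above). Names are
# hypothetical.

class CachingItr:
    def __init__(self):
        self.cache = {}          # destination EID -> ETR address
        self.pending = set()     # destinations with queries in flight

    def handle(self, dst):
        etr = self.cache.get(dst)
        if etr is not None:
            return ("tunnel", etr)    # encapsulate toward the ETR
        if dst not in self.pending:
            self.pending.add(dst)     # ask the global query system
        return ("drop", None)         # meanwhile, the packet is lost

    def mapping_reply(self, dst, etr):
        self.cache[dst] = etr         # later packets can be tunneled
        self.pending.discard(dst)

itr = CachingItr()
print(itr.handle("B"))                # -> ('drop', None): 1st packet lost
itr.mapping_reply("B", "etr-b")
print(itr.handle("B"))                # -> ('tunnel', 'etr-b')
```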
Points 1 and 2 can occur in pairs for as many depths of nameserver
recursion as are required to get an answer - for instance, five or so
times to find the IP address of "aa.bb.cc.ee.ff.gg".

At every such query, the LISP or TRRP ITR drops the packet and the
higher layers have to time out and retry. Here is an example of
establishing a TCP link to a host which is identified by a four-level
domain name, www.xxx.com.au.
The client host A is on a LISP-mapped address, as are the nameservers
for .com.au and .xxx.com.au. We assume the client host A has cached
the address of the .au nameserver (this would not be the case for the
less-used TLDs) but not of the nameserver for .com.au. Likewise, we
assume A's LISP ITR initially has no mapping for the address of the
.com.au nameserver.
This is perhaps unrealistic, since we wouldn't expect the .com.au
nameservers to be on LISP-mapped addresses. However, the following
example is valid for a webserver of a department of a company or
university, such as:

   www.astro.someuni.ch
   www.sales.wizmo.es

where these organisations use mapped addresses for their networks.
This is especially the case in the future, when LISP/Ivip-etc.-mapped
address space is likely to be very widely used.
For instance, maybe someuni.ch runs one of its nameservers on its own
LISP-etc.-mapped network and a second one on another university's
LISP-etc.-mapped network - but the astronomy department has a separate
campus, with one of its nameservers on another portion of
LISP-etc.-mapped address space, etc.
A -X-> ns1.com.au     LISP ITR drops the 1st packet.

                      A times out. How long does the first
                      time-out take?

A ---> ns1.com.au     A sends a 2nd packet, which the ITR now has
                      mapping for (ideally - but maybe the ITR is
                      still awaiting a response from the global
                      CAR-CDR or DNS-like query system) and so
                      tunnels it to the ETR which serves
                      ns1.com.au.

A <-X- ns1.com.au     The LISP ITR near ns1.com.au drops the
                      packet - it has no mapping for A's address.

                      A times out again. How long does the second
                      time-out take?

A ---> ns1.com.au     A sends a third packet.

                      Or has A given up on ns1 and tried to send a
                      query to ns2.com.au? This is on a totally
                      different network, and so we REPEAT (not
                      shown here) all the above steps.

                      Now (ideally) the LISP ITR near A has
                      mapping information for nsx.com.au - so
                      finally, A gets its query to nsx.com.au.

A <--- nsx.com.au     Ideally, the nameserver's ITR has mapping
                      for A by now, so the packet is tunneled to
                      A's ETR and reaches A.
Repeat the same stuff (as detailed below) so that A can find out the
address of the nameserver for xxx.com.au.
A -X-> ns1.xxx.com.au   A's 1st packet is dropped by the LISP ITR.

                        A times out.

A ---> ns1.xxx.com.au   A sends a 2nd packet.

A <-X- ns1.xxx.com.au   LISP ITR near ns1.xxx.com.au drops the
                        packet.

                        A times out again.

A ---> ns1.xxx.com.au   A sends a third packet.

A <--- nsx.xxx.com.au   A receives the IP address of
                        www.xxx.com.au.
Now a similar pattern of dropped packets and time-outs occurs while A
establishes a TCP session with the web server.

The web server isn't necessarily on the same LISP-mapped address space
as the nameservers. It could be at a hosting company which uses
LISP-mapped space so it can move its upstream connections to other
ISPs without all its customers having to change their DNS entries for
their web servers.
A -X-> www.xxx.com.au   A's 1st packet is dropped by the LISP ITR.

                        A times out.

A ---> www.xxx.com.au   A sends a 2nd packet. By now, ideally,
                        A's ITR has the mapping data.

A <-X- www.xxx.com.au   LISP ITR near www.xxx.com.au drops the
                        packet.

                        A times out again. www.xxx.com.au has a
                        half-open TCP session dangling . . .

A ---> www.xxx.com.au   A sends a third packet. The webserver
                        half-opens a second TCP session.

A <--- www.xxx.com.au   A gets its TCP acknowledgement and sends
                        its response . . .

A ---> www.xxx.com.au   . . . which (ideally) will be tunneled
                        immediately.
This is arguably worse than some common circumstances, but it is easy
to think of more difficult cases, with a greater recursion of
nameservers. Any packets dropped en route make things still worse.
This is:

   16 packets sent (not counting the flurry of ITR query traffic)
    6 dropped packets
    2 1st time-outs by A's name lookup code
    2 2nd time-outs by A's name lookup code
    1 1st time-out by A's TCP session establishment code
    1 2nd time-out by A's TCP session establishment code
With eFIT-APT or Ivip, there are no dropped packets and we have:

   A ---> ns1.com.au         A <--- nsx.com.au
   A ---> ns1.xxx.com.au     A <--- nsx.xxx.com.au
   A ---> www.xxx.com.au     A <--- www.xxx.com.au
   A ---> www.xxx.com.au

   7 packets sent
   0 dropped packets
   0 time-outs
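The two tallies can be re-derived mechanically. This is just my own
bookkeeping of the example traces, written out as a sanity check:

```python
# Per-exchange accounting for the example above, where both ends
# are behind caching ITRs with cold caches.

def drop_on_miss_exchange(extra=0):
    # 1st packet dropped by A's ITR (time-out), 2nd gets through
    # but the reply is dropped by the far-side ITR (time-out), 3rd
    # packet and its reply succeed: 5 sent, 2 dropped, 2 time-outs.
    return (5 + extra, 2, 2)

def full_db_exchange(extra=0):
    # eFIT-APT/Ivip: query and reply are both tunneled immediately.
    return (2 + extra, 0, 0)

def tally(exchange):
    # Two DNS exchanges plus the TCP handshake, which has one
    # extra packet (A's final acknowledgement).
    legs = [exchange(), exchange(), exchange(extra=1)]
    return tuple(sum(col) for col in zip(*legs))

# (packets sent, dropped packets, time-outs):
print(tally(drop_on_miss_exchange))  # -> (16, 6, 6)
print(tally(full_db_exchange))       # -> (7, 0, 0)
```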
In this example, LISP or TRRP typically makes the user and the
higher-level protocols wait for 6 time-out periods. I think this is
really unacceptable.

The delay would be less with only one level of DNS recursion, and if
only one end of each exchange used LISP-mapped address space.
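In wall-clock terms, those six time-out periods add up quickly. The
retry intervals below are my assumptions for illustration, not figures
from any resolver or TCP specification:

```python
# Assumed back-off schedule (illustrative only): the resolver /
# TCP stack waits 1 s before the first retry and 3 s before the
# second, for each of the three exchanges in the example.
FIRST_TIMEOUT = 1.0   # seconds (assumption)
SECOND_TIMEOUT = 3.0  # seconds (assumption)

exchanges = 3  # ns1.com.au, ns1.xxx.com.au, www.xxx.com.au
added_delay = exchanges * (FIRST_TIMEOUT + SECOND_TIMEOUT)
print(added_delay, "seconds of user-visible delay")  # -> 12.0
```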
If something like LISP or TRRP were introduced, I think people would
soon find that the address space it handles sucks - due to the common
experience of excessive delays in establishing sessions of any kind.
It would be very much harder to convince people to adopt this space if
it means a permanent degradation of their own experience and of the
experience of every person who tries to communicate with them.
> Suggestions have been made to route packets on the old topology in
> the event of ITR cache misses. However, this leads to a major
> incremental deployment issue -- since LISP adopters will still
> need to maintain their routes in the old topology, there would be
> no reduction in the size of the global routing table.
I agree there would be no absolute reduction, but there could, and
should, still be fewer BGP prefixes than if LISP etc. was not
introduced. I discuss this below in my response to Noel.
> I have not seen any other suggestions on how to handle this issue.
> Could this be a fundamental problem with the design, or are there
> other solutions?
It is a fundamental problem with having caching ITRs which can't let a
packet go to another ITR which has the full database, and where the
caching ITRs rely on a global-sized - and therefore slow and
unreliable - query system.
Eliot Lear proposed (RRG messages leading to 264 & 288) that LISP-NERD
could have one or more ITRs well outside the sending host's ISP's
network to catch those packets which are not caught by an ITR in that
network. I think this amounts to the same thing as Ivip's "anycast
ITRs in the core". I think this was meant as a means of making
LISP-NERD incrementally deployable. Perhaps, if the LISP-NERD ITRs
were modified to pass on the packets they couldn't tunnel, this would
mean that the packets would be tunnelled immediately, rather than
dropped.
In that case, LISP-NERD would resemble eFIT-APT and Ivip in having
some caching ITRs which could let packets go through to a
full-database ITR (Default Mapper for eFIT-APT, or ITRD for Ivip) when
the caching ITR didn't have the mapping information. Perhaps, with
Eliot's suggestion, those "full database" ITRs are only full-database
for one BGP prefix's range of destination addresses, and perhaps there
is only one such ITR for each such prefix, rather than necessarily
multiple of them using anycast.
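The pass-through alternative is a one-line change to the drop-on-miss
behaviour: on a cache miss, forward the packet toward a full-database
ITR instead of discarding it. A sketch, with illustrative names of my
own choosing:

```python
# Sketch of a caching ITR that forwards on a cache miss rather
# than dropping (the eFIT-APT "Default Mapper" / Ivip "ITRD"
# pattern discussed above). Names are hypothetical.

FULL_DB_ITR = "default-mapper"   # always able to tunnel the packet

def handle(cache, dst):
    etr = cache.get(dst)
    if etr is not None:
        return ("tunnel", etr)            # direct path, as before
    # No mapping yet: the packet is still delivered, just less
    # directly, while the mapping query proceeds in the background.
    return ("forward", FULL_DB_ITR)

print(handle({}, "B"))              # -> ('forward', 'default-mapper')
print(handle({"B": "etr-b"}, "B"))  # -> ('tunnel', 'etr-b')
```

The point of the sketch is that no packet is ever lost to a cold
cache; the cost is a temporarily longer path through the full-database
ITR.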
As it stands, here is the situation as I understand it. (Please check
later in this thread for corrections.)

                      LISP-CONS  LISP-NERD  eFIT-APT  Ivip    TRRP

  Full DB ITRs?       No         No         Default   ITRD    No
                                            Mappers

  Caching ITRs?       All        All        ITR       ITRC    All
                                                      ITFH

  Local query         No         No         Default   QSD     No
  servers for                               Mappers   QSC
  caching ITRs?

  Caching ITR must    Yes        Yes        No        No      Yes
  drop packets it
  has no mapping
  for?

  Distribution of     Pull       Pull       Push      Push    Pull
  database to         CAR-CDR    DNS-like   slow      fast    DNS-like
  networks with       global     global     via       Repli-  global
  ITRs etc.?          network    network    BGP       cator   network
                                                      system

  Incremental         -          RRG        -         Yes     -
  deployment via                 messages
  "anycast ITRs                  264 & 288
  in the core"?
TRRP also has a method by which some ITRs (depending on how directly
they queried the authoritative DNS-like mapping information servers)
get push notification of changed mapping from the authoritative
mapping servers. However, I can't see how this would scale well.

eFIT-APT and Ivip are the only two proposals in which all packets are
tunneled without delay.
Noel Chiappa wrote, in part:

  DJ> Suggestions have been made to route packets on the old topology
  DJ> in the event of ITR cache misses. However, this leads to a major
  DJ> incremental deployment issue -- since LISP adopters will still
  DJ> need to maintain their routes in the old topology, there would
  DJ> be no reduction in the size of the global routing table.

> Well, it's not just an incremental issue; either i) we always
> maintain the backup routing (in which case we don't get the size
> reductions, as you point out),
I disagree with this prevalent notion that maintaining a prefix in BGP
while that prefix is handled by an ITR-ETR mapping scheme will not
result in reductions in the size of the global BGP routing table.

This would only be true if every such prefix was used to serve a
single end-user network. The major benefit of all these ITR-ETR
schemes is that they can slice and dice the address space much finer
than BGP, which is limited (for all practical purposes in the
foreseeable future) to 256-IP-address chunks in IPv4.
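The slice-and-dice arithmetic is simple but worth making explicit. The
block sizes below are illustrative choices of mine, not anything any
particular proposal mandates:

```python
# One BGP-advertised /24 carries 256 IPv4 addresses. A mapping
# layer that can assign smaller blocks within it serves many more
# end-user networks per routing-table entry.

def networks_per_24(block_size):
    """End-user networks per /24, at a given mapped block size."""
    return 256 // block_size

print(networks_per_24(256))  # -> 1: one BGP /24 per end-user network
print(networks_per_24(16))   # -> 16: mapped /28-sized blocks
print(networks_per_24(4))    # -> 64: mapped /30-sized blocks
```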
If end-user networks all needed more than 128 IP addresses, then it
might be true that the ITR-ETR mapping systems can't help use IPv4
address space more efficiently.
However, I believe there is a large and growing number of end-user
networks which require multihoming and/or address space which doesn't
change when a new ISP is used. ("Serial multihoming", as Iljitsch
wrote - though call this "portable address space", with apologies to
those whose teeth itch at the mention of this phrase.)
On this list a few weeks ago, several other people supported my view
that there is a significant number of end-user networks with smaller
(than 128, I guess) IPv4 address requirements which would be well
served by LISP/eFIT-APT/Ivip/TRRP-mapped space, and which would
contribute to the growth of the global BGP routing table if such an
ITR-ETR mapping system was not introduced.
> or ii) we don't maintain them into the indefinite future, in which
> case the speedup of using the backup routing would not be available
> in the future.
As long as Ivip or whatever achieves benefits by enabling lots more
end-user networks to operate as they require, without each one getting
its own BGP-advertised prefix, then I think there's no need to worry
about maintaining the larger (shorter) prefixes in BGP when they are
mapped by LISP etc.
> But since I think we can make the resolution adequately fast, I
> don't think there's a problem here.
You could probably create a complex combination of caching and genuine
push of update messages to make the CAR-CDR network faster.

The question is whether it would be easier to create a complex and
souped-up CAR-CDR system than to establish a clean - although
ambitious - system to distribute mapping data globally, within a few
seconds, as I am planning with Ivip.

My current description of the Replicator system is broad, so it is
hard to make comparisons of difficulty and cost now. Over the next few
months I hope to write up some more elegant, secure and concrete
details.
  - Robin