Re: [RRG] LISP-NERD reachability and MTU detection
On 16 dec 2007, at 19:23, Dino Farinacci wrote:
A LISP-NERD ITR chooses an ETR/locator and assumes it's reachable.
It sets a "please respond" code point in the NERD header and starts
a timer. The ETR receiving the packet sees the "please respond"
message and sends back info to the originating ITR
Iljitsch, you really don't want to do this. In a very well behaving
scenario, this will cause way too much control traffic going to the
site. Before we put in the loc-reach-bits into LISP this was an
obvious first thought but was discarded because polling typically
doesn't scale well when being polled from a million places. Could
you imagine if everyone polled the DNS root servers *in addition to*
sending queries for name translations!
Well, it's one thing or the other: either you make available
information about the currently reachable locator set for any given
EID in the mapping system = the mapping system is just as volatile as
BGP, or you do reachability testing of some kind between ITRs and
ETRs. I'd say that if you can talk to millions of places, you can also
respond to reachability probes from millions of places. However, this
does suggest that something LISP-like isn't the best choice for the
highest traffic destinations connected to the internet.
Why not just use TTL timeouts and have the ITR, when it needs the
mapping, send a query or Data Probe and get a new Reply back with
updated information?
I'm not sure how long you want to make these TTLs, but obviously that
would be longer than a 10 second or so polling interval which means
that it's going to take you much longer to detect and recover from
failures.
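
To put rough numbers on that trade-off, here is a minimal sketch of worst-case failure detection time under each approach. The 10 second polling interval is the figure mentioned above; the probe timeout and the TTL values are made-up illustration figures, not numbers from any LISP draft, and the function names are hypothetical.

```python
def worst_case_polling(probe_interval_s, probe_timeout_s):
    """Worst case for polling: the ETR dies right after answering a probe.

    The ITR only notices once the next probe has been sent and has timed out.
    """
    return probe_interval_s + probe_timeout_s


def worst_case_ttl(ttl_s):
    """Worst case for TTL expiry: the ETR dies right after the mapping was fetched.

    The stale mapping is used until the TTL runs out (the round trip for the
    fresh query is ignored here).
    """
    return ttl_s


if __name__ == "__main__":
    # Assumed values: 10 s polling interval, 2 s probe timeout.
    print("polling:", worst_case_polling(10, 2), "s")
    # Assumed TTLs of one minute, five minutes and one hour.
    for ttl in (60, 300, 3600):
        print("ttl", ttl, "->", worst_case_ttl(ttl), "s")
```

With these assumed values, a TTL only matches polling if it is set to about the same few seconds, at which point the query load looks much like polling anyway.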
And to be clear, what problem are you trying to solve? Are you
trying to get EID-to-RLOC record changes to ITRs as soon as possible?
When I talk to an ETR, I want to know if it's alive fast enough to
fail over to another one before the transport, app or user stops trying.
Are you trying to get each ITR a copy of their own mapping with
different preference and weight information contained in the record?
If ETRs are sending packets to ITRs anyway it makes sense to add this
information, yes. Not sure if I'd want to work very hard to make it
available if it wasn't "free" to do so, though.
If there is no response before the timer expires, the ITR switches
to a different ETR.
This is polling for reachability. I don't think you can do much
better than what we have spec'ed out in the main LISP spec with the
loc-reach-bits design.
You assume that each ETR knows whether the other ETRs for that EID are
reachable. On Friday, Eliot said he'd like to see ETR functionality in
Linksys boxes, so if a user has both cable and DSL connectivity (we
called this a "basement multihomer" in multi6) the one box would
handle RLOCs from both so this requirement is easy to meet. However,
if ISPs run ETRs for the benefit of their customers, this means that a
single ETR needs to keep track of very many other ETRs in different
administrative domains and this wouldn't work very well. Also, the
fact that ETR A can reach ETR B doesn't mean that a given ITR can also
reach it, especially if ETRs are located at end-user sites where last
kilometer and POP outages will happen rather than in ISP datacenters
with multiple connections where uptimes are high.
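
For comparison, the ITR-side probing proposed at the top of this message could look roughly like the sketch below. This is only an illustration under stated assumptions: the class name, method names, timeout and interval are all hypothetical, the "timer" is checked whenever a packet is sent rather than by a real timer, and nothing here is taken from the LISP or NERD drafts.

```python
class ItrLocatorState:
    """Per-EID-prefix state an ITR might keep for probing ETRs (hypothetical)."""

    def __init__(self, locators, timeout_s, probe_interval_s):
        self.locators = list(locators)
        self.timeout_s = timeout_s
        self.probe_interval_s = probe_interval_s
        self.current = 0            # index of the ETR currently in use
        self.probe_sent_at = None   # None means no probe is outstanding
        self.last_ok = None         # time of the last probe response

    def on_send(self, now):
        """Return (locator, please_respond) for the packet being encapsulated."""
        # Timer expired without a response: switch to a different ETR.
        if self.probe_sent_at is not None and now - self.probe_sent_at >= self.timeout_s:
            self.current = (self.current + 1) % len(self.locators)
            self.probe_sent_at = None
            self.last_ok = None
        # Set the "please respond" flag if no probe is pending and the ETR
        # was never confirmed or the confirmation is older than the interval.
        please_respond = self.probe_sent_at is None and (
            self.last_ok is None or now - self.last_ok >= self.probe_interval_s
        )
        if please_respond:
            self.probe_sent_at = now
        return self.locators[self.current], please_respond

    def on_response(self, locator, now):
        """Record a probe response from the ETR currently in use."""
        if locator == self.locators[self.current]:
            self.probe_sent_at = None
            self.last_ok = now
```

The point of the sketch is that reachability is judged from the ITR's own vantage point, so it does not depend on ETR A knowing whether ETR B is reachable.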
For ETRs, having an incoming MTU of 1500 means that unacceptable
PMTUD blackholes will happen, or ITRs have to fragment packets and
the ETR has to reassemble them (for DF=1 or IPv6). I'm assuming
this is unacceptable but I'm certainly interested to hear from
vendors about this.
It's amazing how people are so fascinated with this MTU issue. So
what's wrong with fragmentation?
It's basically part of our internet engineering taboos at this stage,
just like packet reordering. See http://citeseer.ist.psu.edu/335647.html
As long as the ITRs fragment before encapsulation, the host and not
the ETR will reassemble.
You can't fragment IPv6 packets or IPv4 packets with DF=1. So what
would have to happen is that the LISP tunnel does the fragmenting/
reassembly. In this case it would be helpful to know what size packets
an ETR is willing to reassemble.
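
As a rough illustration of what the ETR would be asked to reassemble, the sketch below splits an encapsulated packet into outer IPv4 fragments. It assumes the tunnel fragments the outer IPv4 packet after encapsulation, which is only one way to do it; the 16 byte tunnel overhead is a placeholder and not a figure from the LISP draft, and the function name is hypothetical.

```python
def outer_fragment_payloads(inner_len, tunnel_overhead, path_mtu, outer_ip_hdr=20):
    """Return the payload size of each outer IPv4 fragment, in bytes.

    inner_len       -- size of the original (inner) packet
    tunnel_overhead -- assumed bytes added between outer IP header and inner packet
    path_mtu        -- MTU on the path between ITR and ETR
    """
    payload = inner_len + tunnel_overhead
    if payload + outer_ip_hdr <= path_mtu:
        return [payload]  # fits in one packet, no fragmentation needed
    # Every fragment except the last must carry a multiple of 8 bytes,
    # because IPv4 fragment offsets are expressed in 8-byte units.
    per_fragment = (path_mtu - outer_ip_hdr) // 8 * 8
    sizes = []
    while payload > per_fragment:
        sizes.append(per_fragment)
        payload -= per_fragment
    sizes.append(payload)
    return sizes


if __name__ == "__main__":
    # A full-size 1500 byte inner packet over a 1500 byte path MTU.
    sizes = outer_fragment_payloads(1500, 16, 1500)
    print(sizes, "ETR must reassemble", sum(sizes), "bytes")
```

Under these assumptions every full-size inner packet becomes two outer packets, and the size the ETR would need to advertise is the inner packet plus the tunnel overhead.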
I think the reason why Fred is "fascinated" with the MTU issue is
because he's been trying to solve this for the general tunneling case
and has found that to be quite hard. For me, I've often been bitten by
PMTUD black holes, both as a user and as someone who had to make ISP
infrastructure work without overloading the support lines. In
addition, I would very much like us to move towards a situation where
1500 bytes is no longer the de facto IP MTU but people can use
something larger if their hardware supports it.
We have both the potential to do very bad things (trigger broken
PMTUD) and very useful things (give people an incentive to deploy
jumboframes, create the first MTU-robust tunneling mechanism) here so
we should aim to get things right the first time rather than repeat
the mistakes made with RFC 1191.