
Re: [RRG] LISP-NERD reachability and MTU detection



On 16 dec 2007, at 19:23, Dino Farinacci wrote:

A LISP-NERD ITR chooses an ETR/locator and assumes it's reachable. It sets a "please respond" code point in the NERD header and starts a timer. The ETR receiving the packet sees the "please respond" message and sends back info to the originating ITR.

Iljitsch, you really don't want to do this. Even in a very well-behaved scenario, this will cause way too much control traffic going to the site. Before we put the loc-reach-bits into LISP this was an obvious first thought, but it was discarded because polling typically doesn't scale well when you are being polled from a million places. Could you imagine if everyone polled the DNS root servers *in addition to* sending queries for name translations?

Well, it's one thing or the other: either you make information about the currently reachable locator set for any given EID available in the mapping system, which makes the mapping system just as volatile as BGP, or you do reachability testing of some kind between ITRs and ETRs. I'd say that if you can talk to millions of places, you can also respond to reachability probes from millions of places. However, this does suggest that something LISP-like isn't the best choice for the highest traffic destinations connected to the internet.

Why not just use TTL timeouts and have the ITR, when it needs the mapping, send a query or Data Probe and get a new Reply back with updated information?

I'm not sure how long you want to make these TTLs, but obviously that would be longer than a 10 second or so polling interval which means that it's going to take you much longer to detect and recover from failures.
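To put rough numbers on both sides of this trade-off, here is a back-of-the-envelope sketch. The 10 second interval and the "million places" come from the discussion above; the 2 second reply timeout and the 300 second TTL are assumptions picked purely for illustration, and the worst-case model (failure right after the last successful probe or refresh) is a simplification.

```python
def probe_load_per_second(num_itrs: int, interval_s: float) -> float:
    """Probes per second arriving at a site polled by num_itrs ITRs."""
    return num_itrs / interval_s


def worst_case_detection_s(refresh_s: float, reply_timeout_s: float) -> float:
    """Worst case: the failure happens right after the last successful
    probe (or mapping refresh), so the ITR waits a full refresh period
    and then one reply timeout before it knows something is wrong."""
    return refresh_s + reply_timeout_s


# A million ITRs each probing every 10 seconds (figures from the thread).
load = probe_load_per_second(1_000_000, 10)

# Assumed values for illustration only: 2 s reply timeout, 300 s TTL.
polling_detect = worst_case_detection_s(10, 2)
ttl_detect = worst_case_detection_s(300, 2)

print(f"probe load: {load:.0f} probes/s")
print(f"polling worst case: {polling_detect} s, TTL worst case: {ttl_detect} s")
```

Under these assumptions, polling costs the site on the order of 100,000 probes per second but bounds detection at about 12 seconds, while the TTL approach adds no background load but leaves a failure undetected for up to about five minutes.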

And to be clear, what problem are you trying to solve? Are you trying to get EID-to-RLOC record changes to ITRs as soon as possible?

When I talk to an ETR, I want to know if it's alive fast enough to fail over to another one before the transport, app or user stops trying.

Are you trying to get each ITR a copy of their own mapping with different preference and weight information contained in the record?

If ETRs are sending packets to ITRs anyway it makes sense to add this information, yes. Not sure if I'd want to work very hard to make it available if it wasn't "free" to do so, though.

If there is no response before the timer expires, the ITR switches to a different ETR.

This is polling for reachability. I don't think you can do much better than what we have spec'ed out in the main LISP spec with the loc-reach-bits design.

You assume that each ETR knows whether the other ETRs for that EID are reachable. On Friday, Eliot said he'd like to see ETR functionality in Linksys boxes, so if a user has both cable and DSL connectivity (we called this a "basement multihomer" in multi6), the one box would handle RLOCs from both, so this requirement is easy to meet. However, if ISPs run ETRs for the benefit of their customers, this means that a single ETR needs to keep track of very many other ETRs in different administrative domains, and this wouldn't work very well. Also, the fact that ETR A can reach ETR B doesn't mean that a given ITR can also reach it, especially if ETRs are located at end-user sites where last kilometer and POP outages will happen rather than in ISP datacenters with multiple connections where uptimes are high.
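For concreteness, the probe-and-failover behaviour under discussion (mark a packet "please respond", start a timer, switch to another ETR if nothing comes back) could be sketched roughly as follows. This is not taken from any LISP or NERD draft: the `Itr` class, the `probe` callback and the 2 second timeout are hypothetical names and values, and the network is simulated with a plain function.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Itr:
    """Hypothetical ITR state for one EID prefix."""
    locators: List[str]
    # Hypothetical callback: send a packet with the "please respond" code
    # point to the ETR and return True if a response arrives within timeout.
    probe: Callable[[str, float], bool]
    timeout_s: float = 2.0  # assumed timer value
    current: int = 0        # index of the ETR currently in use

    def select_reachable(self) -> Optional[str]:
        """Try the current ETR first, then each alternative once."""
        for offset in range(len(self.locators)):
            idx = (self.current + offset) % len(self.locators)
            if self.probe(self.locators[idx], self.timeout_s):
                self.current = idx  # stick with the ETR that answered
                return self.locators[idx]
        return None  # no ETR responded before its timer expired


# Simulated network: only the second ETR answers probes.
alive = {"192.0.2.2"}
itr = Itr(["192.0.2.1", "192.0.2.2"], probe=lambda etr, timeout: etr in alive)
print(itr.select_reachable())
```

Note that the ITR here learns reachability from its own vantage point, which is the property argued for above: it does not depend on one ETR knowing whether another ETR is up.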

For ETRs, having an incoming MTU of 1500 means that either unacceptable PMTUD black holes will happen, or ITRs have to fragment packets and the ETR has to reassemble them (for DF=1 or IPv6). I'm assuming this is unacceptable but I'm certainly interested to hear from vendors about this.

It's amazing how people are so fascinated with this MTU issue. So what's wrong with fragmentation?

It's basically part of our internet engineering taboos at this stage, just like packet reordering. See http://citeseer.ist.psu.edu/335647.html

As long as the ITRs fragment before encapsulation, the host and not the ETR will reassemble.

You can't fragment IPv6 packets or IPv4 packets with DF=1. So what would have to happen is that the LISP tunnel does the fragmentation/reassembly. In this case it would be helpful to know what size packets an ETR is willing to reassemble.
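The cases distinguished above can be written down as a small decision function. This is only a sketch: the 36 byte encapsulation overhead is an assumed figure for illustration (the real value depends on the header format), and `etr_reassembly_limit` is a hypothetical parameter standing in for "what size packets an ETR is willing to reassemble".

```python
def tunnel_action(inner_len: int, path_mtu: int, overhead: int,
                  can_fragment_inner: bool, etr_reassembly_limit: int) -> str:
    """Decide what an ITR would have to do with one packet.

    can_fragment_inner is True only for IPv4 packets with DF=0;
    IPv6 packets and IPv4 packets with DF=1 may not be fragmented.
    """
    encapsulated_len = inner_len + overhead
    if encapsulated_len <= path_mtu:
        return "send"             # fits, no fragmentation needed
    if can_fragment_inner:
        return "fragment-inner"   # fragment before encapsulation, host reassembles
    if encapsulated_len <= etr_reassembly_limit:
        return "fragment-outer"   # tunnel fragments, ETR reassembles
    return "too-big"              # neither option is available


OVERHEAD = 36  # assumed encapsulation overhead in bytes, illustration only

# A full-size 1500 byte packet over a 1500 byte path:
print(tunnel_action(1500, 1500, OVERHEAD, True, 1500))   # IPv4, DF=0
print(tunnel_action(1500, 1500, OVERHEAD, False, 2000))  # DF=1 or IPv6
print(tunnel_action(1500, 1500, OVERHEAD, False, 1500))  # ETR won't reassemble
```

The last case is the one that turns into a PMTUD black hole if the resulting "packet too big" signal never makes it back to the sending host.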

I think the reason Fred is "fascinated" with the MTU issue is that he's been trying to solve this for the general tunneling case and has found that to be quite hard. For me, I've often been bitten by PMTUD black holes, both as a user and as someone who had to make ISP infrastructure work without overloading the support lines. In addition, I would very much like us to move towards a situation where 1500 bytes is no longer the de facto IP MTU but people can use something larger if their hardware supports it.

We have the potential to do both very bad things (trigger broken PMTUD) and very useful things (give people an incentive to deploy jumbo frames, create the first MTU-robust tunneling mechanism) here, so we should aim to get things right the first time rather than repeat the mistakes made with RFC 1191.
