
Re: [RRG] LISP-NERD reachability and MTU detection

To: Iljitsch van Beijnum <iljitsch@muada.com>
Subject: Re: [RRG] LISP-NERD reachability and MTU detection
From: Dino Farinacci <dino@cisco.com>
Date: Mon, 17 Dec 2007 15:01:39 -0800
Cc: Routing Research Group list <rrg@psg.com>
In-reply-to: <DDD842B7-F8A4-4227-A5B0-902A1F23CE65@muada.com>
References: <EAB3BF96-D438-459E-A753-F9D72B1FE5B6@muada.com> <EC1BB972-6F21-4CBC-B827-BB1840C25AE8@cisco.com> <DDD842B7-F8A4-4227-A5B0-902A1F23CE65@muada.com>

A LISP-NERD ITR chooses an ETR/locator and assumes it's reachable. It sets a "please respond" code point in the NERD header and starts a timer. The ETR receiving the packet sees the "please respond" message and sends back info to the originating ITR.
Iljitsch, you really don't want to do this. Even in a very well-behaved scenario, this will cause way too much control traffic going to the site. Before we put the loc-reach-bits into LISP this was an obvious first thought, but it was discarded because polling typically doesn't scale well when you are being polled from a million places. Could you imagine if everyone polled the DNS root servers *in addition to* sending queries for name translations!
Well, it's one thing or the other: either you make available information about the currently reachable locator set for any given EID in the mapping system = the mapping

Send packets to the destination site with LISP, and your returned SYN-ACK packet will give you reachability information.
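
To put rough numbers on the polling concern raised above, here is a minimal back-of-the-envelope sketch. It takes the "million places" and the "10 second or so" polling interval mentioned in this thread and assumes the worst case where every ITR polls the same site; the function name is hypothetical and nothing here comes from the LISP or NERD drafts.

```python
def probes_per_second(num_itrs: int, poll_interval_s: float) -> float:
    """Aggregate probe rate arriving at one site if every ITR polls it.

    Assumption: each ITR sends exactly one probe per polling interval.
    """
    return num_itrs / poll_interval_s


# A million ITRs, each polling once every 10 seconds.
rate = probes_per_second(1_000_000, 10.0)
print(f"{rate:,.0f} probes/s")  # prints "100,000 probes/s"
```

That load arrives regardless of whether any data traffic is flowing, which is the contrast with piggybacking reachability on packets that are being sent anyway.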

I'm not sure how long you want to make these TTLs, but obviously that would be longer than a 10 second or so polling interval, which means that it's going to take you much longer to detect and recover from failures.
And to be clear, what problem are you trying to solve? Are you trying to get EID-to-RLOC record changes to ITRs as soon as possible?
When I talk to an ETR, I want to know if it's alive fast enough to fail over to another one before the transport, app or user stops trying.

There will be enough traffic that will hash over to all ETRs at the site. And all ITRs at your site will get the reachability information.

Are you trying to get each ITR a copy of their own mapping with different preference and weight information contained in the record?
If ETRs are sending packets to ITRs anyway it makes sense to add this information, yes. Not sure if I'd want to work very hard to make it available if it wasn't "free" to do so, though.

But it doesn't change often enough to poll for it. Tell me what you think is wrong with the LISP mechanisms?

If there is no response before the timer expires, the ITR switches to a different ETR.
This is polling for reachability. I don't think you can do much better than what we have spec'ed out in the main LISP spec with the loc-reach-bits design.
You assume that each ETR knows whether the other ETRs for that EID are reachable. On

That is a good thing to assume and a goal to strive for.
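
As a rough illustration of the loc-reach-bits idea discussed here, the sketch below treats the field as a bitmap in which each bit reports whether one locator in the site's ordered locator set is reachable, so an ITR can learn the state of every ETR from any returned data packet. The bit ordering (bit i maps to locator i), the function name and the example addresses are assumptions made for illustration, not taken from the LISP spec.

```python
def reachable_locators(locators, reach_bits):
    """Return the locators whose reachability bit is set.

    Assumption: bit i (least significant first) corresponds to locators[i].
    """
    return [loc for i, loc in enumerate(locators) if reach_bits & (1 << i)]


# Hypothetical three-locator site; the second locator is marked down.
locators = ["192.0.2.1", "198.51.100.1", "203.0.113.1"]
print(reachable_locators(locators, 0b101))  # ['192.0.2.1', '203.0.113.1']
```

This only works if the ETR setting the bits knows the state of the other ETRs for the EID, which is the assumption being debated in this part of the thread.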

Friday, Eliot said he'd like to see ETR functionality in Linksys boxes, so if a user has both cable and DSL connectivity (we called this a "basement multihomer" in multi6) the one box would handle RLOCs from both, so this requirement is easy to meet. However, if ISPs run ETRs for the benefit of their customers, this means that a single ETR needs to keep track of very many other ETRs in different administrative domains, and this wouldn't work very well.

You changed configurations all in two sentences. Is the xTR at the site or at the ISP? Pick one and let's stay coherent with the use case. Otherwise, I can't follow you.

For ETRs, having an incoming MTU of 1500 means that unacceptable PMTUD blackholes will happen, or ITRs have to fragment packets and the ETR has to reassemble them (for DF=1 or IPv=6). I'm assuming this is unacceptable but I'm certainly interested to hear from vendors about this.
It's amazing how people are so fascinated with this MTU issue. So what's wrong with fragmentation?
It's basically part of our internet engineering taboos at this stage, just like packet reordering. See http://citeseer.ist.psu.edu/335647.html

There is no difference if the host sends smaller packets or a box in the network sends smaller packets. I am not advocating that the ETR reassemble here. I want to make that clear.

As long as the ITRs fragment before encapsulation, the host and not the ETR will reassemble.
You can't fragment IPv6 packets or IPv4 packets with DF=1. So what would have to happen is

Right, you have to obey the protocol spec. So packets will get dropped with DF=1. And people turn off ICMP messages as well.
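
The "fragment before encapsulation" rule and its DF=1 and IPv6 exceptions can be sketched as a small decision function. This is a minimal illustration only: the function name, the 36-byte encapsulation overhead and the returned labels are assumptions, and it models just the option where the ITR fragments the inner packet, not the variant where the tunnel itself fragments and the ETR reassembles.

```python
def itr_action(inner_len, inner_is_ipv6, df_set,
               path_mtu=1500, encap_overhead=36):
    """Decide what an ITR does with a packet before encapsulating it.

    encap_overhead is an assumed figure for the added outer headers.
    """
    room = path_mtu - encap_overhead  # largest inner packet that still fits
    if inner_len <= room:
        return "encapsulate"
    if inner_is_ipv6 or df_set:
        # Routers may not fragment IPv6, nor IPv4 packets with DF=1.
        return "drop"
    # Fragment the inner packet first, so the destination host reassembles.
    return "fragment-then-encapsulate"


print(itr_action(1500, inner_is_ipv6=False, df_set=False))
print(itr_action(1500, inner_is_ipv6=False, df_set=True))
```

With the assumed defaults, a full 1500-byte IPv4 packet is fragmented when DF=0 and dropped when DF=1; if the resulting ICMP error is filtered, the sender never learns why, which is the PMTUD black hole under discussion.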

So what's the difference if packets get lost doing a mapping lookup (everyone is so sensitive to packet drops there) but for MTU discovery purposes it's okay to drop packets?

So the thinking and reasoning is not consistent (not saying yours isn't but the general feeling of the list).

that the LISP tunnel does the fragmenting/reassembly. In this case it would be helpful to know what size packets an ETR is willing to reassemble.

And for IPv6, as long as the ITR's address is the source, it can fragment the packet. But in this case the ETR must reassemble. Too bad for IPv6. We'll have to translate for IPv6 then and introduce a new set of problems.

I think the reason why Fred is "fascinated" with the MTU issue is because he's been trying to solve this for the general tunneling case and has found that to be quite hard. For me, I've often been bitten by PMTUD black holes, both as a user and as someone who had to make ISP infrastructure work without overloading the support lines. In addition, I would very much like us to move towards a situation where 1500 bytes is no longer the de facto IP MTU but people can use something larger if their hardware supports it.

Do you think 1500 byte MTU links will still be around, say, 5 years from now? Maybe it's time to clean up some links on the network. I'm sure vendors can provide incentive to do this. ;-)

We have both the potential to do very bad things (trigger broken PMTUD) and very useful things (give people an incentive to deploy jumbo frames, create the first MTU-robust tunneling mechanism) here, so we should aim to get things right the first time rather than repeat the mistakes made with RFC 1191.

When you think it is right, it will change. It's been a continual moving target with multiple moving parts for 20 years. You can never be right.

Dino




