
[RRG] LISP-NERD reachability and MTU detection



After Eliot's presentation today, I started thinking that LISP-NERD could benefit greatly from something like the shim6 REAP reachability evaluation mechanism. However, with shim6 we have the limitation that we have no bits to play with for data packets belonging to sessions that haven't encountered any failures. Not so with NERD: here we have additional bits in every packet. I'm assuming that we can use a few of those for reachability and MTU detection. It would work like this:

A LISP-NERD ITR chooses an ETR/locator and assumes it's reachable. It sets a "please respond" code point in the NERD header and starts a timer. The ETR receiving the packet sees the "please respond" code point and sends back info to the originating ITR. That info could encompass current locator preference information (for traffic engineering), the up/down status of other ETRs, the maximum packet size the ETR is prepared to receive, and possibly (if the ETR supports reassembly) the maximum packet size the ETR is prepared to reassemble from fragments.

The ITR receives the message and updates its mapping cache accordingly.

If there is no response before the timer expires, the ITR switches to a different ETR.
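The steps so far can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the class name, the field names, the 3 second timeout and the simple round-robin choice of the next ETR are all made up for the example and are not part of LISP-NERD.

```python
class ReachabilityState:
    """Per-mapping soft state an ITR might keep (hypothetical sketch)."""

    def __init__(self, locators, timeout=3.0):
        if not locators:
            raise ValueError("need at least one locator")
        self.locators = list(locators)
        self.active = 0                 # index of the ETR currently in use
        self.timeout = timeout          # assumed value, not from any spec
        self.pending = None             # (nonce, deadline) of the open request
        self.max_receive_size = {}      # locator -> size reported by the ETR

    @property
    def locator(self):
        return self.locators[self.active]

    def probe(self, nonce, now):
        """Mark an outgoing packet "please respond" and start the timer."""
        self.pending = (nonce, now + self.timeout)
        return {"locator": self.locator, "please_respond": True, "nonce": nonce}

    def on_reply(self, nonce, max_receive_size):
        """Handle an ETR reply; only a matching nonce updates the cache."""
        if self.pending is None or self.pending[0] != nonce:
            return False
        self.max_receive_size[self.locator] = max_receive_size
        self.pending = None
        return True

    def on_tick(self, now):
        """Switch to the next ETR if the timer expired without a reply."""
        if self.pending is not None and now >= self.pending[1]:
            self.active = (self.active + 1) % len(self.locators)
            self.pending = None
            return True
        return False
```

Note that all of this lives on the ITR; the ETR only has to answer packets that carry the code point.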

When mapping state is created and outgoing traffic is flowing, the ITR may observe return traffic (if the same ITR and ETR function as ETR and ITR, respectively, for traffic in the other direction) and deduce that there is adequate reachability. If the ITR doesn't see any return traffic, on the other hand, it sets the "please respond" code point in the LISP header periodically and awaits replies. Again, if none are forthcoming, it switches to another ETR.
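A rough sketch of that decision, again in Python and again with invented names: the quiet interval of 10 seconds is an arbitrary placeholder, since nothing above says how often "periodically" should be.

```python
class ReturnTrafficMonitor:
    """Decides when an ITR should ask for an explicit reply (hypothetical)."""

    def __init__(self, quiet_interval=10.0):
        self.quiet_interval = quiet_interval  # assumed value
        self.last_seen = None                 # time of last return traffic

    def saw_return_traffic(self, now):
        """Call when traffic arrives from the peer ETR acting as ITR."""
        self.last_seen = now

    def should_request_reply(self, now):
        """True if no return traffic was seen recently enough."""
        if self.last_seen is None:
            return True
        return now - self.last_seen >= self.quiet_interval
```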

Because LISP packets contain a nonce, ITRs can correlate incoming responses to their response requests with the original packets, so they are in a position to do RFC 4821 path MTU discovery without help from ICMP messages. (They may be limited somewhat because they can't decide on the packet size on their own unless we add extra stuff here.)
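To make the nonce correlation concrete, here is a small Python sketch of the search bounds an ITR could keep. It is loosely modeled on the search_low/search_high idea in RFC 4821, but it is a simplification of my own: the names and the starting bounds are assumptions, a single missing reply is treated as a size failure (a real implementation would want to be more careful), and the ITR only probes with packet sizes that hosts happen to send, per the limitation noted above.

```python
class PathMtuSearch:
    """Tracks which packet sizes are known to reach an ETR (hypothetical)."""

    def __init__(self, low=1280, high=9001):
        self.low = low          # largest size confirmed by a reply
        self.high = high        # smallest size assumed not to get through
        self.outstanding = {}   # nonce -> size of the packet sent with it

    def wants_probe(self, size):
        """A packet is worth marking if its size is between the bounds."""
        return self.low < size < self.high

    def sent(self, nonce, size):
        """Remember the size of a packet sent with reply requested."""
        self.outstanding[nonce] = size

    def acked(self, nonce):
        """A reply carrying this nonce arrived: the size made it through."""
        size = self.outstanding.pop(nonce, None)
        if size is not None and size > self.low:
            self.low = size

    def lost(self, nonce):
        """No reply before the timer expired: assume the size is too big."""
        size = self.outstanding.pop(nonce, None)
        if size is not None and size < self.high:
            self.high = size
```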

In my book, this is a big win, because it means that the ETRs can be completely stateless, so it's easy for ISPs to run them for their customers, and on the ITRs the state required for reachability detection is extremely basic: simpler than shim6 and nowhere near what's in TCP. It's also soft state that can be discarded and recreated without penalty when all ETRs for a prefix (or at least the one the ITR will be selecting as the one to use the next time around) are up.

If an ITR notices that there is reachability for small packets, it can then keep a copy of a large packet that it sends with reply requested, and if there is no reply, or if an ICMP "packet too big" message comes back, it can generate an ICMP too big towards the source of the original packet. It doesn't actually know the packet size that can be used on the link in the former case, but it could use heuristics or stick to a conservative value.
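As a sketch of how the ITR might pick the MTU to put in the too big message it sends to the source host: the function below is hypothetical, the conservative default of 1280 is just one possible choice, and subtracting the encapsulation overhead is my assumption about how an MTU reported for the encapsulated packet would translate to the original packet.

```python
from typing import Optional


def mtu_to_report(icmp_reported_mtu: Optional[int],
                  encap_overhead: int,
                  conservative_mtu: int = 1280) -> int:
    """Pick the MTU for the too big message sent to the original source.

    icmp_reported_mtu: MTU from a received ICMP packet too big, or None
        if the large packet simply went unanswered.
    encap_overhead: bytes the ITR adds when encapsulating (assumed known).
    """
    if icmp_reported_mtu is None:
        # No reply at all: the real limit is unknown, so stay conservative.
        return conservative_mtu
    # The reported MTU applies to the encapsulated packet, so the source
    # has to leave room for the encapsulation headers.
    return icmp_reported_mtu - encap_overhead
```

For instance, with an illustrative overhead of 36 bytes, a reported MTU of 1500 would turn into 1464 for the source host.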

It would of course also be possible to advertise ETR MTU sizes in the mapping database (but that doesn't tell us the path MTU).

The question is whether we want to support ITRs and/or ETRs sitting in front of / behind 1500 byte MTU links. Having ITRs with this limitation is probably doable because as per the above, the ITRs SHOULD be able to generate the too big messages back to the source hosts and if end-users deploy ITRs in their own networks, they'll quickly discover that they'll have to un-break PMTUD. For ISPs we'll probably have to mandate the larger packet size because the correlation between the deployment of a new box and the start of problems won't be as obvious or easy to reverse.

For ETRs, having an incoming MTU of 1500 means that unacceptable PMTUD blackholes will happen, or ITRs have to fragment packets and the ETR has to reassemble them (for DF=1 or IPv=6). I'm assuming this is unacceptable but I'm certainly interested to hear from vendors about this.
