[RRG] LISP-NERD reachability and MTU detection
- To: Routing Research Group list <rrg@psg.com>
- Subject: [RRG] LISP-NERD reachability and MTU detection
- From: Iljitsch van Beijnum <iljitsch@muada.com>
- Date: Sat, 15 Dec 2007 17:53:25 +0100
After Eliot's presentation today, I started thinking that LISP-NERD
could benefit greatly from something like the shim6 REAP reachability
evaluation mechanism. However, with shim6 we have the limitation that
we have no bits to play with for data packets belonging to sessions
that haven't encountered any failures. Not so with NERD: here we have
additional bits in every packet. I'm assuming that we can use a few of
those for reachability and MTU detection. It would work like this:
A LISP-NERD ITR chooses an ETR/locator and assumes it's reachable. It
sets a "please respond" code point in the NERD header and starts a
timer. The ETR receiving the packet sees the "please respond" message
and sends back info to the originating ITR that could encompass
current locator preference information (for traffic engineering), the
up/down status of other ETRs, the maximum packet size the ETR is
prepared to receive, and possibly (if the ETR supports reassembly) the
maximum packet size the ETR is prepared to reassemble from fragments.
The ITR receives the message and updates its mapping cache accordingly.
If there is no response before the timer expires, the ITR switches to
a different ETR.
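The probe-and-switch behavior described above can be sketched roughly as follows. This is only an illustration: the class name, the method names, the 3 second timeout and the round-robin choice of the next ETR are all assumptions made for the example, not part of any LISP-NERD specification.

```python
class ItrMapping:
    """Per-prefix ITR state: which ETR is in use and whether a probe is pending."""

    def __init__(self, locators, timeout=3.0):
        self.locators = list(locators)  # candidate ETR locators for the prefix
        self.active = 0                 # index of the ETR currently in use
        self.timeout = timeout          # assumed value; the post gives none
        self.deadline = None            # when the outstanding probe expires
        self.verified = False           # True once the active ETR has replied
        self.info = None                # last info reported by the ETR

    def next_packet(self, now):
        """Return (locator, please_respond) for an outgoing packet."""
        probe = not self.verified and self.deadline is None
        if probe:
            self.deadline = now + self.timeout  # start the timer
        return self.locators[self.active], probe

    def on_reply(self, locator, info):
        """The ETR answered: stop the timer and store what it told us."""
        if locator == self.locators[self.active]:
            self.verified = True
            self.deadline = None
            self.info = info  # e.g. preferences, ETR status, max packet size

    def on_tick(self, now):
        """No reply before the timer expired: switch to a different ETR."""
        if self.deadline is not None and now >= self.deadline:
            self.active = (self.active + 1) % len(self.locators)
            self.deadline = None
            self.verified = False
```

Note that only the ITR keeps state here; the ETR just answers whatever packet carries the code point.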
When mapping state is created and outgoing traffic is flowing, the ITR
may observe return traffic (if the same ITR and ETR function as ETR
and ITR, respectively, for traffic in the other direction) and deduce
that there is adequate reachability. If the ITR doesn't see any return
traffic, on the other hand, it sets the "please respond" code point in
the LISP header periodically and awaits replies. Again, if none are
forthcoming, it switches to another ETR.
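As a rough sketch of that decision, assuming the ITR timestamps the last return traffic it saw from the ETR and the last probe it sent (the function name and both intervals are made up for illustration):

```python
def should_probe(now, last_return_seen, last_probe_sent,
                 quiet_after=10.0, probe_interval=5.0):
    """Decide whether the next outgoing packet should carry "please respond".

    quiet_after and probe_interval are assumed values, in seconds.
    """
    # Recent return traffic is taken as evidence of adequate reachability.
    if last_return_seen is not None and now - last_return_seen < quiet_after:
        return False
    # Otherwise probe periodically, not on every packet.
    return last_probe_sent is None or now - last_probe_sent >= probe_interval
```

For example, should_probe(100, 95, None) is False because return traffic was seen 5 seconds ago, while should_probe(100, 80, None) is True.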
Because LISP packets contain a nonce, ITRs can correlate incoming
responses to their response requests with the original packets, so
they are in a position to do RFC 4821 path MTU discovery without
help from ICMP messages. (They may be limited somewhat because they
can't decide on the packet size on their own unless we add extra stuff
here.)
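A minimal sketch of the nonce bookkeeping this would need on the ITR, assuming a 32-bit nonce and a table keyed by nonce (both are assumptions for the example). Since the ITR can't pick packet sizes itself, it only records which sizes the hosts happened to send and whether they were acknowledged, in the spirit of RFC 4821:

```python
import secrets


class PmtuProbeTable:
    """Correlates "please respond" replies with the packets that asked for them."""

    def __init__(self):
        self.pending = {}          # nonce -> size of the packet carrying it
        self.largest_ok = 0        # largest size confirmed to reach the ETR
        self.smallest_lost = None  # smallest size that got no reply

    def record_sent(self, size, nonce=None):
        """Remember a packet sent with "please respond"; return its nonce."""
        if nonce is None:
            nonce = secrets.randbits(32)  # nonce width is an assumption
        self.pending[nonce] = size
        return nonce

    def on_reply(self, nonce):
        """A reply arrived; return False if its nonce matches nothing we sent."""
        size = self.pending.pop(nonce, None)
        if size is None:
            return False
        self.largest_ok = max(self.largest_ok, size)
        return True

    def on_timeout(self, nonce):
        """No reply: if the packet was larger than anything confirmed, note it."""
        size = self.pending.pop(nonce, None)
        if size is not None and size > self.largest_ok:
            if self.smallest_lost is None or size < self.smallest_lost:
                self.smallest_lost = size
```

The path MTU then lies somewhere from largest_ok up to (but not including) smallest_lost, provided the loss was size-related and not ordinary packet loss, which a real implementation would have to tell apart by retrying.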
In my book, this is a big win, because it means that the ETRs can be
completely stateless, so it's easy for ISPs to run them for their
customers, and on the ITRs the state required for reachability
detection is extremely basic: simpler than shim6 and nowhere near
what's in TCP. It's also soft state that can be discarded and
recreated without penalty when all ETRs for a prefix (or at least the
one the ITR will be selecting as the one to use the next time around)
are up.
If an ITR notices that there is reachability for small packets, it can
then keep a copy of a large packet that it sends with reply requested,
and if there is no reply, or if there is an ICMP packet too big, it
can generate an ICMP too big towards the source of the original
packet. It doesn't actually know the packet size that can be used on
the link in the former case, but it could use heuristics or stick to a
conservative value.
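To make the two cases concrete, here is a sketch under some assumptions: the names are hypothetical, the encapsulation overhead is a parameter because its real value depends on the LISP-NERD header format, and the 1280 byte fallback (the IPv6 minimum MTU) is just one possible conservative value.

```python
from dataclasses import dataclass


@dataclass
class HeldProbe:
    source: str      # address of the host that sent the original packet
    packet: bytes    # copy of the large packet sent with reply requested
    deadline: float  # time after which a missing reply counts as a failure


def too_big_for(probe, now, icmp_mtu=None, encap_overhead=0, fallback_mtu=1280):
    """Return (source, mtu) for a "too big" message to generate, or None.

    icmp_mtu is the MTU from an ICMP packet too big received on the tunnel
    path, if any. encap_overhead and fallback_mtu are assumed parameters.
    """
    if icmp_mtu is not None:
        # The path told us its MTU; the host sees that minus the encapsulation.
        return probe.source, icmp_mtu - encap_overhead
    if now >= probe.deadline:
        # No reply and no ICMP: the usable size is unknown, so be conservative.
        return probe.source, fallback_mtu
    return None  # still waiting for a reply
```

The held copy of the packet is what lets the ITR quote the original packet in the message it sends back to the source.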
It would of course also be possible to advertise ETR MTU sizes in the
mapping database (but that doesn't tell us the path MTU).
The question is whether we want to support ITRs and/or ETRs sitting in
front of / behind 1500 byte MTU links. Having ITRs with this
limitation is probably doable because as per the above, the ITRs
SHOULD be able to generate the too big messages back to the source
hosts and if end-users deploy ITRs in their own networks, they'll
quickly discover that they'll have to un-break PMTUD. For ISPs we'll
probably have to mandate the larger packet size because the
correlation between the deployment of a new box and the start of
problems won't be as obvious or easy to reverse.
For ETRs, having an incoming MTU of 1500 means that unacceptable PMTUD
blackholes will happen, or ITRs have to fragment packets and the ETR
has to reassemble them (for DF=1 or IPv6). I'm assuming this is
unacceptable but I'm certainly interested to hear from vendors about
this.