Shim6 failure recovery after garbage collection



In a private discussion at IETF in Dallas, I was discussing with someone
the impact on a content provider's server of implementing shim6.  I
expressed the opinion that it would be nice if a heavily loaded server
could aggressively garbage collect shim6 state after initial context
establishment, and rely on the client to perform failure and reachability
detection and initiate context re-establishment if a failure is detected.

If such behavior is supported by the protocol, we're much more likely to
be able to convince potential implementors to turn on basic shim6 support
by default, with the understanding that implementors can aggressively
discard shim6 context state (and not initiate shim6 context negotiation)
if the implementation doesn't have multiple locators or otherwise doesn't
need the capabilities shim6 provides.  This would (properly, IMO) push the
responsibility for context tracking, failure detection and reachability
exploration to the multihomed host that stands to benefit most from shim6
and wants to run it in the first place.

My understanding after re-reading the protocol draft, particularly the
R1bis/I2bis message exchange sections, is that context re-establishment
can occur (within approx. 2 RTTs of failure detection) even if the
original ULID-ULID connection has failed and one side has discarded the
context state.  Does that match others' understanding of the process?
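
To make the question concrete, here is a rough sketch of the exchange as I
read it, seen from the side that garbage collected.  The class and method
names are made up for illustration, and the model leaves out everything
the real messages carry for validation (nonces, the responder validator,
locator verification).  It also reuses the old context tag, which is a
simplification.

```python
from typing import Dict, Iterable, Tuple


class Responder:
    """Hypothetical model of a server that discards shim6 context state."""

    def __init__(self) -> None:
        # context tag -> peer locators (a simplification of real context state)
        self.contexts: Dict[int, Tuple[str, ...]] = {}

    def garbage_collect(self, tag: int) -> None:
        # Aggressively drop the state; the peer is not told.
        self.contexts.pop(tag, None)

    def on_packet(self, tag: int) -> str:
        # A packet for an unknown context tag triggers R1bis instead of delivery.
        return "deliver" if tag in self.contexts else "R1bis"

    def on_i2bis(self, tag: int, locators: Iterable[str]) -> str:
        # The peer answers R1bis with I2bis; state is re-created and R2 is sent.
        self.contexts[tag] = tuple(locators)
        return "R2"
```

The client's packet on an alternate locator pair, the R1bis, the I2bis and
the R2 make up the roughly 2 RTTs mentioned above.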

I was also concerned whether REAP (as defined in
draft-ietf-shim6-failure-detection-03.txt) would require frequent context
re-establishment if one side were to garbage collect.  However, if I
assume that TCP ACKs are considered "payload" packets in the context of
resetting the REAP keepalive timer, then that is not an issue, because
that timer will only be started upon sending a TCP data packet, and will
be reset by that data packet's ACK.

In any event, I think that draft-ietf-shim6-failure-detection-03.txt needs
to better define "payload packet" to differentiate between data packets
(which IMO should both set and reset the keepalive timer) and ACK packets
(which IMO should only reset the keepalive timer and never start it;
otherwise they will cause unnecessary context re-establishment).
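
A minimal sketch of the timer behavior I am proposing, not of what the
draft currently specifies.  The class name, the timeout value and the
reading of "reset" as "stop the timer" are all my assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KeepaliveTimer:
    """Hypothetical model of the proposed data/ACK distinction."""

    timeout: float                     # seconds; the value is an assumption
    deadline: Optional[float] = None   # None means the timer is not running

    def on_send(self, now: float, is_data: bool) -> None:
        # Only a data packet starts the timer; sending a pure ACK never does.
        if is_data and self.deadline is None:
            self.deadline = now + self.timeout

    def on_receive(self, now: float) -> None:
        # Any payload packet from the peer, data or ACK, resets (stops) the timer.
        self.deadline = None

    def expired(self, now: float) -> bool:
        # Expiry is what would trigger exploration and context re-establishment.
        return self.deadline is not None and now >= self.deadline
```

With this split, a host that only ever sends ACKs never has a running
timer, so a peer that has garbage collected its context and gone quiet
does not cause it to start exploration.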

-Scott