Re: draft-aruns-ccamp-rsvp-restart-ext-00
Hi Nic,
> I've just read your draft-aruns-ccamp-rsvp-restart-ext-00 and it looks good.
> In particular, we've been looking at using Restart for Fast Reroute LSPs for
> some time and this draft provides everything that is needed (like recovering
> the FAST_REROUTE, DETOUR, SENDER_TEMPLATE and ERO
> objects from the downstream node when they are not available from upstream).
Good. This concern was also raised in Seoul, and I am pleased to hear that the draft
addresses these requirements.
> However, I have a couple of concerns (not related to Fast Reroute).
>
> - Your draft doesn't tackle, and won't work for, simultaneous restart of
> adjacent nodes. This is a problem that is tackled by
> draft-rahman-ccamp-rsvp-restart-extensions, so merging the two drafts in
> some way may be the best way to resolve that. I realize that the Aruns
> draft aims to make Restart possible for nodes which cannot retrieve state
> from the data plane, and in that case recovering from simultaneous restart
> of adjacent nodes isn't easy. I think including some further extensions for
> nodes which can retrieve some state from the data plane would be
> appropriate.
Retrieving state from the data plane only answers half of the problem. However, it is
certainly important to audit the recovered control plane information against the known
data plane state.
With regard to adjacent node failures and restarts, I believe there are actually
sufficient capabilities here. Perhaps the authors would like to include text to clarify
the procedures.
> - The back compatibility with RFC 3473 restart looks risky. Draft Aruns
> mandates that restarted nodes don't send Path Refreshes until either the
> recovery period expires or a RecoveryPath is received from downstream. In
> the case that the downstream node only supports RFC 3473 restart (and so
> doesn't send RecoveryPaths), it may well timeout Path state at the same time
> as or very soon after the recovery period expires. Hence a dangerous timing
> window is created.
You have something here.
However, section 9.5.3 of RFC 3473 does not say that the neighbor MUST discard state that
is not restored within the recovery time interval. Presumably it would simply resume
waiting for a state refresh, and so would time out 3.5 refresh intervals after the end
of the recovery interval.
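To put rough numbers on the two readings, here is a minimal sketch in Python. It is only an
illustration: the function names are hypothetical, the 3.5 multiplier is the figure I used
above, and the "discard at expiry" case models the pessimistic behavior Nic is worried about
rather than anything RFC 3473 requires.

```python
def neighbor_path_timeout(recovery_time_s, refresh_interval_s,
                          discard_at_expiry=False, lifetime_multiplier=3.5):
    """Seconds after the restart at which the downstream neighbor drops Path state.

    discard_at_expiry=True models a neighbor that discards unrestored state as
    soon as the recovery interval ends (the pessimistic reading).
    Otherwise the neighbor is assumed to resume normal refresh timing at the end
    of the recovery interval and to time out lifetime_multiplier refresh
    intervals later.
    """
    if discard_at_expiry:
        return float(recovery_time_s)
    return recovery_time_s + lifetime_multiplier * refresh_interval_s


def refresh_margin(recovery_time_s, refresh_interval_s, discard_at_expiry=False):
    """Time left for a restarted node that only starts sending Path refreshes
    when the recovery period expires (no RecoveryPath was received)."""
    timeout = neighbor_path_timeout(recovery_time_s, refresh_interval_s,
                                    discard_at_expiry)
    return timeout - recovery_time_s


if __name__ == "__main__":
    # Illustrative values only: 60 s recovery time, 30 s refresh interval.
    print(refresh_margin(60, 30))
    print(refresh_margin(60, 30, discard_at_expiry=True))
```

Under the first reading the restarted node has several refresh intervals in hand; under the
second it has none, which is the timing window in question.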
A compromise is possible here: RFC 3473 says that Path state SHOULD be restored within
1/2 of the recovery time. So we could follow this logic and use the first half of the
time interval for the RecoveryPath message and the second half for backwards
compatible recovery.
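The split could be sketched as follows. This is an assumption about how an implementation
might structure it, not text from either draft; the phase names and the function are
hypothetical.

```python
from enum import Enum


class Phase(Enum):
    AWAIT_RECOVERY_PATH = "await RecoveryPath from downstream"
    LEGACY_RECOVERY = "send Path refreshes as for plain RFC 3473 restart"
    EXPIRED = "recovery period over"


def recovery_phase(elapsed_s, recovery_time_s):
    """Return the phase a restarted node is in, elapsed_s seconds after restart.

    First half of the recovery time: wait for a RecoveryPath message.
    Second half: fall back to backwards compatible recovery.
    """
    if elapsed_s < 0 or recovery_time_s <= 0:
        raise ValueError("elapsed time must be >= 0 and recovery time > 0")
    if elapsed_s < recovery_time_s / 2:
        return Phase.AWAIT_RECOVERY_PATH
    if elapsed_s < recovery_time_s:
        return Phase.LEGACY_RECOVERY
    return Phase.EXPIRED
```

With a 60 second recovery time, the node would wait for a RecoveryPath for the first 30
seconds and then start sending Path refreshes on its own for the remaining 30.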
On the other hand, I would prefer that this new capability (support for the RecoveryPath
message) be signaled in the Restart_Capabilities object so that the restarting node
knows whether to expect a RecoveryPath.
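As a rough sketch of that decision, assuming a capability indication that does not exist in
RFC 3473 today (the recovery_path_supported field below is purely hypothetical, as are the
class and function names):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RestartCapabilities:
    restart_time_ms: int
    recovery_time_ms: int
    # Hypothetical flag: the neighbor will send RecoveryPath messages.
    recovery_path_supported: bool = False


def expect_recovery_path(downstream: Optional[RestartCapabilities]) -> bool:
    """True if the restarting node should hold off and wait for a RecoveryPath.

    A neighbor that advertised nothing, or that did not set the hypothetical
    flag, is treated as supporting RFC 3473 restart only.
    """
    return downstream is not None and downstream.recovery_path_supported
```

A restarting node that gets False here would not wait at all, which removes the timing
window for neighbors that only support RFC 3473.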
> As a potential solution to both problems I'd suggest that a restarting node
> receiving a Path message with a recovery label should always forward it
> immediately as well as it can, and include both a recovery label and (for
> back compatibility) a suggested label. Similarly, it should forward
> RecoveryPath messages immediately as well as it can. I'd be happy to
> discuss any of this further.
This sounds very dangerous.
"As well as it can" may include path computation which may pick a path other than the one
previously in use. Hence the new Path message will be sent to a new neighbor. This
disaster is no better than the problem we are trying to solve.
Cheers,
Adrian