Hi all,

Comments inline.

Adrian Farrel wrote:

It's a good idea for the restarting node to know whether it should expect RecoveryPath messages. There doesn't seem to be any room in the Restart_Cap object for this info, so looks like we'd need an extension.

Hi Nic,

I've just read your draft-aruns-ccamp-rsvp-restart-ext-00 and it looks good. In particular, we've been looking at using Restart for Fast Reroute LSPs for some time and this draft provides everything that is needed (like recovering the FAST_REROUTE, DETOUR, SENDER_TEMPLATE and ERO objects from the downstream node when they are not available from upstream).

Good. This concern was also raised in Seoul, and I am pleased to hear that the draft addresses these requirements.

However, I have a couple of concerns (not related to Fast Reroute).

- Your draft doesn't tackle, and won't work for, simultaneous restart of adjacent nodes. This is a problem that is tackled by draft-rahman-ccamp-rsvp-restart-extensions, so merging the two drafts in some way may be the best way to resolve that. I realize that the Aruns draft aims to make Restart possible for nodes which cannot retrieve state from the data plane, and in that case recovering from simultaneous restart of adjacent nodes isn't easy. I think including some further extensions for nodes which can retrieve some state from the data plane would be appropriate.

Retrieving state from the data plane only answers half of the problem. However, it is certainly important to audit the recovered control plane information against the known data plane state. With regard to adjacent node failures and restarts, I believe there are actually sufficient capabilities here. Perhaps the authors would like to include text to clarify the procedures.

- The back compatibility with RFC 3473 restart looks risky. Draft Aruns mandates that restarted nodes don't send Path Refreshes until either the recovery period expires or a RecoveryPath is received from downstream.
In the case that the downstream node only supports RFC 3473 restart (and so doesn't send RecoveryPaths), it may well time out Path state at the same time as or very soon after the recovery period expires. Hence a dangerous timing window is created.

You have something here. However, section 9.5.3 of RFC 3473 does not say that the neighbor MUST discard state that is not restored in the recovery time interval. Presumably it would simply recommence waiting for state refresh and so would time out after 3.5 refresh intervals from the end of the recovery interval. Some compromise may be introduced here by noting that RFC 3473 says that Path state SHOULD be restored within 1/2 of the recovery time. So we could follow this logic and use the first half of the time interval for the RecoveryPath message and the second half for backwards compatible recovery.

On the other hand, I would prefer that this new capability (support for the RecoveryPath message) was signaled in the Restart_Capabilities object so that the restarting node can know whether to expect to receive a RecoveryPath or not.

I would also like to see a mechanism where each node could indicate on a per-LSP basis (e.g. at setup time) whether it would want the RecoveryPath message for that LSP after it restarts.

Regards,
Reshad.

As a potential solution to both problems I'd suggest that a restarting node receiving a Path message with a recovery label should always forward it immediately as well as it can, and include both a recovery label and (for back compatibility) a suggested label. Similarly, it should forward RecoveryPath messages immediately as well as it can. I'd be happy to discuss any of this further.

This sounds very dangerous. "As well as it can" may include path computation which may pick a path other than the one previously in use. Hence the new Path message will be sent to a new neighbor. This disaster is no better than the problem we are trying to solve.

Cheers,
Adrian
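
To make the timing discussion above concrete, here is a rough Python sketch of the arithmetic only. It is not taken from either draft or from RFC 3473: the names `plan_recovery` and `RecoveryPlan`, and the three-way handling of the capability flag, are assumptions made for illustration. It uses the figures quoted in the thread: a neighbor that only supports RFC 3473 restart is presumed to time out Path state 3.5 refresh intervals after the recovery interval ends, and, when nothing is known about the neighbor, the restarting node waits for a RecoveryPath only during the first half of the recovery time.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RecoveryPlan:
    resume_refresh_at: float    # seconds after restart when Path refreshes resume
    neighbor_timeout_at: float  # seconds after restart when an RFC 3473-only
                                # neighbor is presumed to time out Path state

    @property
    def margin(self) -> float:
        """Time left between resuming refreshes and the presumed timeout."""
        return self.neighbor_timeout_at - self.resume_refresh_at


def plan_recovery(recovery_time: float,
                  refresh_interval: float,
                  neighbor_sends_recovery_path: Optional[bool] = None,
                  timeout_multiplier: float = 3.5) -> RecoveryPlan:
    """Hypothetical helper; all policy choices here are assumptions.

    neighbor_sends_recovery_path:
      True  -> capability was signaled, so wait the whole recovery time
      False -> neighbor is known not to send RecoveryPath, so do not wait
      None  -> unknown, so wait only for the first half of the recovery time
    timeout_multiplier defaults to the 3.5 figure quoted in the thread.
    """
    if recovery_time <= 0 or refresh_interval <= 0:
        raise ValueError("recovery_time and refresh_interval must be positive")

    if neighbor_sends_recovery_path is True:
        wait = recovery_time
    elif neighbor_sends_recovery_path is False:
        wait = 0.0
    else:
        wait = recovery_time / 2.0

    timeout_at = recovery_time + timeout_multiplier * refresh_interval
    return RecoveryPlan(resume_refresh_at=wait, neighbor_timeout_at=timeout_at)
```

For example, with a recovery time of 120 s and a refresh interval of 30 s, `plan_recovery(120, 30)` resumes refreshes at 60 s against a presumed timeout at 225 s, leaving a margin of 165 s. Waiting the full recovery time instead (the `True` case) reduces that margin to 105 s.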