Re: draft-aruns-ccamp-rsvp-restart-ext-00
We'll have the part for simultaneous adjacent restarts in ~2 weeks.
Regards,
Reshad.
Lou Berger wrote:
>
> Nic,
> In one-on-one discussions at the IETF the authors agreed to do
> just these two things! I know we're hoping to get the first part done late
> this week/early next week. I can't speak for the other authors (of the
> other half of the to-be-merged draft) on the second part.
>
> Lou
>
> At 07:41 AM 3/9/2004 -0500, Nic Neate wrote:
>
> >Hi Adrian (and draft-aruns authors),
> >
> >Responses below. In summary, I agree
> > - with the suggestion of being able to request RecoveryPath messages
> > - that it would be very helpful if the procedures for recovering from
> >simultaneous adjacent restarts could be clarified.
> >
> >Thanks,
> >
> >Nic
> >
> > > -----Original Message-----
> > > From: Adrian Farrel
> > > [mailto:adrian@olddog.co.uk]
> > > Sent: Saturday, March 06, 2004 12:47 PM
> > > To: Nic Neate; aruns@movaz.com; Movaz Networks - Louis Berger;
> > > dimitri.papadimitriou@alcatel.be
> > > Cc: ccamp@ops.ietf.org
> > > Subject: Re: draft-aruns-ccamp-rsvp-restart-ext-00
> > >
> > >
> > > Hi Nic,
> > >
> > > > I've just read your draft-aruns-ccamp-rsvp-restart-ext-00
> > > and it looks good.
> > > > In particular, we've been looking at using Restart for Fast
> > > Reroute LSPs for
> > > > some time and this draft provides everything that is needed
> > > (like recovering
> > > > the FAST_REROUTE, DETOUR, SENDER_TEMPLATE and ERO
> > > > objects from the downstream node when they are not
> > > available from upstream).
> > >
> > > Good. This concern was also raised in Seoul, and I am pleased
> > > to hear that the draft
> > > addresses these requirements.
> > >
> > > > However, I have a couple of concerns (not related to Fast Reroute).
> > > >
> > > > - Your draft doesn't tackle, and won't work for,
> > > simultaneous restart of
> > > > adjacent nodes. This is a problem that is tackled by
> > > > draft-rahman-ccamp-rsvp-restart-extensions, so merging the
> > > two drafts in
> > > > some way may be the best way to resolve that. I realize
> > > that the Aruns
> > > > draft aims to make Restart possible for nodes which cannot
> > > retrieve state
> > > > from the data plane, and in that case recovering from
> > > simultaneous restart
> > > > of adjacent nodes isn't easy. I think including some
> > > further extensions for
> > > > nodes which can retrieve some state from the data plane would be
> > > > appropriate.
> > >
> > > Retrieving state from the data plane only answers half of the
> > > problem. However, it is
> > > certainly important to audit the recovered control plane
> > > information against the known
> > > data plane state.
> > >
> >
> >Indeed. My point was that if you can't retrieve even the outgoing signaling
> >interface from your data plane following a "nodal fault", you haven't got
> >much hope of reconstructing protocol state in between two nodes which
> >restarted at the same time (without some serious protocol enhancement
> >anyway). Hence the suggestion of additional extensions to recover from
> >adjacent restarts for nodes which can retrieve the outgoing signaling
> >interface.
> >
> > > With regard to adjacent node failures and restarts, I believe
> > > there are actually
> > > sufficient capabilities here. Perhaps the authors would like
> > > to include text to clarify
> > > the procedures.
> > >
> >
> >If this is the case, then no problem. I agree that some text clarifying
> >that in the draft would be very helpful.
> >
> > > > - The back compatibility with RFC 3473 restart looks
> > > risky. Draft Aruns
> > > > mandates that restarted nodes don't send Path Refreshes
> > > until either the
> > > > recovery period expires or a RecoveryPath is received from
> > > downstream. In
> > > > the case that the downstream node only supports RFC 3473
> > > restart (and so
> > > > doesn't send RecoveryPaths), it may well time out Path state
> > > at the same time
> > > > as or very soon after the recovery period expires. Hence a
> > > dangerous timing
> > > > window is created.
> > >
> > > You have something here.
> > > However, section 9.5.3 of RFC3473 does not say that the
> > > neighbor MUST discard state that
> > > is not restored in the recovery time interval. Presumably it
> > > would simply recommence
> > > waiting for state refresh and so would time out after 3.5
> > > refresh intervals from the end
> > > of the recovery interval.
> > >
> >
> >That would be sensible behavior, yes. My concern (as I'm sure you realize)
> >is that it won't happen like that in all cases in the real world.
> >
> > > Some compromise may be introduced here by noting that 3473
> > > says that Path state SHOULD be
> > > restored within 1/2 of the recovery time. So we could follow
> > > this logic and use the first
> > > half of the time interval for the RecoveryPath message and
> > > the second half for backwards
> > > compatible recovery.
> > >
> > > On the other hand, I would prefer that this new capability
> > > (support for RecoveryPath
> > > message) was signaled in the Restart_Capabilities object so
> > > that the restarting node can
> > > know whether to expect to receive a RecoveryPath or not.
> > >
> > > > As a potential solution to both problems I'd suggest that a
> > > restarting node
> > > > receiving a Path message with a recovery label should
> > > always forward it
> > > > immediately as well as it can, and include both a recovery
> > > label and (for
> > > > back compatibility) a suggested label. Similarly, it should forward
> > > > RecoveryPath messages immediately as well as it can. I'd
> > > be happy to
> > > > discuss any of this further.
> > >
> > > This sounds very dangerous.
> > > "As well as it can" may include path computation which may
> > > pick a path other than the one
> > > previously in use. Hence the new Path message will be sent to
> > > a new neighbor. This
> > > disaster is no better than the problem we are trying to solve.
> > >
> >
> >Fine. I had in mind that a node should only forward a Path message before
> >receiving a RecoveryPath if it was sure that it could send it (as per
> >RFC3473) to the right place and without a dangerous ERO. In any case, I
> >prefer the idea of being able to request RecoveryPath messages and it sounds
> >like that will make recovery possible in more situations.
> >
> > > Cheers,
> > > Adrian
> > >