RE: RSVP Graceful Restart
Hi Alia,
The problem I described was actually at a merge point.
The sequence of events is this.
1. MP set up as usual.
2. An upstream failure on the protected LSP triggers a reroute at the upstream
PLR. The MP stops receiving Path Refreshes on the protected LSP. It
continues to receive them on the merging backup, over which data traffic is
also now arriving.
3. The MP has a control-plane failure, but its data plane survives.
4. The MP control plane restarts, and the GR mechanisms at that node have to
construct a Path Refresh to send downstream on the protected LSP even though
the only Path Refreshes being received are on the merging backup. This raises
the various issues described in my last email.
I don't think this is an insignificant timing window. The lack of refreshes
at the MP should be resolved by either local revertive behavior on the
protected LSP or a make-before-break of the protected LSP, but that may take
some time.
Because no Path Refreshes are arriving on the protected LSP in this period,
it is not an option to just use regular restart mechanisms to recover the
LSP and then reinitiate the FRR protection.
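The state at the MP through steps 1-4 can be sketched with a toy Python model. The `MergePoint` class, its fields, and the session labels are hypothetical, chosen only for illustration; they do not correspond to RSVP objects or to any implementation. The point is that after step 4 the protected LSP still has forwarding state at the MP but no incoming Path Refresh from which to rebuild the one sent downstream:

```python
from dataclasses import dataclass, field


@dataclass
class MergePoint:
    """Toy model of the merge point (MP); all names are hypothetical."""

    # Sessions on which Path refreshes are currently arriving from upstream.
    refreshed_upstream: set = field(default_factory=set)
    # Sessions whose forwarding (data-plane) state survives a control-plane restart.
    forwarding_state: set = field(default_factory=set)

    def unrefreshed_after_restart(self) -> set:
        """Sessions the MP still forwards but has no incoming Path refresh for,
        i.e. whose downstream Path refresh cannot simply be rebuilt from
        what arrives from upstream."""
        return self.forwarding_state - self.refreshed_upstream


mp = MergePoint()

# Step 1: MP set up as usual; the protected LSP is refreshed and forwarding.
mp.refreshed_upstream.add("protected")
mp.forwarding_state.add("protected")

# Step 2: upstream failure; the PLR reroutes onto the backup, which merges
# here. Refreshes stop on the protected LSP and arrive on the backup instead.
mp.refreshed_upstream.discard("protected")
mp.refreshed_upstream.add("backup")

# Step 3: control-plane failure. forwarding_state is left untouched,
# modelling a data plane that survives.

# Step 4: after restart, the protected LSP is still forwarded downstream,
# but only the merging backup is being refreshed.
print(mp.unrefreshed_after_restart())  # {'protected'}
```

In this model, ordinary GR recovery driven by incoming refreshes would only ever rebuild state for `backup`; the `protected` session falls through the gap.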
Let me know if anything's not clear,
Nic
-----Original Message-----
From: Alia Atlas [mailto:aatlas@avici.com]
Sent: Tuesday, October 28, 2003 3:26 PM
To: Juniper - Kireeti Kompella
Cc: Nic Neate; 'mpls@uu.net'; 'ccamp@ops.ietf.org'
Subject: RE: RSVP Graceful Restart
Hi Nic & Kireeti,
Why is it necessary to recover a protected LSP between the active PLR and
the merge node? That protected LSP will not be used; if one reverts back
to it, then it can be actively recreated, which has to happen today anyhow.
The merge node can ensure that the primary is recovered for itself and
downstream. The PLR can ensure it recovers its backup and the primary
itself.
What am I missing in the problem statement?
Thanks,
Alia
At 09:47 AM 10/28/2003, Kireeti Kompella wrote:
>Hi Nic,
>
>On Mon, 27 Oct 2003, Nic Neate wrote:
>
> > The issue with that is not with overlap in the problems they are solving,
> > but in recovering FRR backup and protected LSPs (which may not be being
> > refreshed from upstream) after a restart. Is this a problem that interests
> > you?
>
>Thanks for clarifying. I agree that networks will have both GR and
>FRR. In principle, I agree that this (recovering FRR sessions during
>GR) is a problem that needs to be solved. However, the premise here
>is that during GR (hopefully a small time window, say a couple of
>minutes), you have a second failure (of a link or a node).
>
>Yes, it's possible to have multiple failures, but the combination of a
>control plane failure recoverable by GR and a link/node failure that
>requires FRR seems remote.
>
>A practical solution (that may not satisfy all) is for GR to recover
>'regular' LSP sessions, and to re-initiate FRR sessions; and to abort
>GR if a second failure occurs that would have necessitated FRR.
>
>On the other hand, let me not discourage you. Let's take a look at
>solutions, and see how simple/complex they are before ruling on the
>protocol work.
>
>Kireeti.
>-------
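Kireeti's "practical solution" amounts to a per-session rule plus a global abort. A minimal Python sketch, assuming a hypothetical `plan_restart` function and a boolean flag per session marking FRR (backup or protected) LSPs; neither is part of any RSVP specification or implementation, and what happens after GR is aborted is left out:

```python
from typing import Dict, Optional


def plan_restart(
    sessions: Dict[str, bool], second_failure_needing_frr: bool
) -> Optional[Dict[str, str]]:
    """Hypothetical encoding of the 'practical solution' quoted above.

    `sessions` maps a session name to True if it is an FRR session (a backup
    or protected LSP) and False for a 'regular' LSP. Returns a per-session
    action, or None to signal that GR is aborted altogether.
    """
    if second_failure_needing_frr:
        # A failure that would have needed FRR occurred during GR: give up on GR.
        return None
    # Regular LSPs are recovered by GR; FRR sessions are re-initiated.
    return {
        name: "reinitiate" if is_frr else "recover_via_gr"
        for name, is_frr in sessions.items()
    }


sessions = {"lsp-a": False, "bypass-1": True}
print(plan_restart(sessions, second_failure_needing_frr=False))
# {'lsp-a': 'recover_via_gr', 'bypass-1': 'reinitiate'}
print(plan_restart(sessions, second_failure_needing_frr=True))
# None
```

The rule is deliberately coarse: it gives up on recovering FRR state in exchange for simplicity, which is the trade-off described as one "that may not satisfy all".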