
RE: Segment protection failure when recovery LSPs overlap



Hi Dimitri,

I don't think that solves this issue.  Let me restate the problem with more detail on the master/slave behaviour.

We have the following topology, where the LSP A-B-C-D-E-F-G-H has 1:1 segment protection with extra traffic, and the link D-E fails:

                          K-----------L
                         /             \
                    A===B===C===D x E===F===G===H
                             \             /
                              I-----------J

RFC 4426 defines a master/slave relationship for the endpoints of the recovery LSPs.
-  For the recovery LSP B-K-L-F, either B or F will be the master, controlling switchover onto that recovery LSP.
-  For the recovery LSP C-I-J-G, either C or G will be the master, controlling switchover onto that recovery LSP.

However, there is no mechanism defined for electing the masters.  Rather, RFC 4872 defines the switchover procedures for 1:1 protection, and states that the first endpoint of the recovery LSP to detect the failure is the one that initiates switchover (http://tools.ietf.org/html/rfc4872#section-7.2).

In this example, C and F are closest to the failure, and have their NOTIFY_REQUEST objects at the top of the stack at D and E respectively.  It is therefore likely that C and F will detect the failure before B and G, and so C and F will be the masters.
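To make the notification targets concrete, here is a minimal sketch of how the NOTIFY_REQUEST stack might look at D and E. It assumes that each branch node adds its NOTIFY_REQUEST ahead of any existing ones as the Path travels downstream, and each merge node does the same as the Resv travels upstream. The names `notify_stack`, `BRANCH_NODES` and `MERGE_NODES` are illustrative only and are not taken from any RFC or implementation.

```python
WORKING = list("ABCDEFGH")
BRANCH_NODES = ["B", "C"]  # upstream ends of recovery LSPs B-K-L-F and C-I-J-G
MERGE_NODES = ["F", "G"]   # downstream ends of the same recovery LSPs


def notify_stack(at_node, message):
    """Return the NOTIFY_REQUEST senders seen at at_node, top of stack first.

    Assumption: every node that adds a NOTIFY_REQUEST puts it ahead of the
    ones already present in the message it forwards.
    """
    idx = WORKING.index(at_node)
    if message == "Path":
        # Path travels A -> H; branch nodes upstream of at_node have added.
        adders = [n for n in WORKING[:idx] if n in BRANCH_NODES]
    else:
        # Resv travels H -> A; merge nodes downstream of at_node have added.
        adders = [n for n in reversed(WORKING[idx + 1:]) if n in MERGE_NODES]
    return adders[::-1]  # most recently added first


print(notify_stack("D", "Path"))  # ['C', 'B'] -> D notifies C
print(notify_stack("E", "Resv"))  # ['F', 'G'] -> E notifies F
```

Under that assumption, the two nodes that hear about the failure first are C and F, which are endpoints of different recovery LSPs.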

That presents the problem that C and F will both attempt to initiate a switchover using their respective recovery LSPs, leading to the data loss described in my original mail.
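The resulting data loss can be reproduced with a small hop-by-hop model. This is only a sketch under stated assumptions: each node has one selected next hop and one selected previous hop per direction, a switchover on a recovery LSP changes both endpoints to send and receive on that LSP, and a node drops traffic arriving from anywhere other than its selected previous hop. The function `trace` and the data layout are hypothetical and do not model any signaling.

```python
WORKING = list("ABCDEFGH")
SEGMENTS = {"B-K-L-F": list("BKLF"), "C-I-J-G": list("CIJG")}
FAILED_LINK = frozenset("DE")


def hops(path):
    """Consecutive (from, to) pairs along a path."""
    return list(zip(path, path[1:]))


def trace(active, reverse=False):
    """Follow traffic end to end; return (delivered, nodes_visited, reason).

    active is the set of recovery LSP names that have been switched to.
    """
    working = WORKING[::-1] if reverse else WORKING
    segs = [s[::-1] if reverse else s
            for name, s in SEGMENTS.items() if name in active]
    # Start from the working LSP, then overlay each switched recovery LSP.
    send = {a: b for a, b in hops(working)}
    recv = {b: a for a, b in hops(working)}
    for seg in segs:
        for a, b in hops(seg):
            send[a] = b
            recv[b] = a
    node = working[0]
    visited = [node]
    while node != working[-1]:
        nxt = send[node]
        if frozenset((node, nxt)) == FAILED_LINK:
            return False, visited, f"lost on failed link {node}-{nxt}"
        if recv[nxt] != node:
            # nxt is listening on a different interface for this LSP.
            return False, visited + [nxt], f"dropped at {nxt}"
        node = nxt
        visited.append(node)
    return True, visited, "delivered"


both = {"B-K-L-F", "C-I-J-G"}
print(trace(both))                # forward: dropped at G
print(trace(both, reverse=True))  # reverse: dropped at B
print(trace({"C-I-J-G"}))         # one switchover only: delivered
```

With both switchovers active, forward traffic follows A-B-K-L-F and is dropped at G, and reverse traffic follows H-G-J-I-C and is dropped at B. With only one of the two recovery LSPs active, traffic is delivered in both directions.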

If there were a way to force B and C to be masters then your suggestion that B should avoid triggering protection switching before C might work.  However, that is explicitly not considered in 1:1 protection switching.  See Note 1 in http://tools.ietf.org/html/rfc4872#section-7.2:

   Note 1: a 2-phase protection-switching signaling is used in the
   present context; a 3-phase signaling (see [RFC4426]) that would imply
   a notification message, a switchover request, and a switchover
   response messages is not considered here.

Nic



-----Original Message-----
From: ALU - Dimitri Papadimitriou 
Sent: 24 February 2009 09:27
To: Nic Neate; ccamp@ops.ietf.org
Cc: labn - Lou Berger; IBryskin@advaoptical.com; Aria - Adrian Farrel Personal
Subject: RE: Segment protection failure when recovery LSPs overlap

Nic,

To restate: in all protection schemes there is a master/slave mechanism. Now concerning the SRRO: C (and B) and F (and G) are generators in the upstream and downstream directions. So the SRROs are known to B, and what we are interested in is that B does not trigger recovery before C, and likewise that G does not trigger recovery before F.

Thanks,
-dimitri.

> -----Original Message-----
> From: Nic Neate [mailto:Nic.Neate@dataconnection.com]
> Sent: Monday, February 23, 2009 4:13 PM
> To: PAPADIMITRIOU Dimitri; ccamp@ops.ietf.org
> Cc: labn - Lou Berger; IBryskin@advaoptical.com; Aria - Adrian Farrel 
> Personal
> Subject: RE: Segment protection failure when recovery LSPs overlap
> 
> Hi Dimitri,
> 
> We had wondered about the SRRO as a possible solution to this problem 
> as well.  However, there are a couple of issues as the protocol 
> currently stands.
> 
> -  SRROs can only be present in Path messages between the merge node 
> and the egress, and in Resv messages between the branch node and the 
> ingress.  See
> http://tools.ietf.org/html/rfc4873#section-2 and 
> http://tools.ietf.org/html/rfc4873#section-5.2.  Therefore, C does not 
> have the SRRO for recovery LSP B-K-L-F, and F does not have the SRRO 
> for recovery LSP C-I-J-G.
> 
> -  The inclusion of the SRRO is optional, controlled via the 
> segment-recording-desired flag in the SESSION_ATTRIBUTE object 
> (http://tools.ietf.org/html/rfc4873#section-5.2).  If the SRRO is 
> required in order to avoid data loss then it needs to be mandatory.
> 
> So I think we need a protocol extension in order to provide a 
> signaling-based solution.
> 
> Nic
> 
> 
> -----Original Message-----
> From: ALU - Dimitri Papadimitriou
> Sent: 21 February 2009 23:51
> To: Nic Neate; ccamp@ops.ietf.org
> Cc: labn - Lou Berger; IBryskin@advaoptical.com; Aria - Adrian Farrel 
> Personal
> Subject: RE: Segment protection failure when recovery LSPs overlap
> 
> Nic,
> 
> RFC4873, by means of the SRRO, allows nodes to determine the existence of
> upstream/downstream recovery segments, as carried in Path/Resv messages.
> Combined with RFC4426 and RFC4428, which refer to master/slave, the
> result is that either C (or F) triggers a recovery action by means of
> disjoint recovery segments.
> 
> Thanks,
> -d.
> 
> > -----Original Message-----
> > From: Nic Neate [mailto:Nic.Neate@dataconnection.com]
> > Sent: Friday, February 20, 2009 4:05 PM
> > To: ccamp@ops.ietf.org
> > Cc: labn - Lou Berger; IBryskin@advaoptical.com; PAPADIMITRIOU 
> > Dimitri; Aria - Adrian Farrel Personal
> > Subject: Segment protection failure when recovery LSPs overlap
> > 
> > Hi CCAMP,
> > 
> > I'd like to raise one more issue with RFC4873 segment recovery, which
> > I believe will lead to data loss when overlapping segment recovery
> > LSPs are used.
> > 
> > RFC4873 allows topologies like this one:
> > 
> >                           K-----------L
> >                          /             \
> >                     A===B===C===D===E===F===G===H
> >                              \             /
> >                               I-----------J
> > 
> > A working LSP A-B-C-D-E-F-G-H is protected by two overlapping segment
> > recovery LSPs: B-K-L-F and C-I-J-G.  The recovery scheme is 1:1
> > protection with extra traffic.
> > 
> > Suppose the link D-E fails:
> > 
> >                           K-----------L
> >                          /             \
> >                     A===B===C===D x E===F===G===H
> >                              \             /
> >                               I-----------J
> > 
> > My understanding is that the failure will be handled as follows.
> > 
> > -  D detects the link failure, and sends Notify to C (first Notify
> >    object in the received Path).  C and G exchange Notify messages to
> >    remove extra traffic from the C-I-J-G repair, and then send and
> >    receive traffic from the working LSP on C-I and G-J.
> > 
> > -  Meanwhile, E also detects the failure, and sends Notify to F (first
> >    Notify object in the received Resv).  F likewise exchanges Notify
> >    messages with B to remove extra traffic from the B-K-L-F repair,
> >    and then they send and receive working LSP traffic on B-K and F-L.
> > 
> > That results in the following data flow:
> > 
> >                           K----->-----L
> >                          /             \
> >                     A->-B <-C   D   E   F-> G<--H
> >                              \             /
> >                               I-----<-----J
> > 
> > Forward traffic reaches G on the link F-G.  However, G has switched to
> > send and receive on G-J, and so drops traffic received from F.
> > 
> > Reverse traffic reaches B on C-B.  However, B has switched to send and
> > receive on B-K, and so drops traffic received from C.
> > 
> > Thus traffic is lost in both directions.
> > 
> > Can anyone point out an error in this analysis?  Is this a topology 
> > that there is interest in supporting?
> > 
> > Thanks,
> > 
> > Nic
> > 
>