
RE: Segment protection failure when recovery LSPs overlap



I think we're converging on this.  I'm happy to work on protocol extensions for choosing the master and ensuring that traffic is switched onto just one recovery LSP.  Please let me know if you have any suggestions for how that should work.

The particular text in RFCs 4872 and 4873 that I think requires this is as follows.

RFC 4872 section 7.2 defines the procedures for initiating switchover in 1:1 protection:

      1. If an end-node ... detects the failure of the working LSP
         ... it disconnects the extra-traffic from the protecting 
         LSP...

         This node MUST reliably send a Notify message ... to the 
         other end-node ... indicating the failure of the working 
         LSP ...

      2. Upon receipt of the (switchover request) Notify message, the
         end-node ... MUST disconnect the extra-traffic from the 
         protecting LSP and begin sending/receiving normal traffic 
         out/from the protecting LSP...

Those MUST statements don't leave room for the end-node to decide, based on configuration, whether to act as master or slave.
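
To make the consequence concrete, here is a minimal Python sketch of the topology from earlier in this thread (working LSP A-B-C-D-E-F-G-H, recovery LSPs B-K-L-F and C-I-J-G, link D-E failed). It assumes that every end-node of an active recovery LSP follows the MUST statements unconditionally, so it both sends and receives only on its recovery LSP. The names `build_switch_table` and `trace` are hypothetical helpers for illustration, not anything defined by RFC 4872 or RFC 4873.

```python
WORKING = ["A", "B", "C", "D", "E", "F", "G", "H"]
RECOVERY = [["B", "K", "L", "F"], ["C", "I", "J", "G"]]
FAILED_LINK = frozenset({"D", "E"})


def build_switch_table(active):
    """For each end-node of an active recovery LSP, record the working-LSP
    neighbour it has stopped using and the recovery neighbour replacing it."""
    table = {}
    for lsp in active:
        for end, rec_nbr in ((lsp[0], lsp[1]), (lsp[-1], lsp[-2])):
            i = WORKING.index(end)
            # The protected segment lies towards the other end of the recovery LSP.
            other = WORKING.index(lsp[-1] if end == lsp[0] else lsp[0])
            seg_nbr = WORKING[i + 1] if other > i else WORKING[i - 1]
            table[end] = (seg_nbr, rec_nbr)
    return table


def trace(route, active):
    """Follow traffic hop by hop along `route` (WORKING or its reverse),
    given the list of recovery LSPs that have been switched to."""
    table = build_switch_table(active)
    # Neighbours of each transit node (such as K or I) on an active recovery LSP.
    transit = {}
    for lsp in active:
        for a, b, c in zip(lsp, lsp[1:], lsp[2:]):
            transit[b] = (a, c)
    prev, node = None, route[0]
    hops = [node]
    while node != route[-1]:
        if node in transit:
            a, c = transit[node]
            nxt = c if prev == a else a
        else:
            nxt = route[route.index(node) + 1]
            if node in table:
                seg_nbr, rec_nbr = table[node]
                if prev == seg_nbr:
                    # The node no longer listens on the protected segment.
                    return hops, f"dropped at {node}: selector is on {node}-{rec_nbr}"
                if nxt == seg_nbr:
                    nxt = rec_nbr
        if frozenset({node, nxt}) == FAILED_LINK:
            return hops, f"dropped at {node}: link {node}-{nxt} is down"
        prev, node = node, nxt
        hops.append(node)
    return hops, "delivered"


if __name__ == "__main__":
    for label, route in (("forward", WORKING), ("reverse", WORKING[::-1])):
        print(label, "both switched:", trace(route, RECOVERY))
        print(label, "only C-I-J-G: ", trace(route, RECOVERY[1:]))
```

With both recovery LSPs switched, the forward trace stops at G and the reverse trace stops at B, matching the loss described in my original mail. With only one of the two switched, traffic is delivered in both directions.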

The same procedures are also used for segment protection, as specified in RFC 4873 section 2.1:

   The switch-over processing for segment 1+1 Bidirectional protection
   and 1:1 Protection With Extra-Traffic follows the same procedures as
   end-to-end protection forms; see Sections 6.2 and 7.2 of [RFC4872]
   for details.

Nic

________________________________

From: Fatai Zhang [mailto:zhangfatai@huawei.com] 
Sent: 26 February 2009 02:13
To: ALU - Dimitri Papadimitriou; Nic Neate; ccamp@ops.ietf.org
Cc: labn - Lou Berger; IBryskin@advaoptical.com; Aria - Adrian Farrel Personal
Subject: Re: Segment protection failure when recovery LSPs overlap


Hi all,
 
I think Nic caught the important issues for implementation.
 
There should be a mechanism for nodes C, D, E and F to know that two recovery LSPs cover these faults (link faults C-D, D-E and E-F, and node faults D and E). When any such fault occurs, I think only one recovery LSP should take over the traffic.
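
As a rough illustration only, the following Python sketch lists which recovery LSPs cover a given fault in Nic's topology and then picks exactly one of them. The tie-break rule (prefer the recovery LSP whose branch node is furthest upstream) is an arbitrary assumption made for the example, and `covers`, `candidates` and `choose` are hypothetical names; no RFC defines such a rule.

```python
WORKING = ["A", "B", "C", "D", "E", "F", "G", "H"]
# Recovery LSP name -> (branch node, merge node) on the working LSP.
RECOVERY = {"B-K-L-F": ("B", "F"), "C-I-J-G": ("C", "G")}


def covers(branch, merge, fault):
    """True if the segment branch..merge protects against `fault`.
    A link fault is a tuple of two adjacent nodes; a node fault is a string."""
    lo, hi = WORKING.index(branch), WORKING.index(merge)
    if isinstance(fault, tuple):
        a, b = sorted(WORKING.index(n) for n in fault)
        return lo <= a and b <= hi
    # A node fault is only covered when the node is strictly inside the segment.
    return lo < WORKING.index(fault) < hi


def candidates(fault):
    """All recovery LSPs whose protected segment contains the fault."""
    return [name for name, (b, m) in RECOVERY.items() if covers(b, m, fault)]


def choose(fault):
    """Pick a single recovery LSP (assumed rule: upstream-most branch node)."""
    names = candidates(fault)
    if not names:
        return None
    return min(names, key=lambda n: WORKING.index(RECOVERY[n][0]))


if __name__ == "__main__":
    for fault in [("C", "D"), ("D", "E"), ("E", "F"), "D", "E", ("B", "C")]:
        print(fault, candidates(fault), "->", choose(fault))
```

For the faults listed above (links C-D, D-E, E-F and nodes D, E) both recovery LSPs are candidates, which is exactly where a single choice is needed. Every node involved would have to apply the same rule for the choice to be consistent.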
 
RFC 4426 focuses only on the "Functional Specification", and there is no standard solution.
 
As for RFC 4873, I think we need to patch it if there is an issue to be solved.
 
 

Thanks
 
Fatai Zhang
 
Advanced Technology Department
Wireline Networking Business Unit
Huawei Technologies Co., LTD.
Huawei Base, Bantian, Longgang,
Shenzhen 518129 P.R.China
Tel: +86-755-28972912
Fax: +86-755-28972935

	----- Original Message ----- 
	From: PAPADIMITRIOU Dimitri <mailto:Dimitri.Papadimitriou@alcatel-lucent.be>  
	To: Nic Neate <mailto:Nic.Neate@dataconnection.com>  ; ccamp@ops.ietf.org 
	Cc: labn - Lou Berger <mailto:lberger@labn.net>  ; IBryskin@advaoptical.com ; Aria - Adrian Farrel Personal <mailto:adrian@olddog.co.uk>  
	Sent: Wednesday, February 25, 2009 11:40 PM
	Subject: RE: Segment protection failure when recovery LSPs overlap

	Hi Nic: 
	
	> -----Original Message-----
	> From: Nic Neate [mailto:Nic.Neate@dataconnection.com] 
	> Sent: Wednesday, February 25, 2009 3:58 PM
	> To: PAPADIMITRIOU Dimitri; ccamp@ops.ietf.org
	> Cc: labn - Lou Berger; IBryskin@advaoptical.com; Aria - 
	> Adrian Farrel Personal
	> Subject: RE: Segment protection failure when recovery LSPs overlap
	> 
	> Hi Dimitri,
	> 
	> I don't think that solves this issue.  Let me restate the 
	> problem with more detail on the master/slave behaviour.
	> 
	> We have the following topology, where the LSP A-B-C-D-E-F-G-H 
	> has 1:1 segment protection with extra traffic, and the link D-E fails:
	> 
	> > >                           K-----------L
	> > >                          /             \
	> > >                     A===B===C===D x E===F===G===H
	> > >                              \             /
	> > >                               I-----------J
	> 
	> RFC 4426 defines a master/slave relationship for the 
	> endpoints of the recovery LSPs.
	> -  For the recovery LSP B-K-L-F, either B or F will be the 
	> master, controlling switchover onto that recovery LSP.
	> -  For the recovery LSP C-I-J-G, either C or G will be the 
	> master, controlling switchover onto that recovery LSP.
	> 
	> However, there is no mechanism defined for electing the 
	> masters.  Rather, RFC 4872 defines the switchover procedures 
	> for 1:1 protection, and states that the first endpoint of the 
	> recovery LSP to detect the failure is the one that initiates 
	> switchover (http://tools.ietf.org/html/rfc4872#section-7.2).
	
	Where is it stated in section 7.2 that "the first endpoint of the
	recovery LSP to detect the failure is the one that initiates switchover"?
	Can you point me to the sentence?
	
	> In this example, C and F are closest to the failure, and have 
	> their NOTIFY_REQUEST objects at the top of the stack at D and 
	> E respectively.  It is therefore likely that C and F will 
	> detect the failure before B and G, and so C and F will be the masters.
	> 
	> That presents the problem that C and F will both attempt to 
	> initiate a switchover using their respective recovery LSPs, 
	> leading to the data loss described in my original mail.
	> 
	> If there was a way to force B and C to be masters then your 
	> suggestion that B should avoid triggering protection 
	> switching before C may work.  However, that is explicitly not 
	> considered in 1:1 protection switching.  See Note 1 in 
	> http://tools.ietf.org/html/rfc4872#section-7.2:
	> 
	>    Note 1: a 2-phase protection-switching signaling is used in the
	>    present context; a 3-phase signaling (see [RFC4426]) that would
	>    imply a notification message, a switchover request, and a
	>    switchover response messages is not considered here.
	
	Per 4426: "The determination of the master and the slave may be based on
	configured information or protocol specific requirements." ... so
	basically you may extend the protocol messaging detailed in 4872 to
	trigger this election but you can also perform it via other means. The
	fundamental issue is that it does not modify the protocol procedures
	specified in 4872 or 4873 i.e. dynamic election would just be an add-on.
	
	The same applies to the WTR, which we stated was specified by
	configuration; Attila then came up with a dynamic mechanism to set it up, etc.
	
	Thanks,
	-dimitri.
	> Nic
	> 
	> 
	> 
	> -----Original Message-----
	> From: ALU - Dimitri Papadimitriou 
	> Sent: 24 February 2009 09:27
	> To: Nic Neate; ccamp@ops.ietf.org
	> Cc: labn - Lou Berger; IBryskin@advaoptical.com; Aria - 
	> Adrian Farrel Personal
	> Subject: RE: Segment protection failure when recovery LSPs overlap
	> 
	> Nic,
	> 
	> I will restate: in every protection scheme there is a master/slave
	> mechanism. Now, concerning the SRRO: C (and B) and F (and G) are
	> generators in the upstream and downstream directions. So the SRROs
	> are known to B, and what we are interested in is that B does not
	> trigger recovery before C, and likewise that G does not trigger
	> recovery before F.
	> 
	> Thanks,
	> -dimitri.
	> 
	> > -----Original Message-----
	> > From: Nic Neate [mailto:Nic.Neate@dataconnection.com]
	> > Sent: Monday, February 23, 2009 4:13 PM
	> > To: PAPADIMITRIOU Dimitri; ccamp@ops.ietf.org
	> > Cc: labn - Lou Berger; IBryskin@advaoptical.com; Aria - 
	> Adrian Farrel 
	> > Personal
	> > Subject: RE: Segment protection failure when recovery LSPs overlap
	> > 
	> > Hi Dimitri,
	> > 
	> > We had wondered about the SRRO as a possible solution to this problem
	> > as well.  However, there are a couple of issues as the protocol
	> > currently stands.
	> > 
	> > -  SRROs can only be present in Path messages between the merge node
	> > and the egress, and in Resv messages between the branch node and the
	> > ingress.  See
	> > http://tools.ietf.org/html/rfc4873#section-2 and
	> > http://tools.ietf.org/html/rfc4873#section-5.2.  Therefore, C does not
	> > have the SRRO for recovery LSP B-K-L-F, and F does not have the SRRO
	> > for recovery LSP C-I-J-G.
	> > 
	> > -  The inclusion of the SRRO is optional, controlled via the 
	> > segment-recording-desired flag in the SESSION_ATTRIBUTE object 
	> > (http://tools.ietf.org/html/rfc4873#section-5.2).  If the SRRO is 
	> > required in order to avoid data loss then it needs to be mandatory.
	> > 
	> > So I think we need a protocol extension in order to provide a 
	> > signaling-based solution.
	> > 
	> > Nic
	> > 
	> > 
	> > -----Original Message-----
	> > From: ALU - Dimitri Papadimitriou
	> > Sent: 21 February 2009 23:51
	> > To: Nic Neate; ccamp@ops.ietf.org
	> > Cc: labn - Lou Berger; IBryskin@advaoptical.com; Aria - 
	> Adrian Farrel 
	> > Personal
	> > Subject: RE: Segment protection failure when recovery LSPs overlap
	> > 
	> > Nic,
	> > 
	> > RFC4873, by means of the SRRO, allows nodes to determine the existence
	> > of upstream/downstream recovery segments as carried in Path/Resv
	> > messages.  Combined with RFC4426 and RFC4428, which refer to
	> > master/slave, the result is that either C (or F) triggers a recovery
	> > action by means of disjoint recovery segments.
	> > 
	> > Thanks,
	> > -d.
	> > 
	> > > -----Original Message-----
	> > > From: Nic Neate [mailto:Nic.Neate@dataconnection.com]
	> > > Sent: Friday, February 20, 2009 4:05 PM
	> > > To: ccamp@ops.ietf.org
	> > > Cc: labn - Lou Berger; IBryskin@advaoptical.com; PAPADIMITRIOU 
	> > > Dimitri; Aria - Adrian Farrel Personal
	> > > Subject: Segment protection failure when recovery LSPs overlap
	> > > 
	> > > Hi CCAMP,
	> > > 
	> > > I'd like to raise one more issue with RFC4873 segment recovery, which
	> > > I believe will lead to data loss when overlapping segment recovery
	> > > LSPs are used.
	> > > 
	> > > RFC4873 allows topologies like this one:
	> > > 
	> > >                           K-----------L
	> > >                          /             \
	> > >                     A===B===C===D===E===F===G===H
	> > >                              \             /
	> > >                               I-----------J
	> > > 
	> > > A working LSP A-B-C-D-E-F-G-H is protected by two overlapping segment
	> > > recovery LSPs: B-K-L-F and C-I-J-G.  The recovery scheme is 1:1
	> > > protection with extra traffic.
	> > > 
	> > > Suppose the link D-E fails:
	> > > 
	> > >                           K-----------L
	> > >                          /             \
	> > >                     A===B===C===D x E===F===G===H
	> > >                              \             /
	> > >                               I-----------J
	> > > 
	> > > My understanding is that the failure will be handled as follows.
	> > > 
	> > > -  D detects the link failure, and sends Notify to C (first Notify
	> > >    object in the received Path).  C and G exchange Notify messages
	> > >    to remove extra traffic from the C-I-J-G repair, and then send
	> > >    and receive traffic from the working LSP on C-I and G-J.
	> > > 
	> > > -  Meanwhile, E also detects the failure, and sends Notify to F
	> > >    (first Notify object in the received Resv).  F likewise exchanges
	> > >    Notify messages with B to remove extra traffic from the B-K-L-F
	> > >    repair, and then they send and receive working LSP traffic on B-K
	> > >    and F-L.
	> > > 
	> > > That results in the following data flow:
	> > > 
	> > >                           K----->-----L
	> > >                          /             \
	> > >                     A->-B <-C   D   E   F-> G<--H
	> > >                              \             /
	> > >                               I-----<-----J
	> > > 
	> > > Forward traffic reaches G on the link F-G.  However, G has switched to
	> > > send and receive on G-J, and so drops traffic received from F.
	> > > 
	> > > Reverse traffic reaches B on C-B.  However, B has switched to send and
	> > > receive on B-K, and so drops traffic received from C.
	> > > 
	> > > Thus traffic is lost in both directions.
	> > > 
	> > > Can anyone point out an error in this analysis?  Is this a topology
	> > > that there is interest in supporting?
	> > > 
	> > > Thanks,
	> > > 
	> > > Nic
	> > > 
	> > 
	>