
RE: Segment protection failure when recovery LSPs overlap



 Nic,

> -----Original Message-----
> From: Nic Neate [mailto:Nic.Neate@dataconnection.com] 
> Sent: Thursday, February 26, 2009 4:34 PM
> To: Fatai Zhang; PAPADIMITRIOU Dimitri; ccamp@ops.ietf.org
> Cc: labn - Lou Berger; IBryskin@advaoptical.com; Aria - 
> Adrian Farrel Personal
> Subject: RE: Segment protection failure when recovery LSPs overlap
> 
> I think we're converging on this. 

I don't think so. See below:

> I'm happy to work on 
> protocol extensions for choosing the master and ensuring that 
> traffic is switched onto just one recovery LSP.  Please let 
> me know if you have any suggestions for how that should work.

As stated in a previous e-mail, changing the procedure below is not the
issue at all.

> The particular text in RFCs 4872 and 4873 that I think 
> requires this is as follows.
> 
> RFC 4872 section 7.2 defines the procedures for initiating 
> switchover in 1:1 protection:
> 
>       1. If an end-node ... detects the failure of the working LSP
>          ... it disconnects the extra-traffic from the protecting 
>          LSP...
> 
>          This node MUST reliably send a Notify message ... to the 
>          other end-node ... indicating the failure of the working 
>          LSP ...
> 
>       2. Upon receipt of the (switchover request) Notify message, the
>          end-node ... MUST disconnect the extra-traffic from the 
>          protecting LSP and begin sending/receiving normal traffic 
>          out/from the protecting LSP...
> 
> Those MUST statements don't leave room for the end-node to 
> decide, based on configuration, whether to act as master or slave.

This is not what the text says. The misinterpretation comes initially
from the fact that Notify messages themselves can be configured so as to
trigger the nodes from which recovery initiation is expected.
In the case below, that is typically the B,C pair, the F,G pair, or any
combination that does not result in contradictory decisions and actions.
So if the decision is taken to Notify only upstream, there is no issue.
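
To make "contradicting decisions" concrete, here is a minimal sketch of
the topology from Nic's original mail (working LSP A-B-C-D-E-F-G-H,
recovery LSPs B-K-L-F and C-I-J-G, link D-E failed). It is only a toy
model, not protocol code: the names trace, RECOVERY_LSPS and the
"active" list are assumptions made for illustration. It assumes that an
end-node that has switched onto an active recovery LSP sends onto it and
drops traffic arriving on the working segment it protects.

```python
# Working LSP and the two overlapping recovery LSPs from the thread.
WORKING = ["A", "B", "C", "D", "E", "F", "G", "H"]
# Hypothetical encoding: recovery LSP name -> its two end-nodes.
RECOVERY_LSPS = {"B-K-L-F": ("B", "F"), "C-I-J-G": ("C", "G")}
FAILED_LINK = ("D", "E")


def trace(path, active, failed_link):
    """Follow traffic along `path` (forward or reversed working LSP).

    `active` lists the recovery LSPs whose end-nodes have switched over.
    Returns (delivered, hops).
    """
    pos, came_off, hops = 0, None, [path[0]]
    while True:
        node = path[pos]
        jumped = False
        for name in active:
            ends = RECOVERY_LSPS[name]
            # Skip LSPs this node does not terminate, and the LSP the
            # traffic has just arrived on.
            if node not in ends or name == came_off:
                continue
            far = ends[1] if node == ends[0] else ends[0]
            if path.index(far) < pos:
                # Node has switched to the recovery LSP, but this traffic
                # arrived on the working link: it is dropped.
                return False, hops
            # Node is the entry point in this direction: send traffic
            # over the recovery LSP to the far end-node.
            hops += [f"({name})", far]
            pos, came_off, jumped = path.index(far), name, True
            break
        if jumped:
            continue
        if pos == len(path) - 1:
            return True, hops
        nxt = path[pos + 1]
        if {node, nxt} == set(failed_link):
            return False, hops  # traffic runs into the failed link
        hops.append(nxt)
        pos, came_off = pos + 1, None


if __name__ == "__main__":
    for active in (["B-K-L-F", "C-I-J-G"], ["C-I-J-G"], ["B-K-L-F"]):
        fwd = trace(WORKING, active, FAILED_LINK)
        rev = trace(WORKING[::-1], active, FAILED_LINK)
        print(active, "forward:", fwd, "reverse:", rev)
```

Under these assumptions, traffic is lost in both directions when both
recovery LSPs are switched (the situation Nic described), and is
delivered in both directions when only one of them is switched.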

In brief, the notion of master/slave can be configured (via the
head-end) and triggered via Notify. What is not present today is an
election based on working/protecting segment setup. More specifically,
there is no BCP today for selecting Notify addresses in corner cases
like the one you depicted. The other possibility, once the protecting
segments are set up, is an additional exchange to determine under which
conditions the recovery procedure is triggered; these conditions may
then be negotiated instead of being pre-configured.
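
As a rough illustration of selecting Notify addresses, the sketch below
computes which end-nodes would receive a Notify for a failed link under
a configured direction. Everything here is hypothetical: the function
names and the "policy" values are not defined in RFC 4872 or RFC 4873,
and the model simply assumes, as in the example discussed in this
thread, that the node adjacent to the failure notifies the nearest
branch node (upstream) or the nearest merge node (downstream) whose
segment covers the failure.

```python
WORKING = ["A", "B", "C", "D", "E", "F", "G", "H"]
# Hypothetical encoding: recovery LSP name -> (branch node, merge node).
RECOVERY_LSPS = {"B-K-L-F": ("B", "F"), "C-I-J-G": ("C", "G")}


def notify_targets(failed_link, policy):
    """Map each notified end-node to the recovery LSP it would trigger.

    `policy` is an assumed configuration knob: "upstream", "downstream"
    or "both".
    """
    up, down = (WORKING.index(n) for n in failed_link)
    # Recovery LSPs whose protected segment contains the failed link.
    covering = [
        (WORKING.index(b), WORKING.index(m), name)
        for name, (b, m) in RECOVERY_LSPS.items()
        if WORKING.index(b) <= up and WORKING.index(m) >= down
    ]
    targets = {}
    if covering and policy in ("upstream", "both"):
        b, _, name = max(covering, key=lambda c: c[0])  # nearest branch node
        targets[WORKING[b]] = name
    if covering and policy in ("downstream", "both"):
        _, m, name = min(covering, key=lambda c: c[1])  # nearest merge node
        targets[WORKING[m]] = name
    return targets


def is_consistent(targets):
    """True when all notified nodes would trigger the same recovery LSP."""
    return len(set(targets.values())) <= 1


if __name__ == "__main__":
    for policy in ("upstream", "downstream", "both"):
        targets = notify_targets(("D", "E"), policy)
        print(policy, targets, "consistent:", is_consistent(targets))
```

In this toy model, notifying in one direction only selects a single
recovery LSP, whereas notifying in both directions selects two different
ones for the D-E failure.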

Thanks,
-dimitri.
> The same procedures are also used for segment protection, as 
> specified in RFC 4873 section 2.1:
> 
>    The switch-over processing for segment 1+1 Bidirectional protection
>    and 1:1 Protection With Extra-Traffic follows the same 
> procedures as
>    end-to-end protection forms; see Sections 6.2 and 7.2 of [RFC4872]
>    for details.
> 
> Nic
> 
> ________________________________
> 
> From: Fatai Zhang [mailto:zhangfatai@huawei.com] 
> Sent: 26 February 2009 02:13
> To: ALU - Dimitri Papadimitriou; Nic Neate; ccamp@ops.ietf.org
> Cc: labn - Lou Berger; IBryskin@advaoptical.com; Aria - 
> Adrian Farrel Personal
> Subject: Re: Segment protection failure when recovery LSPs overlap
> 
> 
> Hi all,
>  
> I think Nic caught the important issues for implementation.
>  
> There should be a mechanism for node C/D/E/F to know that 
> there are two recovery LSPs for the faults (link faults: 
> C-D,D-E,E-F and also node faults: D and E). And when any 
> fault occurs, I think  only one recovery LSP should take over 
> the traffic.
>  
> RFC 4426 just focuses on " Functional Specification" , and 
> there is no standard solution.
>  
> As a solution for RFC 4873, I think we need patch it if there 
> is any issue to be solved.
>  
>  
> 
> Thanks
>  
> Fatai Zhang
>  
> Advanced Technology Department
> Wireline Networking Business Unit
> Huawei Technologies Co., LTD.
> Huawei Base, Bantian, Longgang,
> Shenzhen 518129 P.R.China
> Tel: +86-755-28972912
> Fax: +86-755-28972935
> 
> 	----- Original Message ----- 
> 	From: PAPADIMITRIOU Dimitri 
> <mailto:Dimitri.Papadimitriou@alcatel-lucent.be>  
> 	To: Nic Neate <mailto:Nic.Neate@dataconnection.com>  ; 
> ccamp@ops.ietf.org 
> 	Cc: labn - Lou Berger <mailto:lberger@labn.net>  ; 
> IBryskin@advaoptical.com ; Aria - Adrian Farrel Personal 
> <mailto:adrian@olddog.co.uk>  
> 	Sent: Wednesday, February 25, 2009 11:40 PM
> 	Subject: RE: Segment protection failure when recovery 
> LSPs overlap
> 
> 	Hi Nic: 
> 	
> 	> -----Original Message-----
> 	> From: Nic Neate [mailto:Nic.Neate@dataconnection.com] 
> 	> Sent: Wednesday, February 25, 2009 3:58 PM
> 	> To: PAPADIMITRIOU Dimitri; ccamp@ops.ietf.org
> 	> Cc: labn - Lou Berger; IBryskin@advaoptical.com; Aria - 
> 	> Adrian Farrel Personal
> 	> Subject: RE: Segment protection failure when recovery 
> LSPs overlap
> 	> 
> 	> Hi Dimitri,
> 	> 
> 	> I don't think that solves this issue.  Let me restate the 
> 	> problem with more detail on the master/slave behaviour.
> 	> 
> 	> We have the following topology, where the LSP A-B-C-D-E-F-G-H 
> 	> has 1:1 segment protection with extra traffic, and 
> the link D-E fails:
> 	> 
> 	> > >                           K-----------L
> 	> > >                          /             \
> 	> > >                     A===B===C===D x E===F===G===H
> 	> > >                              \             /
> 	> > >                               I-----------J
> 	> 
> 	> RFC 4426 defines a master/slave relationship for the 
> 	> endpoints of the recovery LSPs.
> 	> -  For the recovery LSP B-K-L-F, either B or F will be the 
> 	> master, controlling switchover onto that recovery LSP.
> 	> -  For the recovery LSP C-I-J-G, either C or G will be the 
> 	> master, controlling switchover onto that recovery LSP.
> 	> 
> 	> However, there is no mechanism defined for electing the 
> 	> masters.  Rather, RFC 4872 defines the switchover procedures 
> 	> for 1:1 protection, and states that the first endpoint of the 
> 	> recovery LSP to detect the failure is the one that initiates 
> 	> switchover (http://tools.ietf.org/html/rfc4872#section-7.2).
> 	
> 	Where is it stated in section 7.2 that "the first 
> endpoint of the
> 	recovery LSP to detect the failure is the one that 
> initiates switchover"
> 	can you point me the sentence ?
> 	
> 	> In this example, C and F are closest to the failure, and have 
> 	> their NOTIFY_REQUEST objects at the top of the stack at D and 
> 	> E respectively.  It is therefore likely that C and F will 
> 	> detect the failure before B and G, and so C and F 
> will be the masters.
> 	> 
> 	> That presents the problem that C and F will both attempt to 
> 	> initiate a switchover using their respective recovery LSPs, 
> 	> leading to the data loss described in my original mail.
> 	> 
> 	> If there was a way to force B and C to be masters then your 
> 	> suggestion that B should avoid triggering protection 
> 	> switching before C may work.  However, that is explicitly not 
> 	> considered in 1:1 protection switching.  See Note 1 in 
> 	> http://tools.ietf.org/html/rfc4872#section-7.2:
> 	> 
> 	>    Note 1: a 2-phase protection-switching signaling 
> is used in the
> 	>    present context; a 3-phase signaling (see [RFC4426]) that 
> 	> would imply
> 	>    a notification message, a switchover request, and 
> a switchover
> 	>    response messages is not considered here.
> 	
> 	Per 4426: "The determination of the master and the 
> slave may be based on
> 	configured information or protocol specific 
> requirements." ... so
> 	basically you may extend the protocol messaging 
> detailed in 4872 to
> 	trigger this election but you can also perform it via 
> other means. The
> 	fundamental issue is that it does not modify the 
> protocol procedures
> 	specified in 4872 or 4873 i.e. dynamic election would 
> just be an add-on.
> 	
> 	The same applies to the WTR where we stated specified 
> by configuration
> 	and then Attila came with a dynamic mechanism to set it up, etc.
> 	
> 	Thanks,
> 	-dimitri.
> 	> Nic
> 	> 
> 	> 
> 	> 
> 	> -----Original Message-----
> 	> From: ALU - Dimitri Papadimitriou 
> 	> Sent: 24 February 2009 09:27
> 	> To: Nic Neate; ccamp@ops.ietf.org
> 	> Cc: labn - Lou Berger; IBryskin@advaoptical.com; Aria - 
> 	> Adrian Farrel Personal
> 	> Subject: RE: Segment protection failure when recovery 
> LSPs overlap
> 	> 
> 	> Nic,
> 	> 
> 	> I will restate, in all protection scheme there is a master 
> 	> slave mechanism. Now concerning the SRRO: C (and B) and F 
> 	> (and G) are generators in the upstream and downstream 
> 	> direction. So the SRRO are known to B and it is what we are 
> 	> interested in that B does not trigger recovery before C and 
> 	> the same for F and G i.e that G does not trigger 
> recovery before F.
> 	> 
> 	> Thanks,
> 	> -dimitri.
> 	> 
> 	> > -----Original Message-----
> 	> > From: Nic Neate [mailto:Nic.Neate@dataconnection.com]
> 	> > Sent: Monday, February 23, 2009 4:13 PM
> 	> > To: PAPADIMITRIOU Dimitri; ccamp@ops.ietf.org
> 	> > Cc: labn - Lou Berger; IBryskin@advaoptical.com; Aria - 
> 	> Adrian Farrel 
> 	> > Personal
> 	> > Subject: RE: Segment protection failure when 
> recovery LSPs overlap
> 	> > 
> 	> > Hi Dimitri,
> 	> > 
> 	> > We had wondered about the SRRO as a possible solution to 
> 	> this problem 
> 	> > as well.  However, there are a couple of issues as 
> the protocol 
> 	> > currently stands.
> 	> > 
> 	> > -  SRROs can only be present in Path messages between the 
> 	> merge node 
> 	> > and the egress, and in Resv messages between the branch 
> 	> node and the 
> 	> > ingress.  See
> 	> > http://tools.ietf.org/html/rfc4873#section-2 and 
> 	> > http://tools.ietf.org/html/rfc4873#section-5.2.  Therefore, 
> 	> C does not 
> 	> > have the SRRO for recovery LSP B-K-L-F, and F does not have 
> 	> the SRRO 
> 	> > for recovery LSP C-I-J-G.
> 	> > 
> 	> > -  The inclusion of the SRRO is optional, 
> controlled via the 
> 	> > segment-recording-desired flag in the 
> SESSION_ATTRIBUTE object 
> 	> > (http://tools.ietf.org/html/rfc4873#section-5.2).  
> If the SRRO is 
> 	> > required in order to avoid data loss then it needs 
> to be mandatory.
> 	> > 
> 	> > So I think we need a protocol extension in order to 
> provide a 
> 	> > signaling-based solution.
> 	> > 
> 	> > Nic
> 	> > 
> 	> > 
> 	> > -----Original Message-----
> 	> > From: ALU - Dimitri Papadimitriou
> 	> > Sent: 21 February 2009 23:51
> 	> > To: Nic Neate; ccamp@ops.ietf.org
> 	> > Cc: labn - Lou Berger; IBryskin@advaoptical.com; Aria - 
> 	> Adrian Farrel 
> 	> > Personal
> 	> > Subject: RE: Segment protection failure when 
> recovery LSPs overlap
> 	> > 
> 	> > Nic,
> 	> > 
> 	> > RFC4873 by means of SRRO allows nodes to determine 
> existence of 
> 	> > upstream/downstream recovery segments as carried in 
> 	> Path/Resv message.
> 	> > Combined with RFC4426 and RFC4428 that refers to 
> master/slave it 
> 	> > results that either C (or F) trigger a recovery 
> action by means of 
> 	> > disjoint recovery segments.
> 	> > 
> 	> > Thanks,
> 	> > -d.
> 	> > 
> 	> > > -----Original Message-----
> 	> > > From: Nic Neate [mailto:Nic.Neate@dataconnection.com]
> 	> > > Sent: Friday, February 20, 2009 4:05 PM
> 	> > > To: ccamp@ops.ietf.org
> 	> > > Cc: labn - Lou Berger; IBryskin@advaoptical.com; 
> PAPADIMITRIOU 
> 	> > > Dimitri; Aria - Adrian Farrel Personal
> 	> > > Subject: Segment protection failure when recovery 
> LSPs overlap
> 	> > > 
> 	> > > Hi CCAMP,
> 	> > > 
> 	> > > I'd like to raise one more issue with RFC4873 segment
> 	> > recovery, which
> 	> > > I believe will lead to data loss when overlapping segment 
> 	> recovery 
> 	> > > LSPs are used.
> 	> > > 
> 	> > > RFC4873 allows topologies like this one:
> 	> > > 
> 	> > >                           K-----------L
> 	> > >                          /             \
> 	> > >                     A===B===C===D===E===F===G===H
> 	> > >                              \             /
> 	> > >                               I-----------J
> 	> > > 
> 	> > > A working LSP A-B-C-D-E-F-G-H is protected by two
> 	> > overlapping segment
> 	> > > recovery LSPs: B-K-L-F and C-I-J-G.  The recovery 
> scheme is 1:1 
> 	> > > protection with extra traffic.
> 	> > > 
> 	> > > Suppose the link D-E fails:
> 	> > > 
> 	> > >                           K-----------L
> 	> > >                          /             \
> 	> > >                     A===B===C===D x E===F===G===H
> 	> > >                              \             /
> 	> > >                               I-----------J
> 	> > > 
> 	> > > My understanding is that the failure will be 
> handled as follows.
> 	> > > 
> 	> > > -  D detects the link failure, and sends Notify to C 
> 	> (first Notify 
> 	> > > object
> 	> > >    in the received Path).  C and G exchanged Notify
> 	> > messages to remove
> 	> > >    extra traffic from the C-I-J-G repair, and then send 
> 	> and receive 
> 	> > >    traffic from the working LSP on C-I and G-J.
> 	> > > 
> 	> > > -  Meanwhile, E also detects the failure, and sends Notify
> 	> > to F (first
> 	> > >    Notify object in the received Resv).  F likewise 
> 	> exchanges Notify
> 	> > >    messages with B to remove extra traffic from the 
> 	> B-K-L-F repair, 
> 	> > >    and then send and receive working LSP traffic 
> on B-K and F-L.
> 	> > > 
> 	> > > That results in the following data flow:
> 	> > > 
> 	> > >                           K----->-----L
> 	> > >                          /             \
> 	> > >                     A->-B <-C   D   E   F-> G<--H
> 	> > >                              \             /
> 	> > >                               I-----<-----J
> 	> > > 
> 	> > > Forward traffic reaches G on the link F-G.  However, G has
> 	> > switched to
> 	> > > send and receive on G-J, and so drops traffic 
> received from F.
> 	> > > 
> 	> > > Reverse traffic reaches B on C-B.  However, B has switched
> 	> > to send and
> 	> > > receive on B-K, and so drops traffic received from C.
> 	> > > 
> 	> > > Thus traffic is lost in both directions.
> 	> > > 
> 	> > > Can anyone point out an error in this analysis?  Is this 
> 	> a topology 
> 	> > > that there is interest in supporting?
> 	> > > 
> 	> > > Thanks,
> 	> > > 
> 	> > > Nic
> 	> > > 
> 	> > 
> 	> 
> 	
> 
>