
Re: Segment protection failure when recovery LSPs overlap



Hi all,
 
I think Nic caught the important issues for implementation.
 
There should be a mechanism for nodes C, D, E and F to know that two recovery LSPs cover the same faults (link faults C-D, D-E and E-F, and node faults D and E). When any such fault occurs, I think only one recovery LSP should take over the traffic.
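
As a rough illustration of that overlap, here is a small Python sketch built on the topology from Nic's mail. Nothing in it comes from RFC 4872 or RFC 4873: the names `protected`, `covering`, `shared_faults` and `select` are made up for this example, and the tie-break in `select` (pick the lexicographically smallest LSP name) is only a placeholder for whatever rule the nodes would actually agree on.

```python
# Working LSP and the (branch, merge) nodes of each recovery LSP,
# taken from the topology in the thread.
WORKING = list("ABCDEFGH")
RECOVERY = {"B-K-L-F": ("B", "F"), "C-I-J-G": ("C", "G")}


def protected(branch, merge):
    """Return the working links and transit nodes a recovery LSP protects."""
    i, j = WORKING.index(branch), WORKING.index(merge)
    links = {(WORKING[k], WORKING[k + 1]) for k in range(i, j)}
    nodes = set(WORKING[i + 1:j])
    return links, nodes


def covering(fault):
    """Names of the recovery LSPs that protect a fault.

    A fault is either a link tuple such as ("D", "E") or a node name.
    """
    names = []
    for name, (branch, merge) in RECOVERY.items():
        links, nodes = protected(branch, merge)
        if fault in links or fault in nodes:
            names.append(name)
    return sorted(names)


def shared_faults():
    """Faults protected by more than one recovery LSP."""
    all_links = list(zip(WORKING, WORKING[1:]))
    return [f for f in all_links + WORKING if len(covering(f)) > 1]


def select(fault):
    """Pick exactly one recovery LSP for a fault (hypothetical tie-break)."""
    candidates = covering(fault)
    return min(candidates) if candidates else None
```

`shared_faults()` yields the links C-D, D-E and E-F and the nodes D and E, which is the set of faults listed above. Any deterministic rule would do in `select`, provided every endpoint applies the same one.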
 
RFC 4426 focuses only on the "Functional Specification", and there is no standard solution.
 
As for RFC 4873, I think we need to patch it if there is any issue to be solved.
 
 

Thanks
 
Fatai Zhang
 
Advanced Technology Department
Wireline Networking Business Unit
Huawei Technologies Co., LTD.
Huawei Base, Bantian, Longgang,
Shenzhen 518129 P.R.China
Tel: +86-755-28972912
Fax: +86-755-28972935
----- Original Message -----
Sent: Wednesday, February 25, 2009 11:40 PM
Subject: RE: Segment protection failure when recovery LSPs overlap

Hi Nic:

> -----Original Message-----
> From: Nic Neate [mailto:Nic.Neate@dataconnection.com]
> Sent: Wednesday, February 25, 2009 3:58 PM
> To: PAPADIMITRIOU Dimitri; ccamp@ops.ietf.org
> Cc: labn - Lou Berger; IBryskin@advaoptical.com; Aria -
> Adrian Farrel Personal
> Subject: RE: Segment protection failure when recovery LSPs overlap
>
> Hi Dimitri,
>
> I don't think that solves this issue.  Let me restate the
> problem with more detail on the master/slave behaviour.
>
> We have the following topology, where the LSP A-B-C-D-E-F-G-H
> has 1:1 segment protection with extra traffic, and the link D-E fails:
>
> > >                           K-----------L
> > >                          /             \
> > >                     A===B===C===D x E===F===G===H
> > >                              \             /
> > >                               I-----------J
>
> RFC 4426 defines a master/slave relationship for the
> endpoints of the recovery LSPs.
> -  For the recovery LSP B-K-L-F, either B or F will be the
> master, controlling switchover onto that recovery LSP.
> -  For the recovery LSP C-I-J-G, either C or G will be the
> master, controlling switchover onto that recovery LSP.
>
> However, there is no mechanism defined for electing the
> masters.  Rather, RFC 4872 defines the switchover procedures
> for 1:1 protection, and states that the first endpoint of the
> recovery LSP to detect the failure is the one that initiates
> switchover (http://tools.ietf.org/html/rfc4872#section-7.2).

Where is it stated in section 7.2 that "the first endpoint of the
recovery LSP to detect the failure is the one that initiates switchover"?
Can you point me to the sentence?

> In this example, C and F are closest to the failure, and have
> their NOTIFY_REQUEST objects at the top of the stack at D and
> E respectively.  It is therefore likely that C and F will
> detect the failure before B and G, and so C and F will be the masters.
>
> That presents the problem that C and F will both attempt to
> initiate a switchover using their respective recovery LSPs,
> leading to the data loss described in my original mail.
>
> If there was a way to force B and C to be masters then your
> suggestion that B should avoid triggering protection
> switching before C may work.  However, that is explicitly not
> considered in 1:1 protection switching.  See Note 1 in
> http://tools.ietf.org/html/rfc4872#section-7.2:
>
>    Note 1: a 2-phase protection-switching signaling is used in the
>    present context; a 3-phase signaling (see [RFC4426]) that would
>    imply a notification message, a switchover request, and a
>    switchover response messages is not considered here.

Per 4426: "The determination of the master and the slave may be based on
configured information or protocol specific requirements." ... so
basically you may extend the protocol messaging detailed in 4872 to
trigger this election but you can also perform it via other means. The
fundamental issue is that it does not modify the protocol procedures
specified in 4872 or 4873 i.e. dynamic election would just be an add-on.

The same applies to the WTR, which we stated was specified by
configuration, and then Attila came up with a dynamic mechanism to set it up, etc.

Thanks,
-dimitri.
> Nic
>
>
>
> -----Original Message-----
> From: ALU - Dimitri Papadimitriou
> Sent: 24 February 2009 09:27
> To: Nic Neate; ccamp@ops.ietf.org
> Cc: labn - Lou Berger; IBryskin@advaoptical.com; Aria -
> Adrian Farrel Personal
> Subject: RE: Segment protection failure when recovery LSPs overlap
>
> Nic,
>
> I will restate: in every protection scheme there is a master/slave
> mechanism. Now concerning the SRRO: C (and B) and F
> (and G) are generators in the upstream and downstream
> direction. So the SRROs are known to B, and what we are
> interested in is that B does not trigger recovery before C, and
> the same for F and G, i.e. that G does not trigger recovery before F.
>
> Thanks,
> -dimitri.
>
> > -----Original Message-----
> > From: Nic Neate [mailto:Nic.Neate@dataconnection.com]
> > Sent: Monday, February 23, 2009 4:13 PM
> > To: PAPADIMITRIOU Dimitri; ccamp@ops.ietf.org
> > Cc: labn - Lou Berger; IBryskin@advaoptical.com; Aria -
> > Adrian Farrel Personal
> > Subject: RE: Segment protection failure when recovery LSPs overlap
> >
> > Hi Dimitri,
> >
> > We had wondered about the SRRO as a possible solution to this
> > problem as well.  However, there are a couple of issues as the
> > protocol currently stands.
> >
> > -  SRROs can only be present in Path messages between the merge
> > node and the egress, and in Resv messages between the branch node
> > and the ingress.  See
> > http://tools.ietf.org/html/rfc4873#section-2 and
> > http://tools.ietf.org/html/rfc4873#section-5.2.  Therefore, C does
> > not have the SRRO for recovery LSP B-K-L-F, and F does not have the
> > SRRO for recovery LSP C-I-J-G.
> >
> > -  The inclusion of the SRRO is optional, controlled via the
> > segment-recording-desired flag in the SESSION_ATTRIBUTE object
> > (http://tools.ietf.org/html/rfc4873#section-5.2).  If the SRRO is
> > required in order to avoid data loss then it needs to be mandatory.
> >
> > So I think we need a protocol extension in order to provide a
> > signaling-based solution.
> >
> > Nic
> >
> >
> > -----Original Message-----
> > From: ALU - Dimitri Papadimitriou
> > Sent: 21 February 2009 23:51
> > To: Nic Neate; ccamp@ops.ietf.org
> > Cc: labn - Lou Berger; IBryskin@advaoptical.com; Aria -
> > Adrian Farrel Personal
> > Subject: RE: Segment protection failure when recovery LSPs overlap
> >
> > Nic,
> >
> > RFC4873 by means of SRRO allows nodes to determine existence of
> > upstream/downstream recovery segments as carried in Path/Resv
> > message.  Combined with RFC4426 and RFC4428 that refers to
> > master/slave it results that either C (or F) trigger a recovery
> > action by means of disjoint recovery segments.
> >
> > Thanks,
> > -d.
> >
> > > -----Original Message-----
> > > From: Nic Neate [mailto:Nic.Neate@dataconnection.com]
> > > Sent: Friday, February 20, 2009 4:05 PM
> > > To: ccamp@ops.ietf.org
> > > Cc: labn - Lou Berger; IBryskin@advaoptical.com; PAPADIMITRIOU
> > > Dimitri; Aria - Adrian Farrel Personal
> > > Subject: Segment protection failure when recovery LSPs overlap
> > >
> > > Hi CCAMP,
> > >
> > > I'd like to raise one more issue with RFC4873 segment recovery,
> > > which I believe will lead to data loss when overlapping segment
> > > recovery LSPs are used.
> > >
> > > RFC4873 allows topologies like this one:
> > >
> > >                           K-----------L
> > >                          /             \
> > >                     A===B===C===D===E===F===G===H
> > >                              \             /
> > >                               I-----------J
> > >
> > > A working LSP A-B-C-D-E-F-G-H is protected by two overlapping
> > > segment recovery LSPs: B-K-L-F and C-I-J-G.  The recovery scheme
> > > is 1:1 protection with extra traffic.
> > >
> > > Suppose the link D-E fails:
> > >
> > >                           K-----------L
> > >                          /             \
> > >                     A===B===C===D x E===F===G===H
> > >                              \             /
> > >                               I-----------J
> > >
> > > My understanding is that the failure will be handled as follows.
> > >
> > > -  D detects the link failure, and sends Notify to C (first
> > >    Notify object in the received Path).  C and G exchange Notify
> > >    messages to remove extra traffic from the C-I-J-G repair, and
> > >    then send and receive traffic from the working LSP on C-I and
> > >    G-J.
> > >
> > > -  Meanwhile, E also detects the failure, and sends Notify to F
> > >    (first Notify object in the received Resv).  F likewise
> > >    exchanges Notify messages with B to remove extra traffic from
> > >    the B-K-L-F repair, and then B and F send and receive working
> > >    LSP traffic on B-K and F-L.
> > >
> > > That results in the following data flow:
> > >
> > >                           K----->-----L
> > >                          /             \
> > >                     A->-B <-C   D   E   F-> G<--H
> > >                              \             /
> > >                               I-----<-----J
> > >
> > > Forward traffic reaches G on the link F-G.  However, G has
> > > switched to send and receive on G-J, and so drops traffic
> > > received from F.
> > >
> > > Reverse traffic reaches B on C-B.  However, B has switched to
> > > send and receive on B-K, and so drops traffic received from C.
> > >
> > > Thus traffic is lost in both directions.
> > >
> > > Can anyone point out an error in this analysis?  Is this a
> > > topology that there is interest in supporting?
> > >
> > > Thanks,
> > >
> > > Nic
> > >
> >
>
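
The failure sequence in Nic's original mail can be traced hop by hop with a toy model. This is only a sketch under simplifying assumptions: each active recovery LSP makes its head end bridge traffic onto the recovery path and its tail end accept traffic only from the recovery path, and no RSVP-TE signaling is modelled at all. The names `tables` and `trace` are invented for this example.

```python
# Topology from the thread: working LSP plus two overlapping recovery LSPs.
WORKING = list("ABCDEFGH")
RECOVERY = {"B-K-L-F": list("BKLF"), "C-I-J-G": list("CIJG")}


def tables(active, reverse=False):
    """Build (next_hop, accept_from) for one traffic direction."""
    work = WORKING[::-1] if reverse else WORKING
    next_hop = dict(zip(work, work[1:]))
    accept_from = {}
    for name in active:
        rec = RECOVERY[name][::-1] if reverse else RECOVERY[name]
        next_hop.update(zip(rec, rec[1:]))  # head end bridges onto recovery
        accept_from[rec[-1]] = rec[-2]      # tail end selects recovery only
    return next_hop, accept_from


def trace(active, reverse=False, failed=("D", "E")):
    """Follow traffic end to end; return (nodes reached, reason for loss).

    The reason is None when traffic is delivered.
    """
    next_hop, accept_from = tables(active, reverse)
    src, dst = (WORKING[-1], WORKING[0]) if reverse else (WORKING[0], WORKING[-1])
    node, hops = src, [src]
    while node != dst:
        nxt = next_hop[node]
        if {node, nxt} == set(failed):
            return hops, f"link {node}-{nxt} down"
        if accept_from.get(nxt, node) != node:
            return hops, f"dropped at {nxt}"
        hops.append(nxt)
        node = nxt
    return hops, None


both = ["B-K-L-F", "C-I-J-G"]
print(trace(both))                # forward traffic with both switchovers
print(trace(both, reverse=True))  # reverse traffic with both switchovers
print(trace(["C-I-J-G"]))         # forward traffic with a single switchover
```

With both recovery LSPs active, forward traffic follows A-B-K-L-F and is dropped at G, and reverse traffic follows H-G-J-I-C and is dropped at B, as in the analysis quoted above. With only one recovery LSP active, traffic is delivered in both directions.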