Hi all,

I think Nic caught the important issues for implementation.

There should be a mechanism for nodes C, D, E and F to know that there are two recovery LSPs covering the same faults (link faults C-D, D-E and E-F, and node faults D and E). When any of these faults occurs, I think only one recovery LSP should take over the traffic.
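
A minimal Python sketch of that overlap follows. It assumes the working LSP is the ordered node list A..H, and that a recovery segment covers every working link between its branch and merge nodes plus the transit nodes strictly between them. The helper name `protected_elements` is hypothetical; it only recomputes the set of faults that both B-K-L-F and C-I-J-G can repair.

```python
def protected_elements(working_path, branch, merge):
    """Return the links and transit nodes of the working LSP that a
    recovery segment from `branch` to `merge` bypasses (hypothetical helper)."""
    i, j = working_path.index(branch), working_path.index(merge)
    segment = working_path[i:j + 1]
    links = {f"{a}-{b}" for a, b in zip(segment, segment[1:])}
    # The branch and merge nodes stay in use, so a fault there is not covered.
    nodes = set(segment[1:-1])
    return links | nodes


working = list("ABCDEFGH")
outer = protected_elements(working, "B", "F")  # recovery LSP B-K-L-F
inner = protected_elements(working, "C", "G")  # recovery LSP C-I-J-G
print(sorted(outer & inner))  # ['C-D', 'D', 'D-E', 'E', 'E-F']
```

Any fault in the intersection can be repaired by either recovery LSP, which is exactly where some rule must select only one of them.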

RFC 4426 only focuses on the "Functional Specification", and there is no standard solution.

As for a solution within RFC 4873, I think we need to patch it if there is any issue to be solved.

Thanks

Fatai Zhang
Advanced Technology Department
Wireline Networking Business Unit
Huawei Technologies Co., LTD.
Huawei Base, Bantian, Longgang, Shenzhen 518129
P.R.China
Tel: +86-755-28972912
Fax: +86-755-28972935

----- Original Message -----
Sent: Wednesday, February 25, 2009 11:40 PM
Subject: RE: Segment protection failure when recovery LSPs overlap

Hi Nic:

> -----Original Message-----
> From: Nic Neate [mailto:Nic.Neate@dataconnection.com]
> Sent: Wednesday, February 25, 2009 3:58 PM
> To: PAPADIMITRIOU Dimitri; ccamp@ops.ietf.org
> Cc: labn - Lou Berger; IBryskin@advaoptical.com; Aria - Adrian Farrel Personal
> Subject: RE: Segment protection failure when recovery LSPs overlap
>
> Hi Dimitri,
>
> I don't think that solves this issue. Let me restate the
> problem with more detail on the master/slave behaviour.
>
> We have the following topology, where the LSP A-B-C-D-E-F-G-H
> has 1:1 segment protection with extra traffic, and the link D-E
> fails:
>
>       K-----------L
>      /             \
> A===B===C===D x E===F===G===H
>          \             /
>           I-----------J
>
> RFC 4426 defines a master/slave relationship for the
> endpoints of the recovery LSPs.
> - For the recovery LSP B-K-L-F, either B or F will be the
>   master, controlling switchover onto that recovery LSP.
> - For the recovery LSP C-I-J-G, either C or G will be the
>   master, controlling switchover onto that recovery LSP.
>
> However, there is no mechanism defined for electing the
> masters. Rather, RFC 4872 defines the switchover procedures
> for 1:1 protection, and states that the first endpoint of the
> recovery LSP to detect the failure is the one that initiates
> switchover (http://tools.ietf.org/html/rfc4872#section-7.2).

Where is it stated in Section 7.2 that "the first endpoint of the recovery LSP to detect the failure is the one that initiates switchover"? Can you point me to the sentence?

> In this example, C and F are closest to the failure, and have
> their NOTIFY_REQUEST objects at the top of the stack at D and
> E respectively. It is therefore likely that C and F will
> detect the failure before B and G, and so C and F will be the
> masters.
>
> That presents the problem that C and F will both attempt to
> initiate a switchover using their respective recovery LSPs,
> leading to the data loss described in my original mail.
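
The "first endpoint to detect the failure becomes master" behaviour described above can be sketched as follows. This is a simplification, assuming that D notifies towards the branch node, E notifies towards the merge node, and notification delay grows with hop count along the working LSP. The helpers `hops` and `first_detector` are hypothetical and are not defined by RFC 4872.

```python
WORKING = list("ABCDEFGH")


def hops(a, b):
    """Hop count between two nodes on the working LSP."""
    return abs(WORKING.index(a) - WORKING.index(b))


def first_detector(branch, merge, failed_link):
    """Endpoint of a recovery LSP assumed to learn of the failure first.

    Simplification: the node upstream of the failure notifies towards the
    branch node, the node downstream notifies towards the merge node, and
    delay is proportional to hop count (ties go to the branch node).
    """
    upstream, downstream = failed_link
    to_branch = hops(branch, upstream)
    to_merge = hops(merge, downstream)
    return branch if to_branch <= to_merge else merge


failure = ("D", "E")
masters = {
    "B-K-L-F": first_detector("B", "F", failure),
    "C-I-J-G": first_detector("C", "G", failure),
}
print(masters)  # {'B-K-L-F': 'F', 'C-I-J-G': 'C'}
```

Each recovery LSP ends up with its own master (F and C), and nothing in this model makes one of them defer to the other, so both switch.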
>
> If there was a way to force B and C to be masters then your
> suggestion that B should avoid triggering protection
> switching before C may work. However, that is explicitly not
> considered in 1:1 protection switching. See Note 1 in
> http://tools.ietf.org/html/rfc4872#section-7.2:
>
>    Note 1: a 2-phase protection-switching signaling is used in the
>    present context; a 3-phase signaling (see [RFC4426]) that would
>    imply a notification message, a switchover request, and a
>    switchover response messages is not considered here.

Per RFC 4426: "The determination of the master and the slave may be based on configured information or protocol specific requirements." So basically you may extend the protocol messaging detailed in RFC 4872 to trigger this election, but you can also perform it by other means. The fundamental point is that this does not modify the protocol procedures specified in RFC 4872 or RFC 4873, i.e., dynamic election would just be an add-on.

The same applies to the WTR (Wait-to-Restore) time: we stated that it is specified by configuration, and then Attila came up with a dynamic mechanism to set it up, etc.
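
As a sketch of the configuration-based alternative, assume (hypothetically) that B and C are configured as the masters of B-K-L-F and C-I-J-G, and that a master holds off while a master nearer the failure also covers it. This is the ordering Nic says "may work" if B and C can be forced to be masters. The table `RECOVERY` and the function `lsps_to_activate` are illustrative names only; neither RFC 4872 nor RFC 4873 defines such a rule, and only link failures are modelled.

```python
WORKING = list("ABCDEFGH")

# Hypothetical configuration: recovery LSP -> (branch, merge, configured master).
RECOVERY = {
    "B-K-L-F": ("B", "F", "B"),
    "C-I-J-G": ("C", "G", "C"),
}


def covers(branch, merge, failed_link):
    """True if the failed working link lies between branch and merge."""
    lo, hi = WORKING.index(branch), WORKING.index(merge)
    a, b = (WORKING.index(n) for n in failed_link)
    return lo <= a and b <= hi


def lsps_to_activate(failed_link):
    """Recovery LSPs that would switch under the hypothetical hold-off rule."""
    candidates = [name for name, (br, mg, _) in RECOVERY.items()
                  if covers(br, mg, failed_link)]
    if not candidates:
        return []
    # Hold-off rule: an upstream master (B) does not switch while a master
    # further downstream (C) also covers the failure, so exactly one
    # recovery LSP is activated.
    chosen = max(candidates, key=lambda name: WORKING.index(RECOVERY[name][2]))
    return [chosen]


for link in [("B", "C"), ("D", "E"), ("F", "G")]:
    print(link, lsps_to_activate(link))
```

For the D-E failure only C-I-J-G switches; a B-C failure still uses B-K-L-F because C-I-J-G does not cover it.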

Thanks,
-dimitri.

> Nic
>
> -----Original Message-----
> From: ALU - Dimitri Papadimitriou
> Sent: 24 February 2009 09:27
> To: Nic Neate; ccamp@ops.ietf.org
> Cc: labn - Lou Berger; IBryskin@advaoptical.com; Aria - Adrian Farrel Personal
> Subject: RE: Segment protection failure when recovery LSPs overlap
>
> Nic,
>
> I will restate: in every protection scheme there is a master/slave
> mechanism. Now, concerning the SRRO: C (and B) and F (and G) are
> generators in the upstream and downstream direction. So the SRROs
> are known to B, and what we are interested in is that B does not
> trigger recovery before C, and the same for F and G, i.e., that G
> does not trigger recovery before F.
>
> Thanks,
> -dimitri.
>
> > -----Original Message-----
> > From: Nic Neate [mailto:Nic.Neate@dataconnection.com]
> > Sent: Monday, February 23, 2009 4:13 PM
> > To: PAPADIMITRIOU Dimitri; ccamp@ops.ietf.org
> > Cc: labn - Lou Berger; IBryskin@advaoptical.com; Aria - Adrian Farrel Personal
> > Subject: RE: Segment protection failure when recovery LSPs overlap
> >
> > Hi Dimitri,
> >
> > We had wondered about the SRRO as a possible solution to this
> > problem as well. However, there are a couple of issues as the
> > protocol currently stands.
> >
> > - SRROs can only be present in Path messages between the merge
> >   node and the egress, and in Resv messages between the branch
> >   node and the ingress. See
> >   http://tools.ietf.org/html/rfc4873#section-2 and
> >   http://tools.ietf.org/html/rfc4873#section-5.2. Therefore, C
> >   does not have the SRRO for recovery LSP B-K-L-F, and F does not
> >   have the SRRO for recovery LSP C-I-J-G.
> >
> > - The inclusion of the SRRO is optional, controlled via the
> >   segment-recording-desired flag in the SESSION_ATTRIBUTE object
> >   (http://tools.ietf.org/html/rfc4873#section-5.2). If the SRRO is
> >   required in order to avoid data loss, then it needs to be
> >   mandatory.
> >
> > So I think we need a protocol extension in order to provide a
> > signaling-based solution.
> >
> > Nic
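
Nic's first point can be checked mechanically. The sketch below assumes, following the placement rules he cites, that a node learns of a recovery segment from the SRRO only if it is upstream of the branch node (via Resv) or downstream of the merge node (via Path), that each endpoint knows its own segment, and that the segment-recording-desired flag can be modelled as a boolean. The names `sees_srro` and `known_segments` are hypothetical.

```python
WORKING = list("ABCDEFGH")
SEGMENTS = {"B-K-L-F": ("B", "F"), "C-I-J-G": ("C", "G")}


def sees_srro(node, branch, merge, recording_desired=True):
    """Whether `node` can learn of the segment branch..merge (simplified model).

    Resv messages carry the SRRO from the branch node towards the ingress,
    Path messages from the merge node towards the egress.
    """
    if node in (branch, merge):
        return True  # an endpoint knows its own recovery segment
    if not recording_desired:
        return False  # segment recording was not requested
    n = WORKING.index(node)
    return n < WORKING.index(branch) or n > WORKING.index(merge)


def known_segments(node, recording_desired=True):
    """Recovery segments a node is aware of under this model."""
    return [name for name, (br, mg) in SEGMENTS.items()
            if sees_srro(node, br, mg, recording_desired)]


for node in "BCFG":
    print(node, known_segments(node))
```

Only the outer endpoints B and G see both segments; the inner endpoints C and F, which are the likely first detectors, each see only their own. With the flag cleared, every node knows at most its own segment.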
> >
> > -----Original Message-----
> > From: ALU - Dimitri Papadimitriou
> > Sent: 21 February 2009 23:51
> > To: Nic Neate; ccamp@ops.ietf.org
> > Cc: labn - Lou Berger; IBryskin@advaoptical.com; Aria - Adrian Farrel Personal
> > Subject: RE: Segment protection failure when recovery LSPs overlap
> >
> > Nic,
> >
> > RFC 4873, by means of the SRRO, allows nodes to determine the
> > existence of upstream/downstream recovery segments, as carried in
> > Path/Resv messages. Combined with RFC 4426 and RFC 4428, which
> > refer to master/slave, it follows that either C or F triggers a
> > recovery action by means of disjoint recovery segments.
> >
> > Thanks,
> > -d.
> >
> > > -----Original Message-----
> > > From: Nic Neate [mailto:Nic.Neate@dataconnection.com]
> > > Sent: Friday, February 20, 2009 4:05 PM
> > > To: ccamp@ops.ietf.org
> > > Cc: labn - Lou Berger; IBryskin@advaoptical.com; PAPADIMITRIOU Dimitri; Aria - Adrian Farrel Personal
> > > Subject: Segment protection failure when recovery LSPs overlap
> > >
> > > Hi CCAMP,
> > >
> > > I'd like to raise one more issue with RFC 4873 segment recovery,
> > > which I believe will lead to data loss when overlapping segment
> > > recovery LSPs are used.
> > >
> > > RFC 4873 allows topologies like this one:
> > >
> > >       K-----------L
> > >      /             \
> > > A===B===C===D===E===F===G===H
> > >          \             /
> > >           I-----------J
> > >
> > > A working LSP A-B-C-D-E-F-G-H is protected by two overlapping
> > > segment recovery LSPs: B-K-L-F and C-I-J-G. The recovery scheme
> > > is 1:1 protection with extra traffic.
> > >
> > > Suppose the link D-E fails:
> > >
> > >       K-----------L
> > >      /             \
> > > A===B===C===D x E===F===G===H
> > >          \             /
> > >           I-----------J
> > >
> > > My understanding is that the failure will be handled as follows.
> > >
> > > - D detects the link failure, and sends Notify to C (first Notify
> > >   object in the received Path). C and G exchange Notify messages
> > >   to remove extra traffic from the C-I-J-G repair, and then send
> > >   and receive traffic from the working LSP on C-I and G-J.
> > >
> > > - Meanwhile, E also detects the failure, and sends Notify to F
> > >   (first Notify object in the received Resv). F likewise exchanges
> > >   Notify messages with B to remove extra traffic from the B-K-L-F
> > >   repair, and then they send and receive working LSP traffic on
> > >   B-K and F-L.
> > >
> > > That results in the following data flow:
> > >
> > >       K----->-----L
> > >      /             \
> > > A->-B<--C   D   E   F->-G<--H
> > >          \             /
> > >           I-----<-----J
> > >
> > > Forward traffic reaches G on the link F-G. However, G has switched
> > > to send and receive on G-J, and so drops traffic received from F.
> > >
> > > Reverse traffic reaches B on C-B. However, B has switched to send
> > > and receive on B-K, and so drops traffic received from C.
> > >
> > > Thus traffic is lost in both directions.
> > >
> > > Can anyone point out an error in this analysis? Is this a topology
> > > that there is interest in supporting?
> > >
> > > Thanks,
> > >
> > > Nic
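
The data flow in Nic's original analysis can be reproduced with a small trace. This sketch assumes each node has one transmit neighbour and one selected receive neighbour per direction, filled in from the switched state he describes (B and F on B-K-L-F, C and G on C-I-J-G). The tables `NEXT_HOP` and `ACCEPTS` are hypothetical and ignore extra traffic.

```python
# Hypothetical per-direction state after both recovery LSPs have been activated.
# NEXT_HOP: the neighbour each node transmits the working traffic to.
# ACCEPTS:  the neighbour each node's selector receives the traffic from.
NEXT_HOP = {
    "forward": {"A": "B", "B": "K", "K": "L", "L": "F", "F": "G", "G": "H"},
    "reverse": {"H": "G", "G": "J", "J": "I", "I": "C", "C": "B", "B": "A"},
}
ACCEPTS = {
    "forward": {"B": "A", "K": "B", "L": "K", "F": "L", "G": "J", "H": "G"},
    "reverse": {"G": "H", "J": "G", "I": "J", "C": "I", "B": "K", "A": "B"},
}


def trace(direction, ingress, egress):
    """Follow traffic hop by hop until it is delivered or a selector drops it."""
    path, node = [ingress], ingress
    while node != egress:
        nxt = NEXT_HOP[direction][node]
        selected = ACCEPTS[direction].get(nxt)
        if selected != node:
            return path, f"dropped at {nxt} (selector listens to {selected})"
        path.append(nxt)
        node = nxt
    return path, "delivered"


print(trace("forward", "A", "H"))
print(trace("reverse", "H", "A"))
```

Under these tables forward traffic travels A-B-K-L-F and is dropped at G, and reverse traffic travels H-G-J-I-C and is dropped at B, matching the analysis.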