RE: Two Drafts for Resilience of Control Plane

To: <ibryskin@movaz.com>, "Drake, John E" <John.E.Drake2@boeing.com>
Subject: RE: Two Drafts for Resilience of Control Plane
From: "Zafar Ali \(zali\)" <zali@cisco.com>
Date: Sat, 29 Oct 2005 09:53:57 -0400
Cc: <dpapadimitriou@psg.com>, <dimitri.papadimitriou@alcatel.be>, "Igor Bryskin" <i_bryskin@yahoo.com>, "Kim Young Hwa" <yhwkim@etri.re.kr>, <ccamp@ops.ietf.org>
> -----Original Message-----
> From: ibryskin@movaz.com [mailto:ibryskin@movaz.com] 
> Sent: Saturday, October 29, 2005 9:38 AM
> To: Drake, John E
> Cc: ibryskin@movaz.com; dpapadimitriou@psg.com; 
> dimitri.papadimitriou@alcatel.be; Igor Bryskin; Zafar Ali 
> (zali); Kim Young Hwa; ccamp@ops.ietf.org
> Subject: RE: Two Drafts for Resilience of Control Plane
> 
> John,
> 
> See in line.
> 
> Igor
> 
> > Igor,
> >
> > What you wrote was:
> >
> > "Suppose one or more signaling controllers managing some LSP went
> > out of service leaving the LSP's data plane intact. As far as the
> > user is concerned such LSP is perfectly healthy and operational.
> > Such situation could last for a considerable period of time."
> >
> > What part of this is *not* handled by RSVP graceful restart?
> >
> > In your subsequent e-mail, you then changed the problem statement to:
> >
> > ""Dead" controllers in my example *do not* come back for a
> > considerable period of time. So there are no restarts here
> > (graceful or not graceful)"
> 
> Sorry, I don't see how I have changed the problem statement. 
> I was and am saying that while controllers are out of service 
> for a considerable time (day? two days?  week?) the question 
> is what to do with active LSPs associated with them? Let's 
> consider an example:
> 
> 
> A----B------C-----D
> |                 |
> E-----F-----H-----K
> 
> Suppose we have an LSP A-B-C-D carrying user traffic and a 
> controller managing node B went out of service. The question 
> is what to do with this LSP until the controller comes back? 
> The operator may decide:
> a) simply not wait and delete the LSP. Normal LSP teardown
> (PathTear originated on the ingress controller) won't work because
> PathTear won't make it to the controllers managing nodes C and D,
> leaving resources associated with the LSP (very expensive in the
> optical layer) allocated and not available for other LSPs;
> b) reroute the LSP via mb4b onto the alternative path A-E-F-H-K-D.
> This won't work for the same reason as in a);
> c) leave the LSP as it is and wait for the dead controller to be
> replaced or repaired. This would mean the need to perform normal
> operations such as monitoring data plane alarms, changing the LSP
> admin status (for example, disabling alarms on all nodes),
> performing power monitoring and equalization, and performing
> recovery operations in case of a fatal data plane failure.
> Everything that depends on hop-by-hop signaling won't work today.
> Don't tell me that these problems are fabricated; they are 
> real because they are raised by the customers. Dimitri seems 
> to understand the problem but he is saying that the CP in 
> this case is hardly of any use. This IMO is a dangerous 
> statement for the future of CP in non-packet environments.
> The Management plane aficionados will jump on it and say that 
> management plane does not have such a problem - NMS has a 
> direct access to any NE on the network, so it can do all 
> necessary cleanup no matter what happened.
> Customers will say: "Well, if there are situations when CP 
> suddenly becomes useless and we have to use management plane 
> anyway, why would we use the CP in the first place?"
> 
> Fortunately, I believe that the problems could be solved 
> entirely via CP by making it more resilient. Hence, CP 
> resilience is a good direction to work on within the CCAMP WG.

Igor, 

W.r.t. option c), please note that traffic CANNOT be forwarded in a
"head-less mode" for a very long time. If your control network melts or
a peering controller goes down, either RSVP GR or refresh timeouts will
take care of the clean-up of the affected RSVP states. Similarly, the
LMP CC SM will go down (after states are cleared, i.e.,
degraded-to-down), eventually removing the TE links from the topology.
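
A rough sketch of the refresh-based clean-up timing, assuming the
RFC 2205 soft-state rule L >= (K + 0.5) * 1.5 * R with the suggested
defaults R = 30 s and K = 3. The function names and default values below
are illustrative assumptions only; deployments commonly tune these
timers or use refresh reduction, and when RSVP GR is in use the
advertised restart and recovery times govern instead.

```python
def rsvp_state_lifetime(refresh_period_s: float = 30.0, k: int = 3) -> float:
    """Minimum soft-state lifetime L = (K + 0.5) * 1.5 * R (RFC 2205).

    refresh_period_s (R) and k (K) default to the values suggested in
    RFC 2205; they are assumptions here, not measured settings.
    """
    return (k + 0.5) * 1.5 * refresh_period_s


def state_expired(seconds_since_last_refresh: float,
                  refresh_period_s: float = 30.0, k: int = 3) -> bool:
    """True once a neighbor has been silent longer than the state lifetime."""
    return seconds_since_last_refresh > rsvp_state_lifetime(refresh_period_s, k)


if __name__ == "__main__":
    print(rsvp_state_lifetime())   # 157.5 seconds with the defaults
    print(state_expired(120.0))    # False: still inside the lifetime
    print(state_expired(3600.0))   # True: one silent hour is far past it
```

With these defaults, state that is not refreshed for more than about
157.5 seconds times out, so a controller that stays dead for days is far
outside that window.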

Thanks

Regards... Zafar 

> 
> Igor
> 
> > If "Considerable period of time" is not equal to infinity, then
> > there will be an RSVP graceful restart.  If a controller is really
> > and truly dead, then presumably the operator will either replace it
> > or re-assign its data-plane resources to another signaling
> > controller.  In either case, there will then be an RSVP graceful
> > restart.
> >
> > Thanks,
> >
> > John
> >
> >
> >
> >> -----Original Message-----
> >> From: ibryskin@movaz.com [mailto:ibryskin@movaz.com]
> >> Sent: Friday, October 28, 2005 1:00 PM
> >> To: Drake, John E
> >> Cc: ibryskin@movaz.com; dpapadimitriou@psg.com;
> >> dimitri.papadimitriou@alcatel.be; Igor Bryskin; Zafar Ali;
> >> Kim Young Hwa; ccamp@ops.ietf.org
> >> Subject: RE: Two Drafts for Resilience of Control Plane
> >>
> >> John,
> >>
> >> I think you missed my point here. "Dead" controllers in my example
> >> *do not* come back for a considerable period of time. So there are
> >> no restarts here (graceful or not graceful) :=)
> >>
> >> Igor
> >>
> >> > What part of your problem, as stated below, is not handled by
> >> > RSVP graceful restart?
> >> >
> >> >> -----Original Message-----
> >> >> From: ibryskin@movaz.com [mailto:ibryskin@movaz.com]
> >> >> Sent: Friday, October 28, 2005 11:41 AM
> >> >> To: Drake, John E
> >> >> Cc: dpapadimitriou@psg.com; dimitri.papadimitriou@alcatel.be;
> >> >> Igor Bryskin; Zafar Ali; Kim Young Hwa; ccamp@ops.ietf.org
> >> >> Subject: RE: Two Drafts for Resilience of Control Plane
> >> >>
> >> >> Hi,
> >> >>
> >> >> Here is one of the problems that I've been thinking about for a
> >> >> while - control plane partitioned LSPs. Suppose one or more
> >> >> signaling controllers managing some LSP went out of service
> >> >> leaving the LSP's data plane intact. As far as the user is
> >> >> concerned such LSP is perfectly healthy and operational. Such
> >> >> situation could last for a considerable period of time. Do we
> >> >> need to manage such LSP via control plane? Sure, we must be
> >> >> capable to tear down such LSP, perform mb4b rerouting, distribute
> >> >> alarms between operational controllers, signal data plane faults
> >> >> and perform recovery switchover, modify LSP status, etc. Can we
> >> >> do this today? No, but with some (signaling) extensions the
> >> >> problem I believe is solvable. Is this some artificial,
> >> >> "fabricated" problem? No, I think it is real. Does it fall under
> >> >> the control plane resilience problem space? I believe it does.
> >> >>
> >> >> Igor
> >> >>
> >> >> > I agree with Zafar and Dimitri.  If someone wanted to document
> >> >> > the GMPLS control plane resiliency features, as was done for
> >> >> > GMPLS addressing, that might be a useful activity.
> >> >> >
> >> >> >> -----Original Message-----
> >> >> >> From: dimitri papadimitriou [mailto:dpapadimitriou@psg.com]
> >> >> >> Sent: Friday, October 28, 2005 9:56 AM
> >> >> >> To: Igor Bryskin
> >> >> >> Cc: Zafar Ali (zali); Kim Young Hwa; ccamp@ops.ietf.org
> >> >> >> Subject: Re: Two Drafts for Resilience of Control Plane
> >> >> >>
> >> >> >> igor -
> >> >> >>
> >> >> >> over time CCAMP came with a set of mechanisms to improve
> >> >> >> control plane resilience (RSVP and LMP GR upon channel/node
> >> >> >> failure); other WG protocol work is also usable here (OSPF GR,
> >> >> >> etc.) ... on the other side, mechanisms such as link bundling
> >> >> >> have built-in resilience capabilities and most GMPLS control
> >> >> >> plane capabilities have been designed so as to be independent
> >> >> >> of the control plane realisation (in-band, out-of-band, etc.)
> >> >> >>
> >> >> >> so indeed i share the concern of Zafar: what could we do more
> >> >> >> here than document these tools and provide our experience in
> >> >> >> using them;
> >> >> >>
> >> >> >> now, before stating there are (potential) problem(s) arising -
> >> >> >> would you please be more specific on what are these potential
> >> >> >> issue(s) and/or problems? (not related to policy/config. -
> >> >> >> note: all the issues you have pointed here below are simply
> >> >> >> policy/config specific but none of them highlights a missing
> >> >> >> IP control plane resiliency feature)
> >> >> >>
> >> >> >> thanks,
> >> >> >> - dimitri.
> >> >> >>
> >> >> >>
> >> >> >> Igor Bryskin wrote:
> >> >> >>
> >> >> >> > Zafar,
> >> >> >> >
> >> >> >> > The problem arises when the control plane is decoupled from
> >> >> >> > the data plane. The question is do we need such decoupling in
> >> >> >> > IP networks? Consider, for example, the situation when
> >> >> >> > several parallel PSC data links are bundled together and
> >> >> >> > controlled by a single control channel.
> >> >> >> > Does it mean in this case that when the control channel fails
> >> >> >> > all associated data links also fail? Do we need to reroute in
> >> >> >> > this case LSPs that use the data links? Can we rely in this
> >> >> >> > case on control plane indications to decide whether an
> >> >> >> > associated data link is healthy or not (in other words, can
> >> >> >> > we rely on RSVP Hellos or should we use, for example, BTD)?
> >> >> >> > Should we be capable to recover control channels without
> >> >> >> > disturbing data plane? I think control plane resilience is
> >> >> >> > important for all layers. You are right, Internet does work,
> >> >> >> > however, we do need for some reason TE and (fast) recovery in
> >> >> >> > IP as much as in other layers, don't we?
> >> >> >> >
> >> >> >> > Cheers,
> >> >> >> > Igor
> >> >> >> >
> >> >> >> > --- "Zafar Ali (zali)" <zali@cisco.com> wrote:
> >> >> >> >
> >> >> >> >
> >> >> >> >>Hi All,
> >> >> >> >>
> >> >> >> >>I am unable to understand the problem we are trying to solve
> >> >> >> >>or fabricate. My control network is IP based and IP has
> >> >> >> >>proven resiliency (Internet *does* work), why would I like to
> >> >> >> >>take control plane resiliency problem at a layer *above-IP*
> >> >> >> >>and complicate my life. Did I miss something?
> >> >> >> >>
> >> >> >> >>Thanks
> >> >> >> >>
> >> >> >> >>Regards... Zafar
> >> >> >> >>
> >> >> >> >>
> >> >> >> >>________________________________
> >> >> >> >>
> >> >> >> >>	From: owner-ccamp@ops.ietf.org 
> >> >> >> >>[mailto:owner-ccamp@ops.ietf.org]
> >> >> >> >>On Behalf Of Kim Young Hwa
> >> >> >> >>	Sent: Friday, October 28, 2005 6:04 AM
> >> >> >> >>	To: ccamp@ops.ietf.org
> >> >> >> >>	Subject: Two Drafts for Resilience of Control Plane
> >> >> >> >>
> >> >> >> >>
> >> >> >> >>	Dear all,
> >> >> >> >>
> >> >> >> >>	I posted two drafts for the resilience of control plane.
> >> >> >> >>	One is for requirements of the resilience of control plane,
> >> >> >> >>	the other is for a protocol specification as a solution of that.
> >> >> >> >>	These are now available at:
> >> >> >> >>
> >> >> >> >>	http://www.ietf.org/internet-drafts/draft-kim-ccamp-cpr-reqts-01.txt
> >> >> >> >>
> >> >> >> >>	http://www.ietf.org/internet-drafts/draft-kim-ccamp-accp-protocol-00.txt
> >> >> >> >>
> >> >> >> >>	I want your comments.
> >> >> >> >>
> >> >> >> >>	Regards
> >> >> >> >>
> >> >> >> >>	Young.
> >> >> >> >>
> >> >> >> >>	===================================
> >> >> >> >>	Young-Hwa Kim
> >> >> >> >>	Principal Member / Ph.D
> >> >> >> >>	BcN Research Division, ETRI
> >> >> >> >>	Tel:     +82-42-860-5819
> >> >> >> >>	Fax:    +82-42-860-5440
> >> >> >> >>	e-mail: yhwkim@etri.re.kr
> >> >> >> >>	===================================