
RE: Two Drafts for Resilience of Control Plane



Igor,

You haven't convinced me that there is a real problem here that is not
addressed by the combination of the existing GMPLS resilience mechanisms
and a robust implementation of those mechanisms.

As I said yesterday, an I-D that enumerated all of the resilience
mechanisms might be a useful thing, specifically as a primer.

Thanks,

John

> -----Original Message-----
> From: ibryskin@movaz.com [mailto:ibryskin@movaz.com]
> Sent: Saturday, October 29, 2005 6:38 AM
> To: Drake, John E
> Cc: ibryskin@movaz.com; dpapadimitriou@psg.com;
> dimitri.papadimitriou@alcatel.be; Igor Bryskin; Zafar Ali; Kim Young Hwa;
> ccamp@ops.ietf.org
> Subject: RE: Two Drafts for Resilience of Control Plane
> 
> John,
> 
> See inline.
> 
> Igor
> 
> > Igor,
> >
> > What you wrote was:
> >
> > "Suppose one or more signaling controllers managing some LSP went
out of
> > service leaving the LSP's data plane intact. As far as the user is
> > concerned such LSP is perfectly healthy and operational.  Such
situation
> > could last for a considerable period of time."
> >
> > What part of this is *not* handled by RSVP graceful restart?
> >
> > In your subsequent e-mail, you then changed the problem statement to:
> >
> > ""Dead" controllers in my example *do not* come back for a considerable
> > period of time. So there are no restarts here (graceful or not
> > graceful)"
> 
> Sorry, I don't see how I have changed the problem statement. I was and
> am saying that while controllers are out of service for a considerable
> time (a day? two days? a week?), the question is what to do with the
> active LSPs associated with them. Let's consider an example:
> 
> 
> A----B------C-----D
> |                 |
> E-----F-----H-----K
> 
> Suppose we have an LSP A-B-C-D carrying user traffic, and the controller
> managing node B goes out of service. The question is what to do with
> this LSP until the controller comes back. The operator may decide to:
> a)	simply not wait, and delete the LSP. Normal LSP teardown - a
> PathTear originated on the ingress controller - won't work, because the
> PathTear won't make it to the controllers managing nodes C and D,
> leaving the (very expensive in the optical layer) resources associated
> with the LSP allocated and unavailable for other LSPs;
> b)	reroute the LSP via mb4b (make-before-break) onto the alternative
> path A-E-F-H-K-D - this won't work for the same reason as in a);
> c)	leave the LSP as it is and wait for the dead controller to be
> replaced or repaired. This would mean still having to perform normal
> operations such as monitoring data plane alarms, changing the LSP admin
> status (for example, disabling alarms on all nodes), performing power
> monitoring and equalization, and performing recovery in case of a fatal
> data plane failure. Everything that depends on hop-by-hop signaling
> won't work today (see the sketch below).
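>
> To make this concrete, here is a minimal sketch in Python (purely
> illustrative - the node names and the 'alive' map are my assumptions,
> not real RSVP code) of how a single dead controller stops any
> hop-by-hop message, which is exactly why a) and b) fail:
>
> # PathTear travels hop by hop along the LSP's control-plane path.
> lsp_path = ["A", "B", "C", "D"]
> alive = {"A": True, "B": False, "C": True, "D": True}  # B's controller is dead
>
> def send_path_tear(path):
>     released = []
>     for node in path:
>         if not alive[node]:
>             # The message is never forwarded past the dead hop.
>             print("PathTear lost: controller %s is unreachable" % node)
>             break
>         released.append(node)  # this hop releases the LSP's resources
>     return released
>
> released = send_path_tear(lsp_path)
> print("Released on:", released)                          # ['A'] only
> print("Still allocated:", [n for n in lsp_path if n not in released])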
> Don't tell me that these problems are fabricated; they are real, because
> they are raised by the customers. Dimitri seems to understand the
> problem, but he is saying that the CP in this case is hardly of any use.
> This IMO is a dangerous statement for the future of the CP in non-packet
> environments. The management plane aficionados will jump on it and say
> that the management plane does not have such a problem - the NMS has
> direct access to any NE on the network, so it can do all the necessary
> cleanup no matter what happened. Customers will say: "Well, if there are
> situations when the CP suddenly becomes useless and we have to use the
> management plane anyway, why would we use the CP in the first place?"
> 
> Fortunately, I believe that these problems could be solved entirely via
> the CP by making it more resilient. Hence, CP resilience is a good
> direction to work on within the CCAMP WG.
> 
> Igor
> 
> > If "Considerable period of time" is not equal to infinity, then
there
> > will be an RSVP graceful restart.  If a controller is really and
truly
> > dead, then presumably the operator will either replace it or
re-assign
> > its data-plane resources to another signaling controller.  In either
> > case, there will then be an RSVP graceful restart.
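> >
> > As a rough sketch of the neighbor-side logic (Python; the timer value
> > and the class are illustrative simplifications of my point, not the
> > actual RFC 3473 procedures):
> >
> > import time
> >
> > RESTART_TIME = 60.0  # seconds, as advertised in the neighbor's Restart_Cap
> >
> > class NeighborState:
> >     """Per-neighbor view of a control channel that has gone quiet."""
> >     def __init__(self):
> >         self.hello_lost_at = None
> >         self.state_preserved = True
> >
> >     def on_hello_timeout(self):
> >         # Neighbor's controller is down: keep LSP state and leave the
> >         # data plane untouched while the restart timer runs.
> >         self.hello_lost_at = time.monotonic()
> >
> >     def on_hello_restored(self):
> >         # The neighbor (or its replacement) came back: resynchronize
> >         # state with no data-plane impact - the graceful restart.
> >         self.hello_lost_at = None
> >
> >     def poll(self):
> >         # Only if the neighbor never returns does the preserved
> >         # control state eventually have to be cleaned up.
> >         if (self.hello_lost_at is not None and
> >                 time.monotonic() - self.hello_lost_at > RESTART_TIME):
> >             self.state_preserved = False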
> >
> > Thanks,
> >
> > John
> >
> >
> >
> >> -----Original Message-----
> >> From: ibryskin@movaz.com [mailto:ibryskin@movaz.com]
> >> Sent: Friday, October 28, 2005 1:00 PM
> >> To: Drake, John E
> >> Cc: ibryskin@movaz.com; dpapadimitriou@psg.com;
> >> dimitri.papadimitriou@alcatel.be; Igor Bryskin; Zafar Ali; Kim Young Hwa;
> >> ccamp@ops.ietf.org
> >> Subject: RE: Two Drafts for Resilience of Control Plane
> >>
> >> John,
> >>
> >> I think you missed my point here. "Dead" controllers in my example
> >> *do not* come back for a considerable period of time. So there are no
> >> restarts here (graceful or not graceful) :=)
> >>
> >> Igor
> >>
> >> > What part of your problem, as stated below, is not handled by RSVP
> >> > graceful restart?
> >> >
> >> >> -----Original Message-----
> >> >> From: ibryskin@movaz.com [mailto:ibryskin@movaz.com]
> >> >> Sent: Friday, October 28, 2005 11:41 AM
> >> >> To: Drake, John E
> >> >> Cc: dpapadimitriou@psg.com; dimitri.papadimitriou@alcatel.be;
> >> >> Igor Bryskin; Zafar Ali; Kim Young Hwa; ccamp@ops.ietf.org
> >> >> Subject: RE: Two Drafts for Resilience of Control Plane
> >> >>
> >> >> Hi,
> >> >>
> >> >> Here is one of the problems that I've been thinking about for a
> >> >> while - control plane partitioned LSPs. Suppose one or more
> >> >> signaling controllers managing some LSP went out of service,
> >> >> leaving the LSP's data plane intact. As far as the user is
> >> >> concerned, such an LSP is perfectly healthy and operational. Such
> >> >> a situation could last for a considerable period of time. Do we
> >> >> need to manage such an LSP via the control plane? Sure, we must be
> >> >> capable of tearing down such an LSP, performing mb4b rerouting,
> >> >> distributing alarms between operational controllers, signaling
> >> >> data plane faults and performing recovery switchover, modifying
> >> >> LSP status, etc. Can we do this today? No, but with some
> >> >> (signaling) extensions the problem, I believe, is solvable. Is
> >> >> this some artificial, "fabricated" problem? No, I think it is
> >> >> real. Does it fall under the control plane resilience problem
> >> >> space? I believe it does.
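> >> >>
> >> >> As a toy illustration (Python; the operation names are mine, not
> >> >> taken from any spec): the common factor in all of the above is
> >> >> that a single dead controller on the path blocks every operation
> >> >> that is signaled hop by hop:
> >> >>
> >> >> HOP_BY_HOP_OPS = {"teardown", "mb4b_reroute", "alarm_distribution",
> >> >>                   "admin_status_change", "recovery_switchover"}
> >> >>
> >> >> def op_possible(op, lsp_hops, dead_controllers):
> >> >>     # Hop-by-hop operations need every controller on the path alive.
> >> >>     if op in HOP_BY_HOP_OPS:
> >> >>         return not (set(lsp_hops) & dead_controllers)
> >> >>     return True
> >> >>
> >> >> print(op_possible("teardown", ["A", "B", "C", "D"], {"B"}))  # False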
> >> >>
> >> >> Igor
> >> >>
> >> >> > I agree with Zafar and Dimitri.  If someone wanted to document
> >> >> > the GMPLS control plane resiliency features, as was done for
> >> >> > GMPLS addressing, that might be a useful activity.
> >> >> >
> >> >> >> -----Original Message-----
> >> >> >> From: dimitri papadimitriou [mailto:dpapadimitriou@psg.com]
> >> >> >> Sent: Friday, October 28, 2005 9:56 AM
> >> >> >> To: Igor Bryskin
> >> >> >> Cc: Zafar Ali (zali); Kim Young Hwa; ccamp@ops.ietf.org
> >> >> >> Subject: Re: Two Drafts for Resilience of Control Plane
> >> >> >>
> >> >> >> igor -
> >> >> >>
> >> >> >> over time CCAMP came up with a set of mechanisms to improve
> >> >> >> control plane resilience (RSVP and LMP GR upon channel/node
> >> >> >> failure); other WGs' protocol work is also usable here (OSPF
> >> >> >> GR, etc.) ... on the other side, mechanisms such as link
> >> >> >> bundling have built-in resilience capabilities, and most GMPLS
> >> >> >> control plane capabilities have been designed so as to be
> >> >> >> independent of the control plane realisation (in-band,
> >> >> >> out-of-band, etc.)
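> >> >> >>
> >> >> >> as a trivial sketch of the bundling point (python; an assumed
> >> >> >> model, not LMP/RSVP code) - the bundled TE link stays usable as
> >> >> >> long as any component link survives:
> >> >> >>
> >> >> >> components = {"comp-1": True, "comp-2": True, "comp-3": False}
> >> >> >>
> >> >> >> def te_bundle_usable(comps):
> >> >> >>     # the bundle remains advertisable while at least one
> >> >> >>     # component link is up; LSPs can be moved locally
> >> >> >>     return any(comps.values())
> >> >> >>
> >> >> >> print(te_bundle_usable(components))  # True despite comp-3 failing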
> >> >> >>
> >> >> >> so indeed i share Zafar's concern - what more could we do here
> >> >> >> than document these tools and provide our experience in using
> >> >> >> them;
> >> >> >>
> >> >> >> now, before stating that there are (potential) problem(s)
> >> >> >> arising - would you please be more specific about what these
> >> >> >> potential issue(s) and/or problems are? (not related to
> >> >> >> policy/config. - note: all the issues you have pointed out here
> >> >> >> below are simply policy/config specific, but none of them
> >> >> >> highlights a missing IP control plane resiliency feature)
> >> >> >>
> >> >> >> thanks,
> >> >> >> - dimitri.
> >> >> >>
> >> >> >>
> >> >> >> Igor Bryskin wrote:
> >> >> >>
> >> >> >> > Zafar,
> >> >> >> >
> >> >> >> > The problem arises when the control plane is decoupled
> >> >> >> > from the data plane. The question is: do we need such
> >> >> >> > decoupling in IP networks? Consider, for example, the
> >> >> >> > situation when several parallel PSC data links are bundled
> >> >> >> > together and controlled by a single control channel.
> >> >> >> > Does it mean in this case that when the control
> >> >> >> > channel fails, all associated data links also fail? Do
> >> >> >> > we need to reroute, in this case, the LSPs that use the
> >> >> >> > data links? Can we rely in this case on control plane
> >> >> >> > indications to decide whether an associated data link
> >> >> >> > is healthy or not (in other words, can we rely on RSVP
> >> >> >> > Hellos, or should we use, for example, BFD)? Should we
> >> >> >> > be capable of recovering control channels without
> >> >> >> > disturbing the data plane? I think control plane
> >> >> >> > resilience is important for all layers. You are right,
> >> >> >> > the Internet does work; however, we do need, for some
> >> >> >> > reason, TE and (fast) recovery in IP as much as in
> >> >> >> > other layers, don't we?
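> >> >> >> >
> >> >> >> > To illustrate the choice (a toy Python sketch; the function
> >> >> >> > and its flags are made up for illustration):
> >> >> >> >
> >> >> >> > def data_link_healthy(rsvp_hello_ok, data_probe_ok,
> >> >> >> >                       trust_control_plane):
> >> >> >> >     if trust_control_plane:
> >> >> >> >         # A control channel failure is read as a data link
> >> >> >> >         # failure, triggering reroutes of healthy LSPs.
> >> >> >> >         return rsvp_hello_ok
> >> >> >> >     # Judge the data link by the data plane itself.
> >> >> >> >     return data_probe_ok
> >> >> >> >
> >> >> >> > # Control channel down, data links actually fine:
> >> >> >> > print(data_link_healthy(False, True, True))   # False - spurious reroute
> >> >> >> > print(data_link_healthy(False, True, False))  # True - LSPs left alone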
> >> >> >> >
> >> >> >> > Cheers,
> >> >> >> > Igor
> >> >> >> >
> >> >> >> > --- "Zafar Ali (zali)" <zali@cisco.com> wrote:
> >> >> >> >
> >> >> >> >
> >> >> >> >>Hi All,
> >> >> >> >>
> >> >> >> >>I am unable to understand the problem we are trying to solve
> >> >> >> >>or fabricate. My control network is IP based, and IP has
> >> >> >> >>proven resiliency (the Internet *does* work), so why would I
> >> >> >> >>want to take the control plane resiliency problem to a layer
> >> >> >> >>*above IP* and complicate my life? Did I miss something?
> >> >> >> >>
> >> >> >> >>Thanks
> >> >> >> >>
> >> >> >> >>Regards... Zafar
> >> >> >> >>
> >> >> >> >>
> >> >> >> >>________________________________
> >> >> >> >>
> >> >> >> >>	From: owner-ccamp@ops.ietf.org [mailto:owner-ccamp@ops.ietf.org]
> >> >> >> >>	On Behalf Of Kim Young Hwa
> >> >> >> >>	Sent: Friday, October 28, 2005 6:04 AM
> >> >> >> >>	To: ccamp@ops.ietf.org
> >> >> >> >>	Subject: Two Drafts for Resilience of Control Plane
> >> >> >> >>
> >> >> >> >>
> >> >> >> >>	Dear all,
> >> >> >> >>
> >> >> >> >>	I posted two drafts for the resilience of the control plane.
> >> >> >> >>	One is for the requirements of control plane resilience; the
> >> >> >> >>other is for a protocol specification as a solution to that.
> >> >> >> >>	These are now available at:
> >> >> >> >>
> >> >> >> >>	http://www.ietf.org/internet-drafts/draft-kim-ccamp-cpr-reqts-01.txt
> >> >> >> >>	http://www.ietf.org/internet-drafts/draft-kim-ccamp-accp-protocol-00.txt
> >> >> >> >>
> >> >> >> >>	I would welcome your comments.
> >> >> >> >>
> >> >> >> >>	Regards
> >> >> >> >>
> >> >> >> >>	Young.
> >> >> >> >>
> >> >> >> >>	===================================
> >> >> >> >>	Young-Hwa Kim
> >> >> >> >>	Principal Member / Ph.D.
> >> >> >> >>	BcN Research Division, ETRI
> >> >> >> >>	Tel:    +82-42-860-5819
> >> >> >> >>	Fax:    +82-42-860-5440
> >> >> >> >>	e-mail: yhwkim@etri.re.kr
> >> >> >> >>	===================================