
RE: Two Drafts for Resilience of Control Plane

To: "Drake, John E" <John.E.Drake2@boeing.com>
Subject: RE: Two Drafts for Resilience of Control Plane
From: ibryskin@movaz.com
Date: Sat, 29 Oct 2005 09:37:55 -0400 (EDT)
Cc: ibryskin@movaz.com, dpapadimitriou@psg.com, dimitri.papadimitriou@alcatel.be, "Igor Bryskin" <i_bryskin@yahoo.com>, "Zafar Ali" <zali@cisco.com>, "Kim Young Hwa" <yhwkim@etri.re.kr>, ccamp@ops.ietf.org
In-reply-to: <626FC7C6A97381468FB872072AB5DDC8369701@XCH-SW-42.sw.nos.boeing.com>
References: <626FC7C6A97381468FB872072AB5DDC8369701@XCH-SW-42.sw.nos.boeing.com>
User-agent: SquirrelMail/1.4.1

John,

See in line.

Igor

> Igor,
>
> What you wrote was:
>
> "Suppose one or more signaling controllers managing some LSP went out of
> service leaving the LSP's data plane intact. As far as the user is
> concerned such LSP is perfectly healthy and operational.  Such situation
> could last for a considerable period of time."
>
> What part of this is *not* handled by RSVP graceful restart?
>
> In your subsequent e-mail, you then changed the problem statement to:
>
> ""Dead" controllers in my example *do not* come back for a considerable
> period of time. So there are no restarts here (graceful or not
> graceful)"

Sorry, I don't see how I have changed the problem statement. I was and am
saying that while controllers are out of service for a considerable time
(a day? two days? a week?) the question is what to do with the active LSPs
associated with them. Let's consider an example:


A----B------C-----D
|                 |
E-----F-----H-----K

Suppose we have an LSP A-B-C-D carrying user traffic, and the controller
managing node B goes out of service. The question is what to do with this
LSP until the controller comes back. The operator may decide to:
a)	simply not wait and delete the LSP. Normal LSP teardown (a PathTear
originated on the ingress controller) won't work because the PathTear won't
make it to the controllers managing nodes C and D, leaving the resources
associated with the LSP (very expensive in the optical layer) allocated and
not available for other LSPs;
b)	reroute the LSP via make-before-break (mb4b) onto the alternative path
A-E-H-K-D. This won't work for the same reason as in a);
c)	leave the LSP as it is and wait for the dead controller to be replaced
or repaired. This would mean performing normal operations such as
monitoring data plane alarms, changing the LSP admin status (for example,
disabling alarms on all nodes), performing power monitoring and
equalization, and performing recovery operations in case of a fatal data
plane failure. Everything that depends on hop-by-hop signaling won't work
today.
Don't tell me that these problems are fabricated; they are real because
they are raised by customers. Dimitri seems to understand the problem,
but he is saying that the CP in this case is hardly of any use. This IMO
is a dangerous statement for the future of the CP in non-packet
environments. The management plane aficionados will jump on it and say that
the management plane does not have such a problem: the NMS has direct access
to any NE on the network, so it can do all the necessary cleanup no matter
what happened. Customers will say: "Well, if there are situations when the
CP suddenly becomes useless and we have to use the management plane anyway,
why would we use the CP in the first place?"
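
The hop-by-hop failure in the A-B-C-D example above can be illustrated with
a small Python sketch. This is a toy model under stated assumptions, not
RSVP-TE itself: the Node class, the controller_up flag and the
send_path_tear function are hypothetical names introduced only for
illustration, and a teardown is modeled as a message that stops at the
first controller that is out of service.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical model of one hop: a controller plus its data-plane state."""
    name: str
    controller_up: bool = True
    resources_held: bool = True  # data-plane resources allocated to the LSP


def send_path_tear(path):
    """Propagate a teardown hop by hop from the ingress.

    Assumption: a node whose controller is down can neither process the
    message nor forward it, so everything downstream never sees it.
    """
    released = []
    for node in path:
        if not node.controller_up:
            break
        node.resources_held = False
        released.append(node.name)
    return released


# LSP A-B-C-D with the controller managing node B out of service.
lsp = [Node("A"), Node("B", controller_up=False), Node("C"), Node("D")]
released = send_path_tear(lsp)
stranded = [n.name for n in lsp if n.resources_held]

print("released:", released)
print("stranded:", stranded)
```

In this model only the ingress releases its resources; nodes B, C and D keep
theirs allocated even though the controllers managing C and D are healthy,
which is the situation described in a) above.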

Fortunately, I believe that these problems can be solved entirely via the
CP by making it more resilient. Hence, CP resilience is a good direction to
work on within the CCAMP WG.

Igor

> If "Considerable period of time" is not equal to infinity, then there
> will be an RSVP graceful restart.  If a controller is really and truly
> dead, then presumably the operator will either replace it or re-assign
> its data-plane resources to another signaling controller.  In either
> case, there will then be an RSVP graceful restart.
>
> Thanks,
>
> John
>
>
>
>> -----Original Message-----
>> From: ibryskin@movaz.com [mailto:ibryskin@movaz.com]
>> Sent: Friday, October 28, 2005 1:00 PM
>> To: Drake, John E
>> Cc: ibryskin@movaz.com; dpapadimitriou@psg.com;
>> dimitri.papadimitriou@alcatel.be; Igor Bryskin; Zafar Ali; Kim Young Hwa;
>> ccamp@ops.ietf.org
>> Subject: RE: Two Drafts for Resilience of Control Plane
>>
>> John,
>>
>> I think you missed my point here. "Dead" controllers in my example *do
>> not* come back for a considerable period of time. So there are no restarts
>> here (graceful or not graceful) :=)
>>
>> Igor
>>
>> > What part of your problem, as stated below, is not handled by RSVP
>> > graceful restart?
>> >
>> >> -----Original Message-----
>> >> From: ibryskin@movaz.com [mailto:ibryskin@movaz.com]
>> >> Sent: Friday, October 28, 2005 11:41 AM
>> >> To: Drake, John E
>> >> Cc: dpapadimitriou@psg.com; dimitri.papadimitriou@alcatel.be; Igor
>> >> Bryskin; Zafar Ali; Kim Young Hwa; ccamp@ops.ietf.org
>> >> Subject: RE: Two Drafts for Resilience of Control Plane
>> >>
>> >> Hi,
>> >>
>> >> Here is one of the problems that I've been thinking about for a while -
>> >> control plane partitioned LSPs. Suppose one or more signaling controllers
>> >> managing some LSP went out of service leaving the LSP's data plane intact.
>> >> As far as the user is concerned such LSP is perfectly healthy and
>> >> operational. Such situation could last for a considerable period of time.
>> >> Do we need to manage such LSP via control plane? Sure, we must be able to
>> >> tear down such LSP, perform mb4b rerouting, distribute alarms between
>> >> operational controllers, signal data plane faults and perform recovery
>> >> switchover, modify LSP status, etc. Can we do this today? No, but with
>> >> some (signaling) extensions the problem, I believe, is solvable. Is this
>> >> some artificial, "fabricated" problem? No, I think it is real. Does it
>> >> fall under the control plane resilience problem space? I believe it does.
>> >>
>> >> Igor
>> >>
>> >> > I agree with Zafar and Dimitri.  If someone wanted to document the
>> >> > GMPLS control plane resiliency features, as was done for GMPLS
>> >> > addressing, that might be a useful activity.
>> >> >
>> >> >> -----Original Message-----
>> >> >> From: dimitri papadimitriou [mailto:dpapadimitriou@psg.com]
>> >> >> Sent: Friday, October 28, 2005 9:56 AM
>> >> >> To: Igor Bryskin
>> >> >> Cc: Zafar Ali (zali); Kim Young Hwa; ccamp@ops.ietf.org
>> >> >> Subject: Re: Two Drafts for Resilience of Control Plane
>> >> >>
>> >> >> igor -
>> >> >>
>> >> >> over time CCAMP came with a set of mechanisms to improve control
>> >> >> plane resilience (RSVP and LMP GR upon channel/node failure); other
>> >> >> WG protocol work is also usable here (OSPF GR, etc.) ... on the other
>> >> >> side, mechanisms such as link bundling have built-in resilience
>> >> >> capabilities, and most GMPLS control plane capabilities have been
>> >> >> designed so as to be independent of the control plane realisation
>> >> >> (in-band, out-of-band, etc.)
>> >> >>
>> >> >> so indeed i share the concern of Zafar: what could we do more here
>> >> >> than document these tools and provide our experience in using them;
>> >> >>
>> >> >> now, before stating there are (potential) problem(s) arising - would
>> >> >> you please be more specific on what are these potential issue(s)
>> >> >> and/or problems ? (not related to policy/config. - note: all the
>> >> >> issues you have pointed here below are simply policy/config specific
>> >> >> but none of them highlights a missing IP control plane resiliency
>> >> >> feature)
>> >> >>
>> >> >> thanks,
>> >> >> - dimitri.
>> >> >>
>> >> >>
>> >> >> Igor Bryskin wrote:
>> >> >>
>> >> >> > Zafar,
>> >> >> >
>> >> >> > The problem arises when the control plane is decoupled
>> >> >> > from the data plane. The question is: do we need such
>> >> >> > decoupling in IP networks? Consider, for example, the
>> >> >> > situation when several parallel PSC data links are bundled
>> >> >> > together and controlled by a single control channel.
>> >> >> > Does it mean in this case that when the control
>> >> >> > channel fails all associated data links also fail? Do
>> >> >> > we need to reroute in this case the LSPs that use the data
>> >> >> > links? Can we rely in this case on control plane
>> >> >> > indications to decide whether an associated data link
>> >> >> > is healthy or not (in other words, can we rely on RSVP
>> >> >> > Hellos or should we use, for example, BFD)? Should we
>> >> >> > be able to recover control channels without
>> >> >> > disturbing the data plane? I think control plane
>> >> >> > resilience is important for all layers. You are right,
>> >> >> > the Internet does work; however, we do need, for some
>> >> >> > reason, TE and (fast) recovery in IP as much as in
>> >> >> > other layers, don't we?
>> >> >> >
>> >> >> > Cheers,
>> >> >> > Igor
>> >> >> >
>> >> >> > --- "Zafar Ali (zali)" <zali@cisco.com> wrote:
>> >> >> >
>> >> >> >
>> >> >> >>Hi All,
>> >> >> >>
>> >> >> >>I am unable to understand the problem we are trying to solve or
>> >> >> >>fabricate. My control network is IP based and IP has proven
>> >> >> >>resiliency (Internet *does* work), so why would I want to take on
>> >> >> >>a control plane resiliency problem at a layer *above-IP* and
>> >> >> >>complicate my life? Did I miss something?
>> >> >> >>
>> >> >> >>Thanks
>> >> >> >>
>> >> >> >>Regards... Zafar
>> >> >> >>
>> >> >> >>
>> >> >> >>________________________________
>> >> >> >>
>> >> >> >>	From: owner-ccamp@ops.ietf.org
>> >> >> >>[mailto:owner-ccamp@ops.ietf.org]
>> >> >> >>On Behalf Of Kim Young Hwa
>> >> >> >>	Sent: Friday, October 28, 2005 6:04 AM
>> >> >> >>	To: ccamp@ops.ietf.org
>> >> >> >>	Subject: Two Drafts for Resilience of Control Plane
>> >> >> >>
>> >> >> >>
>> >> >> >>	Dear all,
>> >> >> >>
>> >> >> >>	I posted two drafts for the resilience of the control plane.
>> >> >> >>	One is for requirements of the resilience of the control plane;
>> >> >> >>	the other is for a protocol specification as a solution to that.
>> >> >> >>	These are now available at:
>> >> >> >>
>> >> >> >>	http://www.ietf.org/internet-drafts/draft-kim-ccamp-cpr-reqts-01.txt
>> >> >> >>	http://www.ietf.org/internet-drafts/draft-kim-ccamp-accp-protocol-00.txt
>> >> >> >>
>> >> >> >>	I want your comments.
>> >> >> >>
>> >> >> >>	Regards
>> >> >> >>
>> >> >> >>	Young.
>> >> >> >>
>> >> >> >>	===================================
>> >> >> >>	Young-Hwa Kim
>> >> >> >>	Principal Member / Ph.D
>> >> >> >>	BcN Research Division, ETRI
>> >> >> >>	Tel:     +82-42-860-5819
>> >> >> >>	Fax:    +82-42-860-5440
>> >> >> >>	e-mail: yhwkim@etri.re.kr
>> >> >> >>	===================================

Follow-Ups:
- Re: Two Drafts for Resilience of Control Plane
  - From: dimitri papadimitriou <dpapadimitriou@psg.com>

References:
- RE: Two Drafts for Resilience of Control Plane
  - From: "Drake, John E" <John.E.Drake2@boeing.com>
