
Re: Two Drafts for Resilience of Control Plane

To: ibryskin@movaz.com
Subject: Re: Two Drafts for Resilience of Control Plane
From: dimitri papadimitriou <dpapadimitriou@psg.com>
Date: Sat, 29 Oct 2005 17:38:28 +0200
Cc: "Drake, John E" <John.E.Drake2@boeing.com>, dimitri.papadimitriou@alcatel.be, Igor Bryskin <i_bryskin@yahoo.com>, Zafar Ali <zali@cisco.com>, Kim Young Hwa <yhwkim@etri.re.kr>, ccamp@ops.ietf.org
In-reply-to: <3336.68.100.80.11.1130593075.squirrel@webmail.movaz.com>
References: <626FC7C6A97381468FB872072AB5DDC8369701@XCH-SW-42.sw.nos.boeing.com> <3336.68.100.80.11.1130593075.squirrel@webmail.movaz.com>
Reply-to: dpapadimitriou@psg.com, dimitri.papadimitriou@alcatel.be
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.11) Gecko/20050728

igor - see in-line

Igor,

What you wrote was:

"Suppose one or more signaling controllers managing some LSP went out of
service leaving the LSP's data plane intact. As far as the user is
concerned such LSP is perfectly healthy and operational.  Such situation
could last for a considerable period of time."

What part of this is *not* handled by RSVP graceful restart?

In your subsequent e-mail, you then changed the problem statement to:

""Dead" controllers in my example *do not* come back for a considerable
period of time. So there are no restarts here (graceful or not
graceful)"



Sorry, I don’t see how I have changed the problem statement. I was and am
saying that while controllers are out of service for a considerable time
(day? two days?  week?) the question is what to do with active LSPs
associated with them?

you should take this as the initial problem - is there something wrong
with that situation as transient ? and is the objective to leave B dead
for a non-finite period of time, because in *any case* you will have to
replace B to be able to provision new LSPs crossing this node ?

Let’s consider an example:


A----B------C-----D
|                 |
E-----F-----H-----K

Suppose we have an LSP A-B-C-D carrying user traffic and a controller
managing node B went out of service. The question is what to do with this
LSP until the controller comes back? The operator may decide:
a)	simply not wait and delete the LSP. Normal LSP teardown – PathTear
originated on the ingress controller – won’t work because PathTear won’t
make it to the controllers managing nodes C and D, leaving the resources
associated with the LSP (very expensive in the optical layer) allocated
and unavailable for other LSPs;
b)	reroute the LSP via mb4b onto the alternative path A-E-H-K-D – won’t
work for the same reason as in a)

what does not work ? the release of the unused resource ? so once notified
of the failed controller and re-routing achieved, trigger a PathErr with
the PSR flag and you are done (so it's a policy configuration)

c)	leave the LSP as it is and wait for the dead controller to be replaced
or repaired. This would mean the need to perform normal operations such
as monitoring data plane alarms, changing LSP admin status (for example,
disabling alarms on all nodes), performing power monitoring and
equalization, and performing recovery operations in case of a fatal data
plane failure. Everything that depends on hop-by-hop signaling won’t work
today.
Don’t tell me that these problems are fabricated; they are real because
they are raised by the customers. Dimitri seems to understand the problem
but he is saying that the CP in this case is hardly of any use.
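
The claim in option a), that a teardown originated at the ingress never
reaches the controllers downstream of the dead one, can be pictured with a
toy model. This is only a sketch of that claim, not of RSVP-TE itself: the
function name and data structures are hypothetical, and it deliberately
ignores any teardown triggered from the downstream side.

```python
# Toy model of hop-by-hop teardown along an LSP (hypothetical names).
# Node names follow the example topology in this thread.

def propagate_teardown(path, dead_controllers):
    """Walk the LSP path from the ingress.

    Returns (released, stranded): nodes whose controller processed the
    teardown, and nodes that never saw it because a controller at or
    upstream of them is out of service.
    """
    released = []
    for i, node in enumerate(path):
        if node in dead_controllers:
            # A dead controller neither processes nor forwards the
            # message, so this node and everything downstream keep
            # their resources allocated.
            return released, path[i:]
        released.append(node)
    return released, []


released, stranded = propagate_teardown(["A", "B", "C", "D"], {"B"})
print("released:", released)   # released: ['A']
print("stranded:", stranded)   # stranded: ['B', 'C', 'D']
```

In this model only A releases its resources while B, C and D stay
allocated, which is the situation the options in this list are trying to
deal with.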

just to clarify, what i am saying here is that you can not expect a
collaboration in the resiliency process from a controller that is down
over a longer period than the repairing phase - now in any case the dead
CP will need to be replaced at some point in time; hence, whatever the
resiliency mechanism you are going to define, it is certainly not going
to resurrect your dead CP and make possible the re-use of the node's
resources

hence, you should first correctly position the problem instead of trying
to find unadapted workarounds or missing functionality; moreover here,
as the problem can be solved, what can be additionally expected from the
CP ?

This IMO is a dangerous statement for the future of CP in non-packet
environments.

what's more dangerous is to continuously claim that the GMPLS CP is not
resilient because of incomplete policy configuration (and that more work
is still needed)

The Management plane aficionados will jump on it and say that management
plane does not have such a problem – NMS has a direct access to any NE on
the network, so it can do all necessary cleanup no matter what happened.
Customers will say: “Well, if there are situations when CP suddenly
becomes useless and we have to use management plane anyway, why would we
use the CP in the first place?”

you should ask the question the other way around - why do you
consider a CP like a distributed MP ?

Fortunately, I believe that the problems could be solved entirely via CP
by making it more resilient. Hence, CP resilience is a good direction to
work on within CCAMP WG

indeed, documenting the resiliency mechanisms that the CCAMP WG and others
have defined over time, and that a GMPLS CP can make use of to address
such problems, may be advisable; but unless GMPLS is showing real gaps in
terms of resiliency there is no need to initiate new protocol work


thanks,
- dimitri.

Igor

If "Considerable period of time" is not equal to infinity, then there
will be an RSVP graceful restart.  If a controller is really and truly
dead, then presumably the operator will either replace it or re-assign
its data-plane resources to another signaling controller.  In either
case, there will then be an RSVP graceful restart.

Thanks,

John

-----Original Message-----
From: ibryskin@movaz.com [mailto:ibryskin@movaz.com]
Sent: Friday, October 28, 2005 1:00 PM
To: Drake, John E
Cc: ibryskin@movaz.com; dpapadimitriou@psg.com;
dimitri.papadimitriou@alcatel.be; Igor Bryskin; Zafar Ali; Kim Young Hwa;
ccamp@ops.ietf.org
Subject: RE: Two Drafts for Resilience of Control Plane

John,

I think you missed my point here. "Dead" controllers in my example *do
not* come back for a considerable period of time. So there are no restarts
here (graceful or not graceful) :=)

Igor

What part of your problem, as stated below, is not handled by RSVP
graceful restart?

-----Original Message-----
From: ibryskin@movaz.com [mailto:ibryskin@movaz.com]
Sent: Friday, October 28, 2005 11:41 AM
To: Drake, John E
Cc: dpapadimitriou@psg.com; dimitri.papadimitriou@alcatel.be; Igor
Bryskin; Zafar Ali; Kim Young Hwa; ccamp@ops.ietf.org
Subject: RE: Two Drafts for Resilience of Control Plane

Hi,

Here is one of the problems that I've been thinking for a while - control
plane partitioned LSPs. Suppose one or more signaling controllers managing
some LSP went out of service leaving the LSP's data plane intact. As far
as the user is concerned such LSP is perfectly healthy and operational.
Such situation could last for a considerable period of time. Do we need to
manage such LSP via control plane? Sure, we must be capable to tear down
such LSP, perform mb4b rerouting, distribute alarms between operational
controllers, signal data plane faults and perform recovery switchover,
modify LSP status, etc. Can we do this today? No, but with some
(signaling) extensions the problem I believe is solvable. Is this some
artificial, "fabricated" problem? No, I think it is real. Does it fall
under the control plane resilience problem space? I believe it does.

Igor

I agree with Zafar and Dimitri.  If someone wanted to document the GMPLS
control plane resiliency features, as was done for GMPLS addressing,
that might be a useful activity.

-----Original Message-----
From: dimitri papadimitriou [mailto:dpapadimitriou@psg.com]
Sent: Friday, October 28, 2005 9:56 AM
To: Igor Bryskin
Cc: Zafar Ali (zali); Kim Young Hwa; ccamp@ops.ietf.org
Subject: Re: Two Drafts for Resilience of Control Plane

igor -

over time CCAMP came with a set of mechanisms to improve control plane
resilience (RSVP and LMP GR upon channel/node failure); other WG protocol
work is also usable here (OSPF GR, etc.) ... on the other side,
mechanisms such as link bundling have built-in resilience capabilities
and most GMPLS control plane capabilities have been designed so as to
be independent of the control plane realisation (in-band, out-of-band,
etc.)

so indeed i share the concern of Zafar: what could we do more here than
document these tools and provide our experience in using them;

now, before stating there are (potential) problem(s) arising - would
you please be more specific on what are these potential issue(s) and/or
problems ? (not related to policy/config. - note: all the issues you
have pointed here below are simply policy/config specific but none of
them highlights a missing IP control plane resiliency feature)

thanks,
- dimitri.


Igor Bryskin wrote:

Zafar,

The problem arises when the control plane is decoupled
from the data plane. The question is do we need such
decoupling in IP networks? Consider, for example, the
situation when several parallel PSC data links are bundled
together and controlled by a single control channel.
Does it mean in this case that when the control
channel fails all associated data links also fail? Do
we need to reroute in this case LSPs that use the data
links? Can we rely in this case on control plane
indications to decide whether an associated data link
is healthy or not (in other words, can we rely on RSVP
Hellos or should we use, for example, BTD)? Should we
be capable to recover control channels without
disturbing data plane? I think control plane
resilience is important for all layers. You are right,
Internet does work, however, we do need for some
reason TE and (fast) recovery in IP as much as in
other layers, don't we?

Cheers,
Igor

--- "Zafar Ali (zali)" <zali@cisco.com> wrote:

Hi All,

I am unable to understand the problem we are trying to solve or
fabricate. My control network is IP based and IP has proven resiliency
(Internet *does* work), why would I like to take control plane resiliency
problem at a layer *above-IP* and complicate my life. Did I miss
something?

Thanks

Regards... Zafar


________________________________

	From: owner-ccamp@ops.ietf.org
[mailto:owner-ccamp@ops.ietf.org]
On Behalf Of Kim Young Hwa
	Sent: Friday, October 28, 2005 6:04 AM
	To: ccamp@ops.ietf.org
	Subject: Two Drafts for Resilience of Control Plane


	Dear all,

	I posted two drafts for the resilience of control plane.
	One is for requirements of the resilience of control plane, the
	other is for a protocol specification as a solution of that.
	These are now available at:

http://www.ietf.org/internet-drafts/draft-kim-ccamp-cpr-reqts-01.txt

http://www.ietf.org/internet-drafts/draft-kim-ccamp-accp-protocol-00.txt

	I want your comments.

	Regards

	Young.

	===================================
	Young-Hwa Kim
	Principal Member / Ph.D
	BcN Research Division, ETRI
	Tel:     +82-42-860-5819
	Fax:    +82-42-860-5440
	e-mail: yhwkim@etri.re.kr
	===================================

