
Re: Two Drafts for Resilience of Control Plane



igor - see in-line
Igor,

What you wrote was:

"Suppose one or more signaling controllers managing some LSP went out of
service leaving the LSP's data plane intact. As far as the user is
concerned such LSP is perfectly healthy and operational.  Such situation
could last for a considerable period of time."

What part of this is *not* handled by RSVP graceful restart?

In your subsequent e-mail, you then changed the problem statement to:

""Dead" controllers in my example *do not* come back for a considerable
period of time. So there are no restarts here (graceful or not
graceful)"


Sorry, I don’t see how I have changed the problem statement. I was and am
saying that while controllers are out of service for a considerable time
(day? two days?  week?) the question is what to do with active LSPs
associated with them?

you should take this as the initial problem - is there something wrong with treating that situation as transient ? and is the objective to leave B dead for a non-finite period of time ? because in *any case* you will have to replace B to be able to provision new LSPs crossing this node

Let’s consider an example:


A----B------C-----D
|                 |
E-----F-----H-----K

Suppose we have an LSP A-B-C-D carrying user traffic and a controller
managing node B went out of service. The question is what to do with this
LSP until the controller comes back? The operator may decide:
a)	simply not wait and delete the LSP. Normal LSP teardown (a PathTear
originated on the ingress controller) won’t work because the PathTear won’t
make it to the controllers managing nodes C and D, leaving (very expensive in
the optical layer) resources associated with the LSP allocated and not
available for other LSPs;
b)	reroute the LSP via mb4b (make-before-break) onto the alternative path
A-E-F-H-K-D. This won’t work for the same reason as in a);

what does not work ? the release of the unused resources ? once notified of the failed controller and once re-routing is achieved, trigger a PathErr with the PSR (Path_State_Removed) flag and you are done (so it's a policy configuration)

c)	leave the LSP as it is and wait for the dead controller to be replaced or
repaired. This would mean the need to perform normal operations such as
monitoring data plane alarms, changing LSP admin status (for
example, disabling alarms on all nodes), performing power monitoring and
equalization, and performing recovery operations in case of a fatal data plane
failure. Everything that depends on hop-by-hop signaling won’t work today.
Don’t tell me that these problems are fabricated; they are real because
they are raised by the customers. Dimitri seems to understand the problem
but he is saying that the CP in this case is hardly of any use.
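
[Editorial illustration of option a) above. This is a minimal sketch, not
anything from the drafts or from RSVP-TE itself: the function name
propagate_path_tear and the node labels are hypothetical, and the model
assumes only that a PathTear is processed hop by hop from the ingress and
cannot be processed or forwarded by a controller that is out of service.]

```python
from typing import List, Set, Tuple


def propagate_path_tear(
    path: List[str], dead_controllers: Set[str]
) -> Tuple[List[str], List[str]]:
    """Toy model of hop-by-hop teardown along an LSP.

    Walks the path from the ingress. Every live controller releases its
    resources and forwards the message; the walk stops at the first dead
    controller. Returns (released, stranded), where stranded lists the
    nodes that still hold LSP resources because the message never arrived.
    """
    released: List[str] = []
    for index, node in enumerate(path):
        if node in dead_controllers:
            # The dead controller neither processes nor forwards the message,
            # so it and every node downstream of it keep their state.
            return released, path[index:]
        released.append(node)
    return released, []


if __name__ == "__main__":
    released, stranded = propagate_path_tear(["A", "B", "C", "D"], {"B"})
    print("released:", released)
    print("stranded:", stranded)
```

[In this model, with the controller at B down, only A frees its resources;
B, C and D keep theirs, which is the stranded-resource situation described
in a).]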

just to clarify: what i am saying here is that you cannot expect collaboration in the resiliency process from a controller that is down for longer than the repair phase - in any case the dead CP will need to be replaced at some point in time; hence, whatever resiliency mechanism you are going to define, it is certainly not going to resurrect your dead CP and make possible the re-use of the node's resources

hence, you should first position the problem correctly instead of trying to find ill-suited workarounds or missing functionality; moreover, as the problem can be solved here, what more can be expected from the CP ?

This IMO
is a dangerous statement for the future of CP in non-packet environments.

what's more dangerous is to continuously claim that the GMPLS CP is not resilient because of incomplete policy configuration (and that more work is still needed)

The Management plane aficionados will jump on it and say that management
plane does not have such a problem – NMS has a direct access to any NE on
the network, so it can do all necessary cleanup no matter what happened.
Customers will say: “Well, if there are situations when CP suddenly
becomes useless and we have to use management plane anyway, why would we
use the CP in the first place?”

you should ask the question the other way around - why do you consider a CP to be like a distributed MP ?

Fortunately, I believe that the problems could be solved entirely via CP
by making it more resilient. Hence, CP resilience is a good direction to
work on within CCAMP WG

indeed, documenting the resiliency mechanisms that the CCAMP WG and others have defined over time, and that a GMPLS CP can make use of to address such problems, may be advisable; but unless GMPLS is showing real gaps in terms of resiliency there is no need to initiate new protocol work

thanks,
- dimitri.

Igor


If "Considerable period of time" is not equal to infinity, then there
will be an RSVP graceful restart.  If a controller is really and truly
dead, then presumably the operator will either replace it or re-assign
its data-plane resources to another signaling controller.  In either
case, there will then be an RSVP graceful restart.

Thanks,

John




-----Original Message-----
From: ibryskin@movaz.com [mailto:ibryskin@movaz.com]
Sent: Friday, October 28, 2005 1:00 PM
To: Drake, John E
Cc: ibryskin@movaz.com; dpapadimitriou@psg.com;
dimitri.papadimitriou@alcatel.be; Igor Bryskin; Zafar Ali; Kim Young Hwa;
ccamp@ops.ietf.org
Subject: RE: Two Drafts for Resilience of Control Plane

John,

I think you missed my point here. "Dead" controllers in my example *do
not* come back for a considerable period of time. So there are no restarts
here (graceful or not graceful) :=)

Igor


What part of your problem, as stated below, is not handled by RSVP
graceful restart?


-----Original Message-----
From: ibryskin@movaz.com [mailto:ibryskin@movaz.com]
Sent: Friday, October 28, 2005 11:41 AM
To: Drake, John E
Cc: dpapadimitriou@psg.com; dimitri.papadimitriou@alcatel.be; Igor
Bryskin; Zafar Ali; Kim Young Hwa; ccamp@ops.ietf.org
Subject: RE: Two Drafts for Resilience of Control Plane

Hi,

Here is one of the problems that I've been thinking for a while -
control plane partitioned LSPs. Suppose one or more signaling controllers
managing some LSP went out of service leaving the LSP's data plane intact.
As far as the user is concerned such LSP is perfectly healthy and
operational. Such situation could last for a considerable period of time.
Do we need to manage such LSP via control plane? Sure, we must be capable
to tear down such LSP, perform mb4b rerouting, distribute alarms between
operational controllers, signal data plane faults and perform recovery
switchover, modify LSP status, etc. Can we do this today? No, but with some
(signaling) extensions the problem I believe is solvable. Is this some
artificial, "fabricated" problem? No, I think it is real. Does it fall
under the control plane resilience problem space? I believe it does.

Igor


I agree with Zafar and Dimitri.  If someone wanted to document the GMPLS
control plane resiliency features, as was done for GMPLS addressing,
that might be a useful activity.


-----Original Message-----
From: dimitri papadimitriou [mailto:dpapadimitriou@psg.com]
Sent: Friday, October 28, 2005 9:56 AM
To: Igor Bryskin
Cc: Zafar Ali (zali); Kim Young Hwa; ccamp@ops.ietf.org
Subject: Re: Two Drafts for Resilience of Control Plane

igor -

over time CCAMP came up with a set of mechanisms to improve control plane
resilience (RSVP and LMP GR upon channel/node failure); other WGs' protocol
work is also usable here (OSPF GR, etc.) ... on the other side,
mechanisms such as link bundling have built-in resilience capabilities
and most GMPLS control plane capabilities have been designed so as to
be independent of the control plane realisation (in-band, out-of-band,
etc.)

so indeed i share the concern of Zafar: what more could we do here than
document these tools and provide our experience in using them;

now, before stating there are (potential) problem(s) arising - would
you please be more specific on what these potential issue(s) and/or
problems are ? (not related to policy/config. - note: all the issues you
have pointed out here below are simply policy/config specific but none of
them highlights a missing IP control plane resiliency feature)

thanks,
- dimitri.


Igor Bryskin wrote:


Zafar,

The problem arises when the control plane is decoupled
from the data plane. The question is do we need such
decoupling in IP networks? Consider, for example, the
situation when several parallel PSC data links are bundled
together and controlled by a single control channel.
Does it mean in this case that when the control
channel fails all associated data links also fail? Do
we need to reroute in this case LSPs that use the data
links? Can we rely in this case on control plane
indications to decide whether an associated data link
is healthy or not (in other words, can we rely on RSVP
Hellos or should we use, for example, BFD)? Should we
be able to recover control channels without
disturbing the data plane? I think control plane
resilience is important for all layers. You are right,
the Internet does work; however, we do need, for some
reason, TE and (fast) recovery in IP as much as in
other layers, don't we?

Cheers,
Igor

--- "Zafar Ali (zali)" <zali@cisco.com> wrote:



Hi All,

I am unable to understand the problem we are trying to solve or
fabricate. My control network is IP based and IP has proven resiliency
(Internet *does* work), so why would I want to take on a control plane
resiliency problem at a layer *above-IP* and complicate my life? Did I miss
something?

Thanks

Regards... Zafar


________________________________

	From: owner-ccamp@ops.ietf.org [mailto:owner-ccamp@ops.ietf.org]
	On Behalf Of Kim Young Hwa
	Sent: Friday, October 28, 2005 6:04 AM
	To: ccamp@ops.ietf.org
	Subject: Two Drafts for Resilience of Control Plane


	Dear all,

	I posted two drafts on the resilience of the control plane.
	One covers the requirements for control plane resilience; the
	other is a protocol specification as a solution to them.
	These are now available at:




http://www.ietf.org/internet-drafts/draft-kim-ccamp-cpr-reqts-01.txt



http://www.ietf.org/internet-drafts/draft-kim-ccamp-accp-protocol-00.txt

	I want your comments.

	Regards

	Young.

	===================================
	Young-Hwa Kim
	Principal Member / Ph.D
	BcN Research Division, ETRI
	Tel:     +82-42-860-5819
	Fax:    +82-42-860-5440
	e-mail: yhwkim@etri.re.kr
	===================================

