
Re: Two Drafts for Resilience of Control Plane

To: Igor Bryskin <ibryskin@movaz.com>
Subject: Re: Two Drafts for Resilience of Control Plane
From: dimitri papadimitriou <dpapadimitriou@psg.com>
Date: Mon, 31 Oct 2005 18:40:56 +0100
Cc: dimitri.papadimitriou@alcatel.be, "Drake, John E" <John.E.Drake2@boeing.com>, "Zafar Ali (zali)" <zali@cisco.com>, Igor Bryskin <i_bryskin@yahoo.com>, drake@movaz.com, Kim Young Hwa <yhwkim@etri.re.kr>, ccamp@ops.ietf.org
In-reply-to: <015b01c5de3e$f2a2a090$7a1810ac@movaz.com>
References: <626FC7C6A97381468FB872072AB5DDC8369714@XCH-SW-42.sw.nos.boeing.com> <00b201c5de24$c20fc710$7a1810ac@movaz.com> <436639BA.6040405@psg.com> <011a01c5de33$4fc053f0$7a1810ac@movaz.com> <43664BBF.8030704@psg.com> <015b01c5de3e$f2a2a090$7a1810ac@movaz.com>
Reply-to: dpapadimitriou@psg.com, dimitri.papadimitriou@alcatel.be
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.11) Gecko/20050728

igor

Suppose you have an LSP going through -A-B-C- and the controller managing
node B fails while the data plane is intact, that is, no data plane alarms
are detected.

According to your logic such a situation can exist only for 90 s, because
once the controller managing node C detects the absence of 3 Path refreshes
it will delete the control plane state and destroy the service. My question
is: why do you need the RSVP graceful restart procedures then, if they can
happen only within a 90 s time interval? After that there will be nothing to
synchronize.

What we see and hear in the field, though, is that a controller may stay out
of service for days and then come back, and it should be able to synchronize
the control state of all LSPs it used to manage before the crash/reboot. And,
of course, you MUST keep the data service up and running.

i have never said the contrary, if you run an (independent) cp failure
detection mechanism and prior negotiated cp recovery: in case of failure
in CP => GR (see RFC 3471/3) ... you are the one trying to find protocol
deficiencies while it is still totally unclear to me (but i am not the
only one apparently) where these limitations (potentially) are ?

on one side, you ask how to free costly resources and on the other you
ask how to maintain the connection service ... at the end, you should
decide what you want to do - i am not sure that you have understood that
you can have both at the same time

Igor

----- Original Message -----
From: "dimitri papadimitriou" <dpapadimitriou@psg.com>
To: "Igor Bryskin" <ibryskin@movaz.com>
Cc: <dimitri.papadimitriou@alcatel.be>; "Drake, John E"
<John.E.Drake2@boeing.com>; "Zafar Ali (zali)" <zali@cisco.com>; "Igor
Bryskin" <i_bryskin@yahoo.com>; <drake@movaz.com>; "Kim Young Hwa"
<yhwkim@etri.re.kr>; <ccamp@ops.ietf.org>
Sent: Monday, October 31, 2005 11:52 AM
Subject: Re: Two Drafts for Resilience of Control Plane

igor -

Igor Bryskin wrote:

Dimitri,

igor - my two cents

RSVP over time has progressively borrowed mechanisms from "hard-state"
protocols; explicit deletion using PathTear is the most noticeable and
earliest example of this evolution !

but in any case, RSVP still relies on idempotent soft states that are
flushed when not refreshed after a certain time interval (or
self-maintained if previously negotiated). this prevents orphans in the
network (so unused resources) and provides for resilience - hence there
is by no means a need to introduce an additional protocol mechanism to
trigger or not such an event via the "control plane"
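
To make the soft-state behaviour described above concrete, here is a minimal
sketch of a state table in which an entry survives only as long as it keeps
being refreshed. It is not RSVP code: the class name, the method names and the
90-second lifetime used in the example are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SoftStateTable:
    """Toy soft-state store: an entry vanishes unless it is refreshed in time."""

    lifetime: float  # seconds a state may go unrefreshed (illustrative value)
    last_refresh: dict = field(default_factory=dict)

    def refresh(self, lsp_id: str, now: float) -> None:
        # Idempotent: installing a state and refreshing it are the same operation.
        self.last_refresh[lsp_id] = now

    def tear(self, lsp_id: str) -> None:
        # Explicit deletion, analogous in spirit to a PathTear.
        self.last_refresh.pop(lsp_id, None)

    def expire(self, now: float) -> list:
        # Flush every state whose last refresh is older than the lifetime.
        stale = [k for k, t in self.last_refresh.items() if now - t > self.lifetime]
        for k in stale:
            del self.last_refresh[k]
        return stale


table = SoftStateTable(lifetime=90.0)
table.refresh("lsp-1", now=0.0)
table.refresh("lsp-2", now=0.0)
table.refresh("lsp-1", now=60.0)   # only lsp-1 keeps being refreshed
flushed = table.expire(now=100.0)  # lsp-2 is flushed, lsp-1 survives
```

No orphan can outlive the lifetime here, which is the property referred to
above: cleanup needs no extra message, only the absence of refreshes.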



Refreshes are a useful mechanism, but only between neighbors that maintain
Hello communication. In this case the absence of Path refreshes is as good
an indication that the data plane must be destroyed as a received PathTear
message.

the base function of state refresh and usefulness is independent of
hello adjacency maintenance (or any other control channel maintenance)

However, when a controller does not receive Path refreshes from a neighbor
it does not have any control plane communication with, it can assume neither
a problem in the data plane nor an intention to destroy it.


as the node did not negotiate any channel/node fault recovery (due in
part to the absence of Hello adjacency with its neighbor) and if no
other independent control channel failure detection is provided (this is
an add-on of RFC3471/3), the simple absence of refresh is simply
interpreted as "implicit deletion"

you are mis-interpreting the following sentence of RFC3471

"   Note that these cases only apply when there are mechanisms to detect
   data channel failures independent of control channel failures."

there is no retro-fit on the use of Refreshes in absence of control
channel failure detection mechanism
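
The rule argued above can be written down as a small decision function. This
is only a sketch of the reasoning in this message, not of any implementation:
the names are hypothetical, and the handling of the mixed case (only one of
the two conditions holds) is an assumption, since the text covers only the
case where neither holds.

```python
from enum import Enum


class Action(Enum):
    IMPLICIT_DELETION = "treat missing refreshes as implicit deletion"
    HOLD_FOR_RECOVERY = "keep state and wait for control plane recovery"


def on_refresh_timeout(recovery_negotiated: bool,
                       independent_cc_failure_detection: bool) -> Action:
    """Decide what a node does once Path refreshes stop arriving."""
    if not recovery_negotiated and not independent_cc_failure_detection:
        # Case stated in the text: nothing was negotiated and nothing can
        # tell a control channel failure apart from a deleted LSP.
        return Action.IMPLICIT_DELETION
    # Assumption: any other combination is handed to the recovery procedures.
    return Action.HOLD_FOR_RECOVERY


decision = on_refresh_timeout(recovery_negotiated=False,
                              independent_cc_failure_detection=False)
```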

Hence, as it was specified in RFC3471, it *must* maintain both control and
data plane states throughout the failure.


- d.

Igor

btw, the paragraph you mention in RFC3471 does not say "soft state
protocols do not work well for non-packet environments" this is your
interpretation;

ps: you are still free to make use of RFC3472 in case (as you were
apparently looking for something else ;-)

Igor Bryskin wrote:

John,

States are supposed to be destroyed on explicit signalling
message (e.g. PathTear or PathErr with the state removal
flag), but not because of the absence of refreshes.


[JD]

Igor,

Just to be clear, we are talking about RSVP here, and RSVP *is* a soft
state protocol.  Can you point to any RFC that supports your statements
above?

IB>> Oh, come on, John. You sound like you've been yourself in a dormant
state for a while :=). We've gone a long way since RFC2205. In RFC3471, for
example, there is a discussion of why GMPLS is needed and how it is different
from MPLS. One of the differences is the fact that soft state protocols do
not work well for non-packet environments. Here is from the RFC3471:



9.2. Fault Handling

   There are two new faults that must be handled when the control
   channel is independent of the data channel.  In the first, there is a
   link or other type of failure that limits the ability of neighboring
   nodes to pass control messages.  In this situation, neighboring nodes
   are unable to exchange control messages for a period of time.  Once
   communication is restored the underlying signaling protocol must
   indicate that the nodes have maintained their state through the
   failure..

What is more important is the reality of life: The customers simply say that
you cannot destroy a user service (or even force any traffic hits) just
because you have a problem in the control plane. If this does not fit your
soft-state paradigm, then "harden" your protocols or flush them down the
toilet and come up with something else if you want our business. After all,
if we provision the services via NMS, we do not have to destroy the services
if we have problems in the management network. It is that simple.



Igor








