Re: Two Drafts for Resilience of Control Plane
Dimitri,
Let me try once more.
> igor
>
> > Suppose you have an LSP going through -A-B-C- and a controller managing
> > node B fails, while the data plane is intact, that is, no data plane
> > alarms are detected.
> >
> > According to your logic such a situation can exist only for 90 s, because
> > once the controller managing node C detects the absence of 3 Path
> > refreshes it will delete the control plane state and destroy the service.
> > My question is why do you need the RSVP graceful restart procedures then,
> > if they can happen only within a 90 s time interval - after that there
> > will be nothing to synchronize.
> >
> > What we see and hear in the field, though, is that a controller may stay
> > days out of service and then come back and should be capable of
> > synchronizing the control state of all LSPs it used to manage before the
> > crash/reboot. And, of course, you MUST maintain data service up and
> > running.
>
> i have never said the contrary, if you run an (independent) cp failure
> detection mechanism and prior negotiated cp recovery: in case of failure
> in CP => GR (see RFC 3471/3) ... you are the one trying to find protocol
> deficiencies while it is still totally unclear to me (but i am not the
> only one apparently) where these limitations (potentially) are ?
>
> on one side, you ask how to free costly resources and on the other you
> ask how to maintain the connection service ...
IB>> Do you believe the two requirements contradict each other? I don't
think so.
> on one side, you ask how to free costly resources
==> only if the user decides to do so
> and on the other you
> ask how to maintain the connection service ...
==> only if the user decides to do so
> at the end, you should
> decide what you want to do - i am not sure that you have understood that
> you can have both at the same time
IB>>
1. Yes, I do have an independent dp (not cp as you say) failure detection
mechanism.
2. Yes, I do negotiate with my neighbors not to drop state in case of cp
failure.
3. Yes, I want to control my LSP in such a state from a *single* (LSP ingress)
point, because when an LSP is created by a user (via, say, UNI or the
management plane) it is totally up to the user to decide what to do with the
LSP when the CP fails. The only way the user can control this is by sending
further request(s) to the LSP ingress controller (not to the egress or any
transit controller, because the user may not even know their IDs). In
particular, the user may decide to continue using the service (after all, it
is still carrying the user's traffic), but in this case the user would want
to maintain control of the LSP (for example, to get notifications about data
plane alarms detected on the nodes beyond the point of CP failure, or to
modify the LSP's admin status, setup/holding priorities, etc.).
Alternatively, the user may decide to delete or reroute it. In any case the
user wants to convey the decision only to the LSP ingress controller.
4. No, I cannot do this today because of the deficiency of RSVP: messages
on a contiguous LSP are sent on a hop-by-hop basis, meaning that any hop can
block the message.
5. Yes, I do have a simple and totally backward compatible solution for
overcoming these deficiencies.
Igor
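
To make the -A-B-C- scenario above concrete, here is a minimal sketch in
Python. It assumes the simplified model used in this thread (state is dropped
after 3 missed Path refreshes, taken here as 3 x 30 s = 90 s). The 30 s refresh
period, the class and function names, and the retain_on_cp_failure flag are
hypothetical illustrations only; they are not an RSVP-TE API and are not taken
from either draft.

```python
from dataclasses import dataclass

REFRESH_INTERVAL_S = 30    # assumed refresh period (illustrative)
MISSED_REFRESH_LIMIT = 3   # missed refreshes before soft state is flushed


@dataclass
class Controller:
    """Hypothetical per-node controller holding state for one LSP."""
    name: str
    alive: bool = True                   # is the control plane instance up?
    retain_on_cp_failure: bool = False   # assumed: negotiated with neighbors
    last_refresh_s: float = 0.0          # time of the last Path refresh seen
    has_lsp_state: bool = True

    def tick(self, now_s: float) -> None:
        """Flush soft state after the timeout unless retention was negotiated."""
        timeout_s = REFRESH_INTERVAL_S * MISSED_REFRESH_LIMIT
        timed_out = now_s - self.last_refresh_s >= timeout_s
        if timed_out and not self.retain_on_cp_failure:
            self.has_lsp_state = False


def send_hop_by_hop(path: list, target: str) -> bool:
    """Forward a message from path[0] towards target, one hop at a time.

    Returns False as soon as a hop with a failed controller is met,
    which models "any hop can block the message".
    """
    for node in path[1:]:
        if not node.alive:
            return False
        if node.name == target:
            return True
    return False


# Controller B has failed; the data plane is assumed intact.
a = Controller("A")
b = Controller("B", alive=False)
c_plain = Controller("C")
c_retain = Controller("C", retain_on_cp_failure=True)

c_plain.tick(90.0)    # no refresh seen for 90 s: state is flushed
c_retain.tick(90.0)   # retention negotiated: state is kept

blocked = send_hop_by_hop([a, b, c_retain], "C")  # False: B blocks the message
```

In this sketch, negotiating retention keeps C's state alive past the timeout,
but the ingress still cannot reach C while B's controller is down, which is
the gap described in point 4.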
>
> > Igor
> >
> > ----- Original Message -----
> > From: "dimitri papadimitriou" <dpapadimitriou@psg.com>
> > To: "Igor Bryskin" <ibryskin@movaz.com>
> > Cc: <dimitri.papadimitriou@alcatel.be>; "Drake, John E"
> > <John.E.Drake2@boeing.com>; "Zafar Ali (zali)" <zali@cisco.com>; "Igor
> > Bryskin" <i_bryskin@yahoo.com>; <drake@movaz.com>; "Kim Young Hwa"
> > <yhwkim@etri.re.kr>; <ccamp@ops.ietf.org>
> > Sent: Monday, October 31, 2005 11:52 AM
> > Subject: Re: Two Drafts for Resilience of Control Plane
> >
> >
> >
> >>igor -
> >>
> >>Igor Bryskin wrote:
> >>
> >>>Dimitri,
> >>>
> >>>
> >>>>igor - my two cents
> >>>>
> >>>>RSVP over time has progressively borrowed mechanisms from "hard-state"
> >>>>protocols; explicit deletion using PathTear is the most noticeable and
> >>>>earliest example of this evolution !
> >>>>
> >>>>but in any case, RSVP still relies on idempotent soft states that
> >>>>are flushed when not refreshed after a certain time interval (or
> >>>>self-maintained if previously negotiated); this prevents orphans in the
> >>>>network (so unused resources) and provides for resilience - hence there
> >>>>is by no means a need to introduce an additional protocol mechanism to
> >>>>trigger or not such an event via the "control plane" -
> >>>
> >>>
> >>>Refreshes are a useful mechanism but only between neighbors that maintain
> >>>Hello communication. In this case the absence of Path refreshes is as good
> >>>an indication that the data plane must be destroyed as a received PathTear
> >>>message.
> >
> >>the base function of state refresh and usefulness is independent of
> >>hello adjacency maintenance (or any other control channel maintenance)
> >>
> >>
> >>>However, when a controller does not receive Path refreshes from a neighbor
> >>>it does not have any control plane communication with, it can assume
> >>>neither a problem in the data plane nor an intention to destroy it.
> >>
> >>as the node did not negotiate any channel/node fault recovery (due in
> >>part to the absence of Hello adjacency with its neighbor) and if no
> >>other independent control channel failure detection is provided (this is
> >>an add-on of RFC3471/3), the simple absence of refresh is simply
> >>interpreted as "implicit deletion"
> >>
> >>you are mis-interpreting the following sentence of RFC3471
> >>
> >>" Note that these cases only apply when there are mechanisms to detect
> >> data channel failures independent of control channel failures."
> >>
> >>there is no retro-fit on the use of Refreshes in absence of control
> >>channel failure detection mechanism
> >>
> >>
> >>>Hence, as it was
> >>>specified in RFC3471, it *must* maintain both control and data plane states
> >>>throughout the failure.
> >>
> >>- d.
> >>
> >>>Igor
> >>>
> >>>
> >>>
> >>>>btw, the paragraph you mention in RFC3471 does not say "soft state
> >>>>protocols do not work well for non-packet environments" this is your
> >>>>interpretation;
> >>>>
> >>>>ps: you are still free to make use of RFC3472 in case (as you were
> >>>>apparently looking for something else ;-)
> >>>>
> >>>>Igor Bryskin wrote:
> >>>>
> >>>>
> >>>>>John,
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>>>States are supposed to be destroyed on explicit signalling
> >>>>>>>message (e.g. PathTear or PathErr with the state removal
> >>>>>>>flag), but not because of the absence of refreshes.
> >>>>>>>
> >>>>>
> >>>>>[JD]
> >>>>>
> >>>>>Igor,
> >>>>>
> >>>>>Just to be clear, we are talking about RSVP here, and RSVP *is* a soft
> >>>>>state protocol. Can you point to any RFC that supports your statements
> >>>>>above?
> >>>>>
> >>>>>IB>> Oh, come on, John. You sound like you've been yourself in a dormant
> >>>>>state for a while :=). We've gone a long way since RFC2205. In RFC3471,
> >>>>>for example, there is a discussion of why GMPLS is needed and how it is
> >>>>>different from MPLS. One of the differences is the fact that soft state
> >>>>>protocols do not work well for non-packet environments. Here is from
> >>>>>RFC3471:
> >>>>>
> >>>>>
> >>>>>
> >>>>>9.2. Fault Handling
> >>>>>
> >>>>>There are two new faults that must be handled when
> >>>>>the control channel is independent of the data channel. In the first,
> >>>>>there is a link or other type of failure that limits the ability of
> >>>>>neighboring nodes to pass control messages. In this situation,
> >>>>>neighboring nodes are unable to exchange control messages for a period
> >>>>>of time. Once communication is restored the underlying signaling
> >>>>>protocol must indicate that the nodes have maintained their state
> >>>>>through the failure.
> >>>>>
> >>>>>What is more important is the reality of life: The customers simply say
> >>>>>that you cannot destroy a user service (or even force any traffic hits)
> >>>>>just because you have a problem in the control plane. If this does not
> >>>>>fit your soft-state paradigm, then "harden" your protocols or flush them
> >>>>>down the toilet and come with something else if you want our business.
> >>>>>After all, if we provision the services via NMS, we do not have to
> >>>>>destroy the services if we have problems in the management network. It
> >>>>>is that simple.
> >>>>>
> >>>>>
> >>>>>
> >>>>>Igor