David,
This is a very interesting and thoughtful mail. When I read it I was
immediately struck by recent discussions some of us have been having on
the topic of unified functional modelling within SG15, i.e. the underlying
axiomatic truths that stem directly from Shannon/Turing with respect to
networking and that show how the co-cs, co-ps and cl-ps networking modes
all have a common root. Whilst I don't want to get into that here, I do
want to illustrate a key consequence of this work that is particularly
relevant to your observations.
In the co modes we have this rather important object called a trail. A
trail is, in essence, a concatenated sequence of link-connections and
nodes (which are the smallest subnetwork connections). However, the key
observation is that at the epoch at which we create the trail we remove
all further routing choices from the trail. You can almost think of this
as effectively removing the trail from its parent network topology (there
is a resource accounting issue to deal with, but that is a different
point). I guess in IETF speak one would call this 'route-pinning'.
What this means is that it does not matter what happens to the routing
process after this epoch: once the trail is established there is no need
to move it. A key example of why this is important is the case of
augmenting the network topology by adding further nodes/links. Because
the network topology has now changed, future routing decisions are
very likely to alter. But in the co mode case this does not matter. Once
a trail is created it remains faithful to its original routing choice
unless we consciously decide to move it (which may of course be the
result of failures or planned works).
Now contrast this with the cl mode case. If we augment a cl network by
adding new nodes/links then the routing process will update and traffic
will move in accordance with whatever is the optimal path.
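To make the contrast concrete, here is a minimal sketch. It assumes a toy four-node topology, unit link costs and a plain Dijkstra routing process. The node names and the `shortest_path` helper are purely illustrative and are not taken from any SG15 or IETF document.

```python
import heapq


def shortest_path(links, src, dst):
    """Dijkstra over an undirected graph given as {(a, b): cost}."""
    adj = {}
    for (a, b), cost in links.items():
        adj.setdefault(a, []).append((b, cost))
        adj.setdefault(b, []).append((a, cost))
    queue = [(0, src, [src])]
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return path
        if node in seen:
            continue
        seen.add(node)
        for nxt, link_cost in adj.get(node, []):
            if nxt not in seen:
                heapq.heappush(queue, (cost + link_cost, nxt, path + [nxt]))
    return None


# Hypothetical topology: a chain A-B-C-D with unit costs.
links = {("A", "B"): 1, ("B", "C"): 1, ("C", "D"): 1}

# co mode: the route is chosen once, at the epoch the trail is created,
# and then frozen (modelled here as an immutable tuple).
trail = tuple(shortest_path(links, "A", "D"))

# Augment the topology with a new node X that offers a shorter route.
links[("A", "X")] = 1
links[("X", "D")] = 1

# cl mode: the routing process re-runs and traffic follows the new optimum.
cl_path = shortest_path(links, "A", "D")

print("co trail:", trail)
print("cl path: ", cl_path)
```

The trail keeps its original route through B and C, while the cl path moves to the new node X. Nothing in the sketch models resource accounting or failure handling.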
Note a trail has other key properties, e.g. it allows one to completely
decouple a survivability semantic from the QoS semantics of client
traffic units. This property is rather important when one is offering a
network builder service to customers.
regards, Neil
-----Original Message-----
From: owner-ccamp@ops.ietf.org [mailto:owner-ccamp@ops.ietf.org] On Behalf Of David Charlap
Sent: 28 September 2005 19:37
To: IETF MPLS List; IETF CCAMP List
Subject: Graceful restart - inter-protocol dependencies
Is there any point to implementing RSVP-TE's graceful restart without
also implementing graceful restart for routing protocols?
On the one hand, RSVP doesn't require routing to recover LSPs. It knows
the next-hop interface, because of the preserved data-plane connection.
Whatever other information it may need, the switch can either preserve
this information or recover it from a neighbor using the RecoveryPath
object (draft-ietf-ccamp-rsvp-restart-ext-03).
On the other hand, nodes more than one hop upstream of the failure will
detect the loss of routing-connectivity to the failed node if IGP
graceful restart is not also implemented. They may reroute the LSP away
from the failed node, or tear it down altogether, even though the data
plane is still active and RSVP graceful restart is recovering the
control-plane state.
An originating node may consider the destination unreachable, as a
result of losing the routes, even though the data-plane for the LSP is
still up (which can be confirmed via OAM).
A transit node, when processing a loose ERO hop, may choose to reroute
or fail the LSP if its local topology information says that the failed
(and restarting) node is not available. It might even choose to do this
for an established LSP, as a result of Path refresh processing.
My questions are:
1: Does this mean it's pointless to use RSVP graceful restart without
also using IGP graceful restart (for whatever IGP is active)?
2: Is IGP graceful restart sufficient to prevent this problem? For
instance, OSPF's restart procedure requires all preserved state to
be thrown away if a topology change is detected.
3: An originating node can use OAM to validate the data plane of an
LSP, and choose to ignore what routing tells it about the
destination's reachability. But what about transit nodes? As far
as I know, MPLS doesn't support segment-OAM, and it would be
prohibitive for every transit node to run its own OAM streams
to detect self-to-end connectivity.
4: Are there other solutions to this problem?
One possible solution might be route-pinning, but RSVP doesn't have a
built-in mechanism for this. The usual workaround (signal the LSP,
requesting route-recording, then turn the RRO into an ERO for subsequent
refreshes) can work, but are there situations where even this would be
insufficient to prevent a transit node from rerouting/tearing the
connection in this particular situation (where RSVP is doing a graceful
restart but the IGP is not)?
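As a rough illustration of the RRO-to-ERO workaround, the sketch below uses a toy model and not real RSVP-TE message processing. The router names, the `(hop, kind)` tuples and both helper functions are hypothetical. It assumes a strict hop names a directly attached neighbour and is used as-is, while a loose hop must be resolved through the node's IGP view.

```python
def rro_to_strict_ero(rro):
    """Reuse the recorded hops as strict explicit-route hops."""
    return [(hop, "strict") for hop in rro]


def resolve_next_hop(ero, me, igp_reachable):
    """Toy transit-node decision for the ERO hop that follows `me`.

    Assumption: a strict hop is a directly attached neighbour and needs
    no route lookup; a loose hop must be found in the local topology
    view (`igp_reachable`). Returns None if the hop cannot be resolved.
    """
    hops = [hop for hop, _ in ero]
    nxt, kind = ero[hops.index(me) + 1]
    if kind == "strict" or nxt in igp_reachable:
        return nxt
    return None


recorded = ["R1", "R2", "R3", "R4"]  # RRO learned when the LSP was set up
loose_ero = [("R1", "strict"), ("R2", "strict"), ("R4", "loose")]
pinned_ero = rro_to_strict_ero(recorded)

# R3 is restarting: RSVP is recovering, but without IGP graceful restart
# R2 has lost its routes to R3 and (in this toy topology) to R4 behind it.
igp_view_at_r2 = {"R1"}

loose_result = resolve_next_hop(loose_ero, "R2", igp_view_at_r2)
pinned_result = resolve_next_hop(pinned_ero, "R2", igp_view_at_r2)

print("loose ERO at R2: ", loose_result)
print("pinned ERO at R2:", pinned_result)
```

In this model the loose hop cannot be resolved at R2 while the pinned (strict) ERO still yields R3. Whether a real implementation would nevertheless consult routing for a strict hop, or tear the LSP down on refresh, is exactly the open question above and is not answered by the sketch.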
-- David