
Re: RSVP Restart (was Re: update GMPLS signaling documents)



Yakov,

Thanks for the clarification.  My comments are inline:

Yakov Rekhter wrote:

>Gopal,
>
>>>>I have some questions/concerns on the restart mechanism as
>>>>specified in draft-ietf-mpls-generalized-rsvp-te-04.txt.  I
>>>>have listed them below.
>>>>
>>>>If a node that is terminating hierarchical LSPs restarts,
>>>>there is an ordering issue during resynchronization, since
>>>>the LSPs would depend on other FA-LSPs the interface IDs for
>>>>which would have been generated dynamically. So the
>>>>mechanism as described will not work, unless this
>>>>information is preserved across restarts - in which case we
>>>>might as well preserve other information as well, and avoid
>>>>resynchronization.
>>>>
>>>Since transit node of the FA-LSP does not see the RSVP
>>>messages that use the FA, we only need to consider
>>>the ingress and egress node. If the egress node restarts,
>>>the worst case is that its upstream node sends an ERO
>>>consisting of an unrecognized interface address. Since the
>>>Path message will carry RECOVERY_LABEL (replacing
>>>SUGGESTED_LABEL), the egress node knows that it might
>>>not have complete information yet, and can hold on to
>>>the Path message until the FA-LSP is established.
>>>All the information is there, so implementation
>>>can be completely within the egress node. It does work,
>>>just not specified enough.
>>>
>>>Ingress node will have to know the dependency, so not
>>>a problem there.
>>>
>>Are you suggesting that in case of an ERO with an
>>unrecognized interface id, if the Path message also
>>carries a RECOVERY LABEL, the egress node can hold on
>>to it until later?  Note that there may not be a
>>RECOVERY LABEL - since this Path message is for a
>>hierarchical LSP, and the previous node may not be
>>directly connected in the control plane.  Here is an
>>example - There is an LSP L1 from C to F, and on top of
>>it there is an LSP L2 from A to G.
>>
>> L1:              C --- D --- E --- F
>> L2:  A --- B --- C --------------- F --- G
>>
>>When F restarts, C may not even know, unless HELLO
>>protocol is run between C and F.  Note that the
>>HELLO mechanism is intended for immediate neighbors.
>>
>>And even if the HELLO mechanism is extended, there
>>is the issue of interface id assigned by F.  F needs
>>to assign the same interface id upon restart so as
>>not to bring down the hierarchical LSPs set up on top
>>of L1 in the reverse direction (say an LSP from G to
>>B set up on top of L1.)  Where would F get this
>>information?
>>
>
>Couple of points:
>
>(1) First of all, you do run RSVP Hello between C and F.
>

O.K.

>
>
>(2) Since L1 is advertised as an FA into OSPF/ISIS, F should
>be able to recover the Interface ID it assigns to L1 from
>a combination of (a) the OSPF/ISIS link state database that
>F would recover, and (b) the Forward Interface ID (the one
>assigned by C).
>
True. One could either use an IGP restart mechanism to relearn
this (I have other concerns about RSVP restart being dependent on
IGP restart, but that is for later) or preserve this across restarts.
In either case,  you agree that RSVP restart depends on some
mechanism outside RSVP to help it along?  So RSVP needs a new
interface to get this mapping upon restarts, right? This is not
clear in the draft.
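
To make the dependency concrete, here is a minimal sketch of the lookup
F would need after a restart. It is only an illustration: the record
layout (FaLinkAdvert) and the function name are hypothetical, and it
assumes the recovered link state database exposes, for each FA
advertised by C, both the interface ID assigned by C and the one
assigned by F.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class FaLinkAdvert:
    """Hypothetical view of one FA entry in the recovered IGP database."""
    advertising_router: str   # node that advertised the FA (C in the example)
    local_interface_id: int   # Forward Interface ID, assigned by that node
    remote_interface_id: int  # interface ID assigned by the far end (F)


def recover_interface_id(lsdb: Iterable[FaLinkAdvert],
                         peer: str,
                         forward_interface_id: int) -> Optional[int]:
    """Return the interface ID this node had assigned to the FA, if known.

    The match key is (advertising router, Forward Interface ID); None means
    the IGP database has not been recovered far enough to answer yet.
    """
    for advert in lsdb:
        if (advert.advertising_router == peer
                and advert.local_interface_id == forward_interface_id):
            return advert.remote_interface_id
    return None
```

Whatever the real data structures look like, RSVP would need some such
query into state owned by the IGP (or into preserved local state), and a
way to handle the "not known yet" case.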


>
>
>>>>Also, let us consider a node that can preserve state across
>>>>restarts, and hence does not need its state to be synced by
>>>>its peer.  How will it advertise this in the RESTART_CAP
>>>>object?
>>>>
>>>We still need to resynchronize the state since the other
>>>side might have connections that are in progress. The
>>>
>>Let us say the node can resynchronize its neighbor if
>>the neighbor restarts and requests state recovery. But
>>the issue is how a node can advertise that it does not
>>need recovery since all its state was preserved?
>>
>
>By treating it the same way the spec handles
>control channel fault.
>

True, but this does not help.  See below.


>
>
>>>reason for resynchronization is to make sure both side
>>>have done the same to the in-progress connections, not
>>>just to synchronize the established states.
>>>
>>>The value of the RESTART_CAP will set to non-zero values
>>>
>>RECOVERY LABEL does not come into picture unless the node
>>that is upstream to the restarting node has already received
>>a Resv.
>>
>
>Wrong. Quoting 9.5.3:
>
>   Upon detecting a restart with a neighbor that supports state
>   recovery, a node SHOULD refresh all Path state shared with that
>   neighbor.
>
>So, as you can hopefully see from the above, the upstream node doesn't 
>wait until it receives a Resv.
>
True - if you had read further, you would have noticed that
I have said the same thing that you have quoted :) -
but there will be no RECOVERY_LABEL unless a Resv has
been received, right?

It is possible that the downstream node, after programming
its forwarding path, restarted before sending the Resv.
This would result in the upstream node refreshing the Path
message without the Recovery Label, and the downstream
node eventually allocating a different label.  So essentially
it is a new LSP setup.
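
A rough sketch of the upstream node's behaviour as I read it, assuming
(my assumption, not text from the draft) that the only label it can put
in a Recovery Label is one it learned from a Resv. The PathState class
and the dictionary message format are purely illustrative.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union


@dataclass
class PathState:
    """Hypothetical per-LSP state kept by the upstream node."""
    lsp_id: str
    resv_label: Optional[int] = None  # label learned from a Resv, if any


def build_path_refresh(state: PathState) -> Dict[str, Union[str, int]]:
    """Build the Path refresh sent to a neighbor that has just restarted."""
    msg: Dict[str, Union[str, int]] = {"lsp_id": state.lsp_id}
    if state.resv_label is not None:
        # A Resv was received before the restart, so the label is known.
        msg["RECOVERY_LABEL"] = state.resv_label
    # Otherwise the refresh carries no label hint, and the restarted node
    # cannot tell it apart from a request for a new LSP.
    return msg
```

In the case described above, resv_label is still None when the
downstream node restarts, so the refresh goes out without a Recovery
Label.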

>
>
>>So it seems that the procedure is to resynchronize
>>established state? If Resv has not been received yet, there
>>will be a refresh of Path message, and the restarting node
>>will consider it as a new request, etc.  So can you elaborate
>>on "to make sure both side have done the same to the in
>>progress connections" ?
>>
>>>>If a node supports PSC as well as TDM or LSC interfaces, it
>>>>might want to advertise different set of parameters in the
>>>>RESTART_CAP object for data LSPs as opposed to SONET/WDM
>>>>LSPs which form bearer channels in transport networks.
>>>>Currently this is not possible.
>>>>
>>>Can you give us explicit examples as to why and what do
>>>you gain by giving different values for PSC, TDM?
>>>
>>In case of PSC devices, it may be OK to remove state that
>>is not resynchronized at the end of the recovery period,
>>and the recovery period advertised might reflect that.
>>But for LSPs in transport networks, one might want to
>>have a different recovery period to avoid any LSP from
>>going down because of recovery timer expiry.
>>
>
>There is no requirement for a node to advertise exactly the
>same Restart_Cap on all the interfaces. So, on PSC interfaces
>the node could advertise that it will remove the state that
>isn't synchronized at the end of the recovery period, while
>on the TDM interface precisely the same node could advertise
>that the LSPs would be kept even after the recovery time expires.
>

But to set up LSPs over TDM/LSC interfaces, the PSC interface
is going to be used for signaling - since control and data
planes are decoupled!  So, how will this help?


>
>
>>>>According to 9.5.1. (procedures for restarting LSR): "When
>>>>sending the corresponding outgoing Path message the node
>>>>SHOULD include a SUGGESTED_LABEL object with a label value
>>>>^^^^^^
>>>>matching the outgoing label from the now restored forwarding
>>>>entry."
>>>>
>>>>This has a conflict with 9.5.2.  Consider the case where
>>>>adjacent nodes B and C restart, and B has another adjacent
>>>>node A, and C has another adjacent node D.  B and C will get
>>>>resynced by A and D, and during this process, they will
>>>>resync each other. While resyncing each other, they act as
>>>>neighbors of a restarting LSR, and hence according to 9.5.2,
>>>>MUST include the SUGGESTED_LABEL.
>>>>
>>>>Also according to 9.5.2: "During the recovery period, new
>>>>Path state being advertised to the restarted neighbor SHOULD
>>>>not include the SUGGESTED_LABEL object in the corresponding
>>>>outgoing Path message.  This will prevent the restarting
>>>>node from erroneously reusing a saved forwarding entry."
>>>>
>>>>I guess this would mean that if suggested labels are used
>>>>during new LSP setup (as they are likely to be while
>>>>provisioning lightpaths - to reduce latency), then new LSP
>>>>setup will not be allowed during resyncing?
>>>>
>>>The use of RECOVERY_LABEL addresses all the above questions.
>>>
>>The first problem still seems to be there - consider two
>>adjacent nodes restarting.  They each act both as the restarting
>>node and as the neighbor to the restarting node. So, once
>>they learn the state from the upstream neighbor, do they use
>>suggested label or the recovery label when they send the path
>>message to the just restarted downstream neighbor?
>>
>
>The recovery label.
>
>The following should be added to the existing text from the document:
>
>   In the special case where a restarting node also has a restarting
>   downstream neighbor, a Recovery_Label object should be used instead
>   of a Suggested_Label object.
>
Since a restarting node may not be able to detect that a
downstream neighbor is restarting, I suggest it always use
a Recovery_Label object instead of a Suggested_Label object.
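
The difference between the two rules can be sketched as follows. This
is an illustration only; the function names and the tuple
representation of a label object are hypothetical.

```python
from typing import Tuple


def proposed_text_rule(restored_label: int,
                       downstream_restarting: bool) -> Tuple[str, int]:
    """Rule in the proposed text: needs to know the downstream state."""
    kind = "RECOVERY_LABEL" if downstream_restarting else "SUGGESTED_LABEL"
    return (kind, restored_label)


def always_recovery_rule(restored_label: int) -> Tuple[str, int]:
    """Rule suggested here: a restarting node always sends Recovery_Label."""
    return ("RECOVERY_LABEL", restored_label)
```

The first rule gives the wrong object whenever downstream_restarting is
wrongly believed to be False, which is exactly the undetected-restart
case; the second rule has no such input to get wrong.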


Gopal

>
>
>Yakov.
>
>