
Re: RSVP Restart (was Re: update GMPLS signaling documents)




Hi Yakov,

Yakov Rekhter wrote:

>>>
>>>>>>I have some questions/concerns on the restart mechanism as
>>>>>>specified in draft-ietf-mpls-generalized-rsvp-te-04.txt.  I
>>>>>>have listed them below.
>>>>>>
>>>>>>If a node that is terminating hierarchical LSPs restarts,
>>>>>>there is an ordering issue during resynchronization, since
>>>>>>the LSPs depend on other FA-LSPs whose interface IDs
>>>>>>would have been generated dynamically. So the
>>>>>>mechanism as described will not work unless this
>>>>>>information is preserved across restarts, in which case we
>>>>>>might as well preserve other information too and avoid
>>>>>>resynchronization.
>>>>>>
>>>>>Since transit nodes of the FA-LSP do not see the RSVP
>>>>>messages that use the FA, we only need to consider
>>>>>the ingress and egress nodes. If the egress node restarts,
>>>>>the worst case is that its upstream node sends an ERO
>>>>>containing an unrecognized interface address. Since the
>>>>>Path message will carry RECOVERY_LABEL (replacing
>>>>>SUGGESTED_LABEL), the egress node knows that it might
>>>>>not have complete information yet, and can hold on to
>>>>>the Path message until the FA-LSP is established.
>>>>>All the information is there, so implementation
>>>>>can be completely within the egress node. It does work,
>>>>>just not specified enough.
>>>>>
>>>>>The ingress node will have to know the dependency, so there
>>>>>is no problem there.
>>>>>
>>>>Are you suggesting that in case of an ERO with an
>>>>unrecognized interface id, if the Path message also
>>>>carries a RECOVERY LABEL, the egress node can hold on
>>>>to it until later?  Note that there may not be a
>>>>RECOVERY LABEL - since this Path message is for a
>>>>hierarchical LSP, and the previous node may not be
>>>>directly connected in the control plane.  Here is an
>>>>example - There is an LSP L1 from C to F, and on top of
>>>>it there is an LSP L2 from A to G.
>>>>
>>>>L1:              C --- D --- E --- F
>>>>L2:  A --- B --- C --------------- F --- G
>>>>
>>>>When F restarts, C may not even know, unless the HELLO
>>>>protocol is run between C and F.  Note that the
>>>>HELLO mechanism is intended for immediate neighbors.
>>>>
>>>>And even if the HELLO mechanism is extended, there
>>>>is the issue of the interface ID assigned by F.  F needs
>>>>to assign the same interface ID upon restart so as
>>>>not to bring down the hierarchical LSPs set up on top
>>>>of L1 in the reverse direction (say, an LSP from G to
>>>>B set up on top of L1).  Where would F get this
>>>>information?
>>>>
>>>Couple of points:
>>>
>>>(1) First of all, you do run RSVP Hello between C and F.
>>>
>>O.K.
>>
>>>(2) Since L1 is advertised as an FA into OSPF/ISIS, F should
>>>be able to recover the Interface ID it assigns to L1 from
>>>a combination of (a) the OSPF/ISIS link state database that
>>>F would recover, and (b) the Forward Interface ID (the one
>>>assigned by C).
>>>
>>True. One could either use the IGP restart mechanism to relearn
>>this (I have other concerns about RSVP restart being dependent on
>>IGP restart, but that is for later) or preserve this across restarts.
>>In either case, you agree that RSVP restart depends on some
>>mechanism outside RSVP to help it along?  So RSVP needs a new
>>interface to get this mapping upon restart, right? This is not
>>clear in the draft.
>>
>
>Couple of points:
>
>1. As Dimitri Papadimitriou mentioned in his other e-mail to this
>list, an Internet Draft "is not a textbook on how to use GMPLS-SIG".
>
The intent is not to make this a textbook at all.

I refer you to documents such as
draft-ietf-ospf-hitless-restart-01.txt, which state clearly what
help you need from outside OSPF for restart to work, and, as far
as interfaces go, to RFC 2205 Section 3.11.

In addition, the first answer I got on this list (I believe
from one of the authors) stated that

" All the information is there, so implementation
  can be completely within the egress node. It does work,
  just not specified enough."

whereas in your answer you have acknowledged that RSVP does
need help in learning the interface IDs that the restarting
node had assigned.  So, is the other answer from an "uninformed
reader"?
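The first answer's position, that the egress node can simply hold a Path message until the FA-LSP is re-established, amounts to roughly the following. This is a minimal sketch for illustration only: PathMsg, EgressNode, on_path and on_fa_lsp_up are hypothetical names, the interface ID values are made up, and it assumes the Path message carries a RECOVERY_LABEL, which, as pointed out earlier in the thread, need not be the case for a hierarchical LSP whose previous hop is not a control-plane neighbor.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class PathMsg:
    # Hypothetical, simplified view of a Path message reaching the egress
    # of an FA-LSP (F in the example above).
    lsp_id: str
    ero_interface_id: int            # interface ID named in the ERO for this hop
    recovery_label: Optional[int]    # None when no RECOVERY_LABEL was carried


@dataclass
class EgressNode:
    known_interface_ids: Set[int] = field(default_factory=set)
    deferred: List[PathMsg] = field(default_factory=list)
    accepted: List[PathMsg] = field(default_factory=list)

    def on_path(self, msg: PathMsg) -> str:
        if msg.ero_interface_id in self.known_interface_ids:
            self.accepted.append(msg)
            return "accepted"
        if msg.recovery_label is not None:
            # Recovery is in progress: the FA-LSP may not be re-established
            # yet, so hold the message instead of rejecting the ERO.
            self.deferred.append(msg)
            return "deferred"
        # Without a RECOVERY_LABEL nothing tells the node to wait, so the
        # unknown interface ID is treated as an ERO error.
        return "rejected"

    def on_fa_lsp_up(self, interface_id: int) -> List[str]:
        # Called when an FA-LSP, and the interface ID it uses, is known again.
        self.known_interface_ids.add(interface_id)
        ready = [m for m in self.deferred if m.ero_interface_id == interface_id]
        self.deferred = [m for m in self.deferred if m.ero_interface_id != interface_id]
        self.accepted.extend(ready)
        return [m.lsp_id for m in ready]


if __name__ == "__main__":
    f = EgressNode()
    print(f.on_path(PathMsg("L2", 7, recovery_label=100)))  # deferred
    print(f.on_path(PathMsg("L3", 7, recovery_label=None)))  # rejected
    # If F re-creates L1 with a different interface ID, L2 stays deferred.
    print(f.on_fa_lsp_up(8))   # []
    print(f.on_fa_lsp_up(7))   # ['L2']
```

The last two calls show where the argument is fragile: the deferred message is released only if the restarted node brings the FA-LSP back with the same interface ID it used before, which is the information the thread argues RSVP cannot recover on its own.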

I am not trying to put down anybody, just trying to point out
that some things are not clear. I have also heard from implementors
who did not realize that RSVP needs to learn the interface IDs
that were assigned before the restart. They could claim that
their implementation is in full compliance with the draft, even
though hierarchical LSPs will be dropped upon restart.
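Point (2) in the earlier reply, recovering the interface ID F assigned to L1 from the IGP link state database plus the Forward Interface ID assigned by C, can be sketched as a simple lookup. The record layout is a hypothetical simplification of a TE link advertisement carrying local and remote link identifiers for an unnumbered link; it is not an OSPF or IS-IS API, and the identifier values are illustrative. The function only works once the LSDB has been relearned, i.e. it depends on a mechanism outside RSVP.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class TeLinkAdv:
    # Hypothetical, simplified TE link advertisement for an unnumbered FA:
    # advertising router, far-end router, and local/remote link identifiers.
    adv_router: str
    remote_router: str
    local_id: int
    remote_id: int


def recover_interface_id(me: str, upstream: str, forward_if_id: int,
                         lsdb: Iterable[TeLinkAdv]) -> Optional[int]:
    """Return the interface ID `me` had assigned to the FA whose upstream
    end (`upstream`) uses `forward_if_id`, or None if the LSDB lacks it."""
    for adv in lsdb:
        # Upstream's advertisement: its local ID is the Forward Interface ID,
        # its remote ID is the ID this node assigned before restarting.
        if (adv.adv_router == upstream and adv.remote_router == me
                and adv.local_id == forward_if_id):
            return adv.remote_id
        # This node's own advertisement, if it was relearned from the IGP.
        if (adv.adv_router == me and adv.remote_router == upstream
                and adv.remote_id == forward_if_id):
            return adv.local_id
    return None


if __name__ == "__main__":
    # L1 from C to F: C assigned 11, F had assigned 42 before restarting.
    lsdb = [TeLinkAdv("C", "F", 11, 42)]
    print(recover_interface_id("F", "C", 11, lsdb))  # 42
    print(recover_interface_id("F", "C", 11, []))    # None
```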

>
>
>2. What I mentioned in the above should be abundantly obvious to the
>informed reader of RSVP and LSP Hierarchy specs.
>
>
>
>>>>>>Also, let us consider a node that can preserve state across
>>>>>>restarts, and hence does not need its state to be synced by
>>>>>>its peer.  How will it advertise this in the RESTART_CAP
>>>>>>object?
>>>>>>
>>>>>We still need to resynchronize the state since the other
>>>>>side might have connections that are in progress. The
>>>>>
>>>>Let us say the node can resynchronize its neighbor if
>>>>the neighbor restarts and requests state recovery. But
>>>>the issue is how a node can advertise that it does not
>>>>need recovery since all its state was preserved?
>>>>
>>>By treating it the same way as the spec handles
>>>a control channel fault.
>>>
>>True, but this does not help.  
>>
>
>Help with what ?
>
I meant this does not work - for reasons cited below.

>
>
>>See below.
>>
* deleted *

>
>>>>
>>>>If a node supports PSC as well as TDM or LSC interfaces, it
>>>>might want to advertise a different set of parameters in the
>>>>RESTART_CAP object for data LSPs as opposed to SONET/WDM
>>>>LSPs which form bearer channels in transport networks.
>>>>Currently this is not possible.
>>>>
>>>>>Can you give us explicit examples of why, and what do
>>>>>you gain by giving different values for PSC and TDM?
>>>>>
>>>>In case of PSC devices, it may be OK to remove state that
>>>>is not resynchronized at the end of the recovery period,
>>>>and the recovery period advertised might reflect that.
>>>>
>>>>
>>>>But for LSPs in transport networks, one might want to
>>>>have a different recovery period to prevent any LSP from
>>>>going down because of recovery timer expiry.
>>>>
>>>There is no requirement for a node to advertise exactly the
>>>same Restart_Cap on all the interfaces. So, on PSC interfaces
>>>the node could advertise that it will remove the state that
>>>isn't synchronized at the end of the recovery period, while
>>>on the TDM interface precisely the same node could advertise
>>>that the LSPs would be kept even after the recovery time expires.
>>>
>>But to set up LSPs over TDM/LSC interfaces, the PSC interface
>>is going to be used for signaling - since control and data
>>planes are decoupled!  So, how will this help?
>>
>
>Help with what ? You asserted that it "is not possible" for
>"a node that supports PSC as well as TDM and LSC interfaces..
>to advertise different set of parameters in the RESTART_CAP
>object for data LSPs as opposed to SONET/WDM LSPs."
>
>I pointed out to you that your assertion is incorrect, as
>there is no requirement for a node to advertise exactly the
>same Restart_Cap on all of its interfaces.
>
I believe your assertion is misleading - If two nodes A and
B have multiple TDM/LSC links (with out-of-band signaling)
between them as well as a PSC link, (they are all unnumbered
interfaces), there will be only one Hello adjacency between
A and B, resulting in just one Restart_cap object advertised
in each direction.

In that case, how can one advertise different values in the
Restart_cap object for TDM/LSC LSPs as opposed to data LSPs?

Since there is only one adjacency, the suggestion above to
treat the node failure as control channel failure does not
work either, as the restarting node might want to get
resynchronized for data LSPs.

I believe your assertion holds only if all interfaces
are numbered.  If so, this is a huge constraint; please
refer to draft-ietf-ipo-carrier-requirements-00, Section 7.4.
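The argument can be made concrete with a small model. It is a sketch under the assumption stated in the message (one Hello adjacency per neighbor node for unnumbered links with out-of-band signaling, one per interface for numbered links); Link, adjacency_key, group_by_adjacency and mixed_adjacencies are hypothetical names, and the addresses are illustrative.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Link:
    neighbor: str
    switching_type: str          # "PSC", "TDM" or "LSC"
    local_addr: Optional[str]    # None for an unnumbered interface


AdjKey = Tuple[str, Optional[str]]


def adjacency_key(link: Link) -> AdjKey:
    # Assumption from the argument above: all unnumbered links to a neighbor
    # share one Hello adjacency (key ends in None), while a numbered link
    # can have an adjacency of its own.
    return (link.neighbor, link.local_addr)


def group_by_adjacency(links: List[Link]) -> Dict[AdjKey, Set[str]]:
    groups: Dict[AdjKey, Set[str]] = defaultdict(set)
    for link in links:
        groups[adjacency_key(link)].add(link.switching_type)
    return dict(groups)


def mixed_adjacencies(links: List[Link]) -> List[AdjKey]:
    # As argued above, one adjacency carries a single Restart_Cap in each
    # direction, so an adjacency covering both PSC and TDM/LSC links cannot
    # advertise different parameters for data LSPs and transport LSPs.
    return [key for key, types in group_by_adjacency(links).items()
            if "PSC" in types and len(types) > 1]


if __name__ == "__main__":
    unnumbered = [Link("B", "PSC", None), Link("B", "TDM", None),
                  Link("B", "LSC", None)]
    numbered = [Link("B", "PSC", "10.0.0.1"), Link("B", "TDM", "10.0.1.1"),
                Link("B", "LSC", "10.0.2.1")]
    print(mixed_adjacencies(unnumbered))  # [('B', None)]
    print(mixed_adjacencies(numbered))    # []
```

With unnumbered links everything between A and B collapses into one adjacency, so only the all-numbered case lets per-interface Restart_Cap values separate data LSPs from TDM/LSC LSPs.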

Cheers,

Gopal


>
>
>>>>>>According to 9.5.1. (procedures for restarting LSR): "When
>>>>>>sending the corresponding outgoing Path message the node
>>>>>>SHOULD include a SUGGESTED_LABEL object with a label value
>>>>>>^^^^^^
>>>>>>matching the outgoing label from the now restored forwarding
>>>>>>entry."
>>>>>>
>>>>>>This has a conflict with 9.5.2.  Consider the case where
>>>>>>adjacent nodes B and C restart, and B has another adjacent
>>>>>>node A, and C has another adjacent node D.  B and C will get
>>>>>>resynced by A and D, and during this process, they will
>>>>>>resync each other. While resyncing each other, they act as
>>>>>>neighbors of a restarting LSR, and hence according to 9.5.2,
>>>>>>MUST include the SUGGESTED_LABEL.
>>>>>>
>>>>>>Also according to 9.5.2: "During the recovery period, new
>>>>>>Path state being advertised to the restarted neighbor SHOULD
>>>>>>not include the SUGGESTED_LABEL object in the corresponding
>>>>>>outgoing Path message.  This will prevent the restarting
>>>>>>node from erroneously reusing a saved forwarding entry."
>>>>>>
>>>>>>I guess this would mean that if suggested labels are used
>>>>>>during new LSP setup (as they are likely to be while
>>>>>>provisioning lightpaths - to reduce latency), then new LSP
>>>>>>setup will not be allowed during resyncing?
>>>>>>
>>>>>The use of RECOVERY_LABEL addresses all the above questions.
>>>>>
>>>>The first problem still seems to be there: consider two
>>>>adjacent nodes restarting.  They both act as the restarting
>>>>node as well as the neighbor of the restarting node. So, once
>>>>they learn the state from the upstream neighbor, do they use
>>>>the suggested label or the recovery label when they send the Path
>>>>message to the just-restarted downstream neighbor?
>>>>
>>>The recovery label.
>>>
>>>The following should be added to the existing text from the document:
>>>
>>>  In the special case where a restarting node also has a restarting
>>>  downstream neighbor, a Recovery_Label object should be used instead
>>>  of a Suggested_Label object.
>>>
>>Since a restarting node may not be able to detect that a
>>downstream neighbor is restarting, I suggest it always use
>>a Recovery_Label object instead of a Suggested_Label object.
>>
>
>Please see my reply to Yangguang on this topic.
>
>
>
>Yakov.
>
>
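The two candidate rules discussed above for a restarting node sending a Path message for a recovered LSP (the proposed "special case" text versus always using Recovery_Label) can be compared with a small sketch. Only this one decision is modelled; LabelObj, proposed_rule, always_recovery_rule and misses_restart are hypothetical names, and how a node detects that its downstream neighbor restarted (for example via Hello) is abstracted into a single boolean.

```python
from enum import Enum
from typing import Callable


class LabelObj(Enum):
    SUGGESTED = "SUGGESTED_LABEL"
    RECOVERY = "RECOVERY_LABEL"


# A rule picks the label object for the Path message a restarting node sends
# downstream for a recovered LSP, given only what the node currently knows.
Rule = Callable[[bool], LabelObj]


def proposed_rule(downstream_known_restarting: bool) -> LabelObj:
    # Proposed text: use Recovery_Label only in the special case where the
    # downstream neighbor is (known to be) restarting as well.
    if downstream_known_restarting:
        return LabelObj.RECOVERY
    return LabelObj.SUGGESTED


def always_recovery_rule(downstream_known_restarting: bool) -> LabelObj:
    # Alternative suggested in the thread: ignore what is known and always
    # use Recovery_Label, since the restart may not have been detected yet.
    return LabelObj.RECOVERY


def misses_restart(rule: Rule, downstream_restarting: bool,
                   downstream_detected: bool) -> bool:
    """True if the downstream neighbor really is restarting but the rule
    still sends SUGGESTED_LABEL for a recovered LSP."""
    return downstream_restarting and rule(downstream_detected) is LabelObj.SUGGESTED


if __name__ == "__main__":
    # Downstream neighbor restarted, but the restart has not been detected.
    print(misses_restart(proposed_rule, True, False))         # True
    print(misses_restart(always_recovery_rule, True, False))  # False
    print(misses_restart(proposed_rule, True, True))          # False
```

The case that separates the two rules is the one raised in the thread: a downstream restart that the restarting node has not yet detected.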