re: Comparison of restoration requirements between transport and packet networks

Hi Neil,

Thanks for the email. I’d like to ask you to expand a bit on your idea on the mailing list. I’m afraid I’m a bit confused about the question and need more information.

Do you mean by your question:

- How do we figure out what the defect is and where it happened? Is it a fiber cut? Is it a misconfiguration? How do you do fault localization?

Or:

- What set of defects does your solution address?

Or maybe something else?

Thanks,

Richard.

-----Original Message-----
From: neil.2.harrison@bt.com [mailto:neil.2.harrison@bt.com]
Sent: Wednesday, September 10, 2003 1:13 PM
To: rabbat@fla.fujitsu.com; ccamp@ops.ietf.org; zali@cisco.com
Cc: v.sharma@ieee.org
Subject: RE: Comparison of restoration requirements between transport and packet networks

Where are the defects specified (take any networking mode/layer you like) wrt entry/exit criteria and consequent actions?

regards, Neil

-----Original Message-----
From: Richard Rabbat [mailto:rabbat@fla.fujitsu.com]
Sent: 10 September 2003 20:11
To: ccamp@ops.ietf.org; Zafar Ali
Cc: Vishal Sharma; 'Richard Rabbat'
Subject: Comparison of restoration requirements between transport and packet networks

Hi Zafar, CCAMP,

During discussions with several colleagues within the CCAMP WG, it has become clear that it would be useful to clarify some of the fundamental differences between restoration in packet networks and that in transport networks.

This difference, together with the time criticality of restoration at the transport layer, calls for techniques for time-bounded notification. It would then be useful to discuss the solutions proposed in draft-rabbat-fault-notification-protocol-03 for such notification.

We are in the process of preparing a contribution on this subject, but thought it would be useful to highlight a few key points on the mailing list, so that we can elicit feedback and comments from the WG.

In normal packet networks (MPLS networks) one can pre-signal *and* pre-configure a backup LSP for a working LSP. This is because selecting a label at a node for a backup LSP is sufficient to be able to switch traffic for that LSP when that traffic arrives. If resources are required for the backup LSP (buffers and bandwidth), they too can be reserved in advance (during the LSP signaling phase), but can still be used by low-priority or extra-traffic LSPs as long as there is no failure on the working LSP.

This is true even for shared mesh restoration in MPLS networks. In that case, multiple labels would be assigned, one for each of the backup LSPs (corresponding to link and/or node disjoint working LSPs) transiting a node on the shared backup path, but only one set of resources (buffers, bandwidth) would be reserved (if such resource reservation was needed).
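
To illustrate the shared mesh case above, here is a minimal sketch of one node on a shared backup path. The class and method names are hypothetical and not taken from any draft. It assumes that the working LSPs are disjoint, so at most one backup LSP is active at a time and the shared reservation only needs to cover the largest one.

```python
class SharedBackupNode:
    """Hypothetical model of one MPLS node on a shared backup path."""

    def __init__(self):
        self._next_label = 16      # MPLS label values 0-15 are reserved
        self.labels = {}           # backup LSP id -> label assigned here
        self.reserved_bw = 0.0     # one shared reservation for all backups

    def pre_signal(self, lsp_id, bandwidth):
        # Each backup LSP gets its own label (pre-configuration)...
        self.labels[lsp_id] = self._next_label
        self._next_label += 1
        # ...but the bandwidth pool is shared, so it only grows to the
        # largest single demand (assumption: one failure at a time).
        self.reserved_bw = max(self.reserved_bw, bandwidth)
        return self.labels[lsp_id]


node = SharedBackupNode()
node.pre_signal("backup-1", 100.0)
node.pre_signal("backup-2", 100.0)
```

After the two calls, the node holds two distinct labels but a single 100.0 reservation, which low-priority traffic could still use until a failure occurs.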

In transport networks, however, one can pre-signal but not pre-configure a backup LSP (unless one was doing just 1+1 protection). This is because, in transport networks, if an LSP is established (that is, it is cross-connected) then the full bandwidth of the LSP is automatically *consumed*, irrespective of whether traffic actually flows on this LSP.

For this reason, to implement shared restoration schemes in transport networks (and allow extra-traffic), a backup LSP cannot be cross-connected until *after* the specific failure for which this backup LSP was pre-signaled has occurred.
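
The contrast with the packet case can be sketched as follows. This is a toy model with hypothetical names; capacity is counted in abstract "slots" purely for illustration. Pre-signaling only records the intent, while cross-connecting consumes capacity whether or not traffic flows.

```python
class TransportLink:
    """Hypothetical model of capacity on one transport link."""

    def __init__(self, free_slots):
        self.free_slots = free_slots
        self.pre_signaled = {}     # backup LSP id -> slots it would need
        self.connected = {}        # backup LSP id -> slots it consumes

    def pre_signal(self, lsp_id, slots):
        # Records the backup LSP without touching any capacity.
        self.pre_signaled[lsp_id] = slots

    def cross_connect(self, lsp_id):
        # Consumes the full bandwidth, even if no traffic is flowing.
        slots = self.pre_signaled[lsp_id]
        if slots > self.free_slots:
            raise RuntimeError("no free capacity for " + lsp_id)
        self.free_slots -= slots
        self.connected[lsp_id] = slots


link = TransportLink(free_slots=1)
link.pre_signal("backup-1", 1)     # both backups can share the slot
link.pre_signal("backup-2", 1)     # as long as neither is connected
link.cross_connect("backup-1")     # done only after backup-1's failure
```

Calling `link.cross_connect("backup-2")` at this point would raise, which is why both backups cannot be pre-configured on the shared slot in advance.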

Now, if signaling-based notification is used in transport networks, an *additional phase of signaling* is required along the backup path to enable nodes along that path to reconfigure themselves (this is well-described in the functional specification document of the P&R Design Team). This lengthens the time to recover from the failure. Depending on the layer at which recovery is being performed this may or may not be acceptable.

In the specific case of transport networks, restoration is typically a time-critical activity, so this round-trip signaling delay could be unacceptable when time-bounded notification and recovery is desired.

In addition, signaling individual LSPs or individual LSP bundles may create buffering problems that make the signaling time unbounded.

If, instead, the information about a failure is flooded to all the network nodes, and the backup paths are selected intelligently (as described in draft-rabbat-fault-notification-protocol-03.txt), this additional signaling handshake delay can be eliminated. This is because flooding the information about a fault on a working LSP informs, in parallel, all the nodes lying along the path of the backup LSP. Thus, the repair point(s), upon learning of the fault, hold off activating the backup LSP(s) for an appropriate time, within which all nodes along the corresponding backup path(s) will have reconfigured themselves.
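
One way to think about the hold-off time is sketched below. This is only an illustration under stated assumptions, not the procedure from the draft: the function name, the per-node flood arrival times, and the single fixed reconfiguration time are all hypothetical.

```python
def hold_off_ms(arrival_ms, backup_path, repair_point, reconfig_ms):
    """Time the repair point waits before switching traffic to the backup.

    arrival_ms   -- node -> time (ms after the fault) at which the flooded
                    notification reaches that node (assumed known or bounded)
    backup_path  -- nodes that must reconfigure for the backup LSP
    repair_point -- node that switches traffic onto the backup LSP
    reconfig_ms  -- assumed worst-case cross-connect time per node
    """
    # All nodes reconfigure in parallel, so the backup is ready when the
    # slowest node has both heard of the fault and finished reconfiguring.
    ready_at = max(arrival_ms[n] + reconfig_ms for n in backup_path)
    # The repair point only waits for whatever remains after it has
    # itself learned of the fault.
    return max(0.0, ready_at - arrival_ms[repair_point])


arrival = {"A": 2.0, "B": 5.0, "C": 4.0}
wait = hold_off_ms(arrival, backup_path=["B", "C"], repair_point="A",
                   reconfig_ms=10.0)
```

With these made-up numbers, node B is the last to be ready (5.0 + 10.0 ms), so the repair point A, which learned of the fault at 2.0 ms, would hold off for 13.0 ms. No per-LSP handshake along the backup path appears in the calculation.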

We would also like to get feedback on a suitable protocol that could implement time-critical flooding notification.

Comments, thoughts and questions are welcome!

--

Richard Rabbat, Ph.D.

Member of Research Staff, Fujitsu Labs of America

1240 E Arques Ave, MS 345, Sunnyvale, CA 94085

Phone: 408-530-4537. Fax: 408-530-4581. Cell: 650-714-7618