Hello Everyone,
Following some discussions prior to Vienna, and feedback and comments
received during Vienna and thereafter, we have realized that one aspect of
draft-rabbat-fault-notification-protocol-03.txt that we may not have
adequately highlighted is its focus on providing *time-bounded* notification.
This is because draft-rabbat focuses on recovery in optical transport
networks, where recovery of failed LSPs (fibers, lambdas, etc.) in a *bounded
time* is critical for the provider to be able to offer guarantees/SLAs to its
transport customers, and also to its L2 and L3 customers. The transport
infrastructure often serves as a foundation for the L2 and L3 networks built
upon it, and so should be able to provide recovery within some well-specified
time, so that L2 and L3 recovery can be appropriately performed based on what
L1 provides.
For this reason, notification via signaling or OSPF-based flooding, which
could work well at the packet layer, may not be directly applicable at the
transport layer.
I agree with the notion of time-bounded recovery. However, here you have
started to make assumptions about possible solutions. The same confusion
arose at the last IETF meeting when an LMP-based solution was presented. All
I am saying is that, IMO, breaking the problem into two parts, i.e., getting
agreement on the requirements and then following up with the solution, would
be the right approach.
I agree with the requirement part of the problem statement.
In fact, since recovery at the packet layer may not involve the stringent
time constraints that are applicable at the transport layer, directly
comparing notification solutions at the packet layer with those at the
transport layer is probably not accurate.
What would be useful here is to quantify the differences between the two
types of networks. Such quantification would help catalyze some email
discussion on the mailing list.
Rather, we need to examine (as done in draft-rabbat) the applicability of
signaling and flooding to notification *at the transport layer* under the
constraint of achieving time-bounded recovery.
Agreed!
So if the WG looks at draft-rabbat with this backdrop, we believe some of the
arguments made there will be clearer. Of course, we welcome feedback from the
list.
Thanks,
-Richard and Vishal
PS: The need for time-bounded recovery is not new, and has been recognized in
several recent IETF RFCs. Notably,
RFC3272 (http://www.ietf.org/rfc/rfc3272.txt), page 52:
--
- Failure notification throughout the network should be timely and reliable.
--
Note that there is a list of requirements on this page, some of which are
similar to those in draft-rabbat-optical-recovery-reqs-00.txt.
RFC3386 (http://www.ietf.org/rfc/rfc3386.txt) is also relevant to the whole
discussion. Page 15 discusses the following:
--
Proposed timing bounds for different survivability mechanisms are as follows
(all bounds are exclusive of signal propagation):
   1:1 path protection with pre-established capacity:  100-500 ms
   1:1 path protection with pre-planned capacity:      100-750 ms
   Local restoration:                                  50 ms
   Path restoration:                                   1-5 seconds
--
Note that RFC3386 discusses horizontal hierarchy in data networks, and so the
bounds above apply primarily to the packet layer. Similar numbers for the
transport layer will likely be significantly stricter (any operator input on
this?).