Hello Everyone,
Following some discussions prior
to Vienna, and feedback and comments received during Vienna and thereafter,
we have realized that perhaps one aspect of
draft-rabbat-fault-notification-protocol-03.txt that we may not have
adequately highlighted is its focus on providing *time-bounded*
notification.
This is because draft-rabbat
focuses on recovery in optical transport networks, where recovery of failed
LSPs (fibers, lambdas, etc.) in a *bounded time* is critical for the
provider to be able to offer guarantees/SLAs to its transport customers, and
also to its L2 and L3 customers. The transport infrastructure often serves
as a foundation for the L2 and L3 networks built upon it, and so should be
able to provide recovery within some well-specified time, so that L2 and L3
recovery can be appropriately performed based on what L1
provides.
For
this reason, notification via signaling or OSPF-based flooding, which could
work well at the packet layer, may not be directly applicable at the
transport layer.
I agree with the notion of time-bounded recovery. However, here you have
started to make assumptions about possible solutions. The same confusion
arose at the last IETF meeting when an LMP-based solution was presented.
All I am saying is that, IMO, breaking the problem into two parts, i.e.,
getting agreement on the requirements and then following it up with the
solution, would be the right approach.
I agree with the requirements part of the problem statement.
In fact, since recovery at the packet
layer may not involve the stringent time constraints that are applicable at
the transport layer, directly comparing notification solutions at the packet
layer with those at the transport layer is probably not accurate.
What would be useful here is to quantify the differences between the two
types of networks. Such quantification would help catalyze discussion on
the mailing list.
Rather, we need to examine (as done in draft-rabbat) the applicability of
signaling and flooding to notification *at the transport layer* under the
constraint of achieving time-bounded recovery.
Agreed!
So if the WG looks at
draft-rabbat with this backdrop, we believe some of the arguments made there
will be clearer. Of course, we welcome feedback from the
list.
Thanks,
-Richard and Vishal
PS: The need for time-bounded
recovery is not new, and has been recognized in several recent IETF RFCs.
Notably, RFC3272 (http://www.ietf.org/rfc/rfc3272.txt), page 52:
--
- Failure notification throughout the network should be timely and
  reliable.
--
Note that there is a list of
requirements on this page, some of which are similar to those in
draft-rabbat-optical-recovery-reqs-00.txt.
RFC3386 (http://www.ietf.org/rfc/rfc3386.txt) is also relevant to the
whole discussion. Page 15 gives the following:
--
Proposed timing bounds for different survivability mechanisms are as
follows (all bounds are exclusive of signal propagation):

   1:1 path protection with pre-established capacity:  100-500 ms
   1:1 path protection with pre-planned capacity:      100-750 ms
   Local restoration:                                  50 ms
   Path restoration:                                   1-5 seconds
--
Note that RFC3386 discusses horizontal hierarchy in data networks, and so
the bounds above apply primarily to the packet layer. Similar numbers for
the transport layer will likely be significantly stricter (any operator
input on this?).