Re: Flooding using LMP extensions

Hi all,
we are working on an MPLS testbed supporting end-to-end
protection/restoration mechanisms, and we have run into the problem of
link-failure notification.
We share the scalability concerns about RSVP-like
solutions raised in Rabbat's mail.
We are in favour of OSPF flooding-based mechanisms for link-failure
notification using Opaque LSAs because:

- It is applicable to both MPLS-TE and GMPLS.

- OSPF has proven stability and robustness compared with the LMP solution.
OSPF flooding in general is mature, whereas LMP would have to be extended
to support some capabilities that OSPF already has.

- Lower routing failure probability compared with the use of RSVP (see below).
      draft-katz-yeung-ospf-traffic-09 says that, in a TE scenario,
      the edge nodes can have a module that searches for constrained routes
      based on Opaque TE info.
      We call a "routing failure" the computation by edge node X of a path
      which includes a failed link F.
      Clearly, routing failures are a consequence of X not being notified
      of the failure of F.
      With RSVP failure notification, this can occur:
      - in case of a single fault, when no LSP originated from X crosses F;
      - in case of dual faults (i.e., two links L1 and L2 fail almost simultaneously),
         if there is an LSP from X crossing both L1 and L2:

X---link-----node------link(L1)------node-------link(L2)-----egress_node

the failure of L1 can hide the notification of the L2 failure.

It can be seen that with OSPF flooding there is virtually no potential
for routing failure, as ALL the edge nodes are notified of any failure.

      So with a flooding-based notification, all the edge nodes in the network
      will be aware of the failure. With RSVP, instead, only some nodes
      will be aware of it, so the routing failure probability
      increases after a link failure.
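
To make the two cases concrete, here is a small Python sketch of the
argument. It is an illustration only: the topology (an extra edge node Y
and links L0, L3, L4 around the chain drawn above), the LSP list and the
function names are all assumptions, and RSVP is reduced to "each ingress
hears about the first failed link on each of its LSPs".

```python
from collections import deque

def rsvp_notified(lsps, failed):
    """Per-LSP upstream notification: {ingress: failed links it learns about}."""
    known = {}
    for ingress, path in lsps:
        known.setdefault(ingress, set())
        for link in path:  # links listed from ingress to egress
            if link in failed:
                # Only the first failed link is reported: a notification for a
                # failure further downstream would have to cross this one.
                known[ingress].add(link)
                break
    return known

def flooding_notified(nodes, links, failed):
    """Flooding: both ends of a failed link flood it over the surviving links."""
    adj = {n: set() for n in nodes}
    for name, (a, b) in links.items():
        if name not in failed:
            adj[a].add(b)
            adj[b].add(a)
    known = {n: set() for n in nodes}
    for f in failed:
        queue = deque(links[f])
        seen = set(links[f])
        while queue:
            n = queue.popleft()
            known[n].add(f)
            for m in adj[n]:
                if m not in seen:
                    seen.add(m)
                    queue.append(m)
    return known

# Hypothetical topology: X--L0--A--L1--B--L2--E, plus Y--L3--A and Y--L4--B.
NODES = ["X", "Y", "A", "B", "E"]
LINKS = {"L0": ("X", "A"), "L1": ("A", "B"), "L2": ("B", "E"),
         "L3": ("Y", "A"), "L4": ("Y", "B")}
LSPS = [("X", ["L0", "L1", "L2"]), ("Y", ["L4", "L2"])]

single = rsvp_notified(LSPS, {"L1"})        # Y has no LSP on L1
dual = rsvp_notified(LSPS, {"L1", "L2"})    # X never hears about L2
flood = flooding_notified(NODES, LINKS, {"L1", "L2"})
```

In this toy network, Y learns nothing about L1 in the single-fault case and
X learns nothing about L2 in the dual-fault case, while with flooding both
X and Y learn about both failures.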

We agree that a major drawback of the OSPF-flooding solution is the need to revisit
the timers, as pointed out in Rabbat's mail "Flooding using LMP extensions".
In fact, we cannot wait up to MinLSInterval seconds to notify a failure,
so something has to be modified.

Among the possible solutions:

1) Introduce a new timer in OSPF for a new sub-TLV
    used to carry the broken-link info.
    The current MinLSInterval timer would not apply to
    this new field.

2) Force the flooding when a link-failure signal arrives
and reset the timer.

We think that solution 2) has more advantages.
The current behavior of the OSPF protocol (considering Opaque extensions) is:

<----------------MinLSInterval------------->

    B1 |             B2 |   B3 |    FAIL. |
       |                |      |          |
    ---+----------------+------+----------+-------+------------> time
       |                                          |
       |                                          |
     FL1                                        FL2

WHERE:

- FL1 is the flooding of the B1 information.
- FL2 is the flooding of the B3 + FAIL. information.
- FAIL. could be a signal coming from a failure detection mechanism
(e.g., from a lower layer).
- B1, B2, B3, FAIL. are external OSPF Opaque inputs.
Note that B1, B2, B3 could be bandwidth updates (link TLV updates).

We propose to solve it in this way:

<-------------MinLSInterval--------->
<-------------MinLSInterval------>

   B1 |        B2 |   B3 |   FAIL.|
      |           |      |        |
   ---+-----------+------+--------+------------------------------------------>
      |                           |                                   time
      |                           |
     FL1                        FL2

In this case we force the flooding (FL2) when a link-failure signal
(FAIL.) arrives, and we reset the MinLSInterval timer so that
it restarts from the failure event.
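
The difference between the two timelines can be sketched in a few lines of
Python. This is a toy model, not OSPF code: the class and function names
and the event times are assumptions made for illustration, and only the
MinLSInterval default of 5 seconds comes from RFC 2328.

```python
class LsaOriginator:
    """Toy model of one router originating an Opaque LSA (not real OSPF code)."""

    def __init__(self, min_ls_interval=5.0, force_on_failure=False):
        self.min_ls_interval = min_ls_interval
        self.force_on_failure = force_on_failure
        self.last_flood = None   # time of the last origination
        self.pending = False     # new info waiting for the timer to expire
        self.floods = []         # log of (time, reason)

    def _flood(self, when, reason):
        self.floods.append((when, reason))
        self.last_flood = when   # MinLSInterval restarts from here
        self.pending = False     # everything pending goes out in this flood

    def tick(self, now):
        """Send the deferred flood if MinLSInterval has expired by `now`."""
        if self.pending and now - self.last_flood >= self.min_ls_interval:
            self._flood(self.last_flood + self.min_ls_interval, "deferred")

    def on_input(self, now, is_failure=False):
        """A B1/B2/B3-style update, or FAIL. when is_failure is True."""
        self.tick(now)
        if is_failure and self.force_on_failure:
            self._flood(now, "forced")       # proposed: flood now, reset timer
        elif self.last_flood is None or now - self.last_flood >= self.min_ls_interval:
            self._flood(now, "immediate")
        else:
            self.pending = True              # current: wait for MinLSInterval


def run(events, force_on_failure, end=30.0):
    originator = LsaOriginator(force_on_failure=force_on_failure)
    for when, is_failure in events:
        originator.on_input(when, is_failure)
    originator.tick(end)
    return originator.floods


# B1, B2, B3 and FAIL. at made-up times (seconds).
EVENTS = [(0.0, False), (2.0, False), (3.0, False), (4.0, True)]
current = run(EVENTS, force_on_failure=False)
proposed = run(EVENTS, force_on_failure=True)
```

With these inputs the current behavior floods at t=0 and t=5, while the
proposed behavior floods at t=0 and t=4, and any later bandwidth update is
paced by a MinLSInterval that starts at t=4.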

To strengthen the robustness of this solution, and to avoid continuous flooding of
failure notifications in case of interface flapping, we have to consider a
timer in the module (external to OSPF) that detects the link failures and
triggers the flooding of the FL2 Opaque LSA.

Note that some external module is needed to trigger the OSPF flooding
of the failure notification, as we cannot rely on the HELLO process because of its long detection delay.
A possibility is to let LMP detect the failure and trigger the OSPF flooding.

In this approach, the timer to avoid interface-flapping should be included in the LMP trigger.
This solution is conservative in that it only requires LMP extensions
(timer in the trigger) and just a minor modification to the OSPF process
(i.e., accept a force-to-send and MinLSInterval-reset trigger).
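
One possible shape for the timer in the trigger is a per-link hold-down,
sketched below in Python. The class name, the method name and the 10-second
value are assumptions made for illustration; we have not fixed a value, and
nothing here is taken from the LMP drafts.

```python
class FailureTrigger:
    """Hypothetical detection-side filter in front of the OSPF force-to-send."""

    def __init__(self, hold_down=10.0):
        self.hold_down = hold_down   # assumed value, in seconds
        self.last_trigger = {}       # link -> time of the last forced flood

    def link_down(self, link, now):
        """Return True if OSPF should be asked to force a flood for this failure."""
        last = self.last_trigger.get(link)
        if last is not None and now - last < self.hold_down:
            # The interface is flapping: do not force another flood; the
            # change is left to the normal MinLSInterval pacing.
            return False
        self.last_trigger[link] = now
        return True


trigger = FailureTrigger(hold_down=10.0)
decisions = [trigger.link_down("L1", t) for t in (0.0, 1.0, 2.0, 12.0)]
other_link = trigger.link_down("L2", 1.5)
```

Here a link that goes down at t=0, 1, 2 and 12 forces a flood only at t=0
and t=12, while a failure on a different link is not held back.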

Thanks in advance for your kind observations.

Regards

Roberto Albanese, Nicola Caione
University of Rome - La Sapienza, Italy