RE: Flooding using LMP extensions

Hi Roberto and Nicola,

Since the Fault Notification Protocol (draft-rabbat-fault-notification-protocol) is implementation-agnostic, do you agree with it or have any comments that we can add to it? Thanks in advance.

With regard to the comments and questions that you sent, please see the inline comments below.

Thanks.

> Hi all,

> we are working on an MPLS testbed supporting end-to-end

> Protection-Restoration mechanisms and we faced the problem of

> link-failure notification.

> We share the scalability concerns about RSVP-like

> solutions reported in Rabbat's mail.

Thanks for the support. We do believe that flooding is a more scalable

approach and is a way (if not the way) to go.

> We are in favour of OSPF flooding-based mechanisms for link-failures

> using Opaque because:

> - applicable to both MPLS-TE - GMPLS.

While it is true that OSPF may be applicable to both MPLS

and GMPLS, I'm not sure that the MPLS community would want to replace

Fast Reroute or add another option alongside it.

> - Proved OSPF protocol stability and robustness with respect to the LMP

> solution.

> OSPF flooding in general is mature, instead LMP has to be extended

> and it has to support some OSPF capabilities we already have.

I agree that OSPF flooding is more mature. However, the changes that you are

proposing to OSPF are substantial, so they may reduce that stability.

> - Reduction of routing failure probability with respect to the use of RSVP

> (see below).

We agree with this point. Since it applies to any flooding-based

solution, it does apply to our LMP solution as well. Good point.

> In draft-katz-yeung-ospf-traffic-09 it is written that in a TE

> scenario

> we can have a module in the edge nodes that searches constrained

> routes based on Opaque TE info.

> We call a "routing failure" the computation by edge node X of a path

> which includes a failed link F.

> Clearly, routing failures are a consequence of lack of notification,

> to X, of the failure of F.

> With RSVP failure notification, this can occur:

> - in case of single fault, when there are no LSPs originated from X

> crossing F;

> - in case of dual faults (i.e., two links L1 and L2 fail almost

> simultaneously),

> if from X there is an LSP crossing both L1 and L2:

> X---link-----node------link(L1)------node-------link(L2)-----egress_node

> the failure of L1 can hide the notification of the L2 failure.

> It can be seen that with OSPF flooding, there is virtually no

> potential

> routing failure at all, as ALL the edge nodes are notified of any

> failure.

Again agreed that this shows the advantage of flooding vs. signaling.

Thanks for the example.
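
To make the single-fault case concrete, here is a minimal sketch of who learns about a failed link F under each scheme. It is only an illustration under simplifying assumptions: the node names, the link names, and both helper functions are hypothetical, and the dual-fault "hiding" case is not modelled.

```python
def notified_by_rsvp(lsps, failed_link):
    """Ingress nodes that hear about the failure via RSVP.

    Only an ingress with at least one LSP crossing the failed link
    receives a notification.
    """
    return {ingress for ingress, path in lsps if failed_link in path}


def notified_by_flooding(edge_nodes):
    """With flooding, every edge node receives the notification."""
    return set(edge_nodes)


# Hypothetical topology: three edge nodes, two LSPs, failed link "F".
edge_nodes = {"X", "Y", "Z"}
lsps = [("Y", ["A", "F", "B"]), ("Z", ["C", "D"])]

# Edge nodes that may still compute a path through F (a "routing failure").
unaware_rsvp = edge_nodes - notified_by_rsvp(lsps, "F")
unaware_flooding = edge_nodes - notified_by_flooding(edge_nodes)
```

Here X has no LSP crossing F, so with RSVP it stays unaware and could still route over F, while with flooding no edge node is left unaware.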

> So if we have a flooding-based notification, all the edge nodes in a

> network

> will be aware of the failure. Instead, with RSVP, we'll have

> only some nodes aware of the failure! So we have an increase in

> routing failure probability after a link failure.

> We agree that a major drawback of the OSPF-flooding solution is the need

> of revisiting the timers, as pointed out in Rabbat's mail: "Flooding

> using LMP extensions".

> In fact we can't wait for a max period of MinLSInterval seconds to notify

> a failure...

> So we have to modify something.

> Among the possible solutions:

> 1) Introduction of a new timer in OSPF for a new sub-TLV

> used to carry the info of broken link.

> The current timer of MinLSInterval should not consider

> this new field.

> 2) Force the flooding when a link failure signal arrives

> and reset the timer.

> We think that solution 2) has more advantages.

> The current behavior of the OSPF protocol is (considering Opaque

> extensions):

>    <----------------MinLSInterval------------->
> B1 |             B2 |   B3 |    FAIL. |
>    |                |      |          |
> ---+----------------+------+----------+-------+------------> time
>    |                                          |
>    FL1                                        FL2

> WHERE:

> - FL1 is a flooding of B1 information.

> - FL2 is a flooding of B3 + FAIL. information.

> - FAIL. could be a signal coming from a failure detection mechanism

> (i.e. from lower layer).

> - B1, B2, B3, FAIL. are external OSPF Opaque inputs.

> Note that B1, B2, B3 could be Bandwidth updates (link TLVs updates).

> We thought of solving it this way:

>    <-------------MinLSInterval--------->
>                                <-------------MinLSInterval------>
> B1 |        B2 |   B3 |   FAIL.|
>    |           |      |        |
> ---+-----------+------+--------+------------------------------------------>
>    |                           |                                       time
>    |                           |
>    FL1                         FL2

> In this case we force the flooding (FL2) when a link-failure

> signal (FAIL.) arrives and we reset the MinLSInterval timer so that

> it restarts from the failure event.
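
To check that I read the proposal correctly, here is a minimal sketch of the forced-flooding behaviour as I understand it. It is a toy model, not an OSPF implementation: the class and method names are hypothetical, and I assume a newer bandwidth update simply overwrites a pending one (so FL2 carries B3 + FAIL., as in your figure).

```python
MIN_LS_INTERVAL = 5.0  # seconds, the MinLSInterval value given in RFC 2328


class OpaqueLsaOriginator:
    """Toy model of one node's Opaque LSA origination throttle."""

    def __init__(self, min_ls_interval=MIN_LS_INTERVAL):
        self.min_ls_interval = min_ls_interval
        self.last_flood = None  # time of the most recent flooding
        self.pending = {}       # kind -> latest update not yet flooded
        self.floods = []        # log of (time, [updates]) for inspection

    def _flood(self, now):
        self.floods.append((now, list(self.pending.values())))
        self.pending.clear()
        self.last_flood = now   # MinLSInterval restarts from this instant

    def on_update(self, now, name, is_failure=False):
        kind = "failure" if is_failure else "bandwidth"
        self.pending[kind] = name  # a newer update replaces an older one
        interval_over = (self.last_flood is None
                         or now - self.last_flood >= self.min_ls_interval)
        # A failure bypasses the throttle; anything else waits for it.
        if is_failure or interval_over:
            self._flood(now)

    def on_timer(self, now):
        """Flush pending updates once MinLSInterval has elapsed."""
        if (self.pending and self.last_flood is not None
                and now - self.last_flood >= self.min_ls_interval):
            self._flood(now)


# Replaying the figure: B1, B2, B3, then FAIL. inside the same interval.
node = OpaqueLsaOriginator()
node.on_update(0.0, "B1")
node.on_update(2.0, "B2")
node.on_update(3.0, "B3")
node.on_update(4.0, "FAIL.", is_failure=True)
```

In this replay FL1 goes out at t=0 with B1, and FL2 goes out at t=4 with B3 and FAIL. instead of waiting until t=5; the next MinLSInterval then counts from t=4.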

I think this makes sense when using OSPF Opaque LSAs. One issue is that a node

receiving all these flooding messages must process the failure-related

messages first, and only then go on to looking at B1, B2, B3.

This makes the Opaque LSA quite cumbersome to create, and it

requires enforcing that every node processes FAIL. before the Bi's.

This isn't particularly easy to ensure, especially across

different platforms.
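
The ordering requirement I have in mind could be sketched as follows. This is only an assumption about how a receiver might be written: the record layout and the function name are hypothetical, and nothing in OSPF mandates this behaviour today.

```python
def processing_order(updates):
    """Return the updates with failure notifications first.

    sorted() is stable, so updates of the same kind keep their
    arrival order relative to each other.
    """
    return sorted(updates, key=lambda u: 0 if u["kind"] == "failure" else 1)


# Updates as they might arrive in one batch of flooded information.
received = [
    {"name": "B1", "kind": "bandwidth"},
    {"name": "B3", "kind": "bandwidth"},
    {"name": "FAIL.", "kind": "failure"},
]
ordered = [u["name"] for u in processing_order(received)]
```

Every implementation would have to apply an equivalent rule for the scheme to behave consistently across the network.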

> To enforce the robustness of this solution, and to avoid continuous

> flooding of failure notifications in case of interface flapping, we have

> to consider a timer in the module (external to OSPF) that detects the

> link-failures and triggers the flooding of FL2 Opaque LSA.

> Consider that some external module is needed to trigger the OSPF flooding

> of failure notification, as we cannot rely on the HELLO process because of its

> long detection delay.

> A possibility is to let LMP detect the failure and trigger the OSPF

> flooding.

> In this approach, the timer to avoid interface-flapping should be included

> in the LMP trigger.

> This solution is conservative in that it only requires LMP extensions

> (timer in the trigger) and just a minor modification to the OSPF process

> (i.e., accept a force-to-send and MinLSInterval-reset trigger).
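
For reference, the flap-suppression timer in the trigger could look roughly like the sketch below. The class name, the method name, and the 10-second hold-down value are all hypothetical; neither LMP nor your mail specifies them.

```python
class FailureTrigger:
    """Toy hold-down filter between failure detection and OSPF flooding."""

    def __init__(self, hold_down=10.0):
        self.hold_down = hold_down  # seconds; illustrative value only
        self.last_trigger = {}      # link id -> time of last forwarded failure

    def link_down(self, link, now):
        """Return True if this failure should trigger a forced flooding."""
        last = self.last_trigger.get(link)
        if last is not None and now - last < self.hold_down:
            return False            # link is flapping: suppress the trigger
        self.last_trigger[link] = now
        return True


trigger = FailureTrigger(hold_down=10.0)
first = trigger.link_down("L1", now=0.0)    # forwarded to OSPF
flap = trigger.link_down("L1", now=3.0)     # suppressed, within hold-down
other = trigger.link_down("L2", now=3.0)    # different link, forwarded
later = trigger.link_down("L1", now=12.0)   # hold-down expired, forwarded
```

The timer is kept per link so that a flapping interface does not mask a genuine failure elsewhere.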

There remains the concern of finding a similar solution for IS-IS, thus

duplicating the work in order to allow its use.

In addition, you could be in a situation where no routing protocol

is run at all and routes are all predetermined; would one then

have to run OSPF just to achieve this functionality?

Note, however, that in a GMPLS-based transport network, it is expected

that LMP would be running anyway (for the several other functions it

performs), so that was our first choice.

Furthermore, even after modifications, we believe the LMP-based solution

is lightweight compared to burdening OSPF with this functionality.

But to go back to the main point, it sounds like the Opaque LSA that you are

proposing is in the end very similar to our LMP approach.

What is the feeling of the rest of the CCAMP community members at large?

Best,

Richard

> Thanks in advance for your kind observations.

> Regards

> Roberto Albanese, Nicola Caione

> University of Rome - La Sapienza, Italy