Re: I-D ACTION:draft-soumiya-lmp-fault-notification-ext-01.txt
Hi Richard,
> > 1. In section 3 you say
> >
> > [Optional]: If the receiving node has activated one or more recovery
> > paths, it sends a RecoveryCompleteNotify message to either the egress
> > nodes of the recovery LSPs or to the NMS. It continues sending
> > RecoveryCompleteNotify messages periodically until it receives a
> > RecoveryCompleteNotifyAck message or a timer to retry sending
> > expires.
> >
> > ...Isn't this a change to the way LMP operates? That is, before this
> > message, LMP is a neighbor-to-neighbor protocol.
>
> That is true. We did use LMP in our prototype testbed for these optional
> messages to make the NMS aware that the recovery is complete, but that was
> just because our NMS talked LMP. As our draft discusses, this message is
> specified as [optional], but we thought it would be good to have some kind
> of communication to tell the NMS that the recovery was completed, since it
> is aware of the fault at some point.
> We do not advocate the use of LMP for this message and did not specify its
> format in the draft for that reason. We'll clarify that it is not an LMP
> message in our next draft submission. If people think this message is not
> needed, we could drop it altogether.
Sure.
If we determine the message *is* needed, we have a problem.
Time for another protocol?
> > 2. Fault repair
> >
> > I don't see anything in your draft that discusses fault repair. How
> > does the reporting node revoke its fault report?
> >
> > This would appear to be a requirement otherwise repaired
> > resources will never be made available again. I think that when
> > you add this function and add the necessary controls to ensure
> > that each node has the right state (fault or no fault) you will have
> > invented a link state protocol.
>
> If I understand your point correctly, you are concerned about sending the
> state -let's call it: "Fault Repaired"- as part of our protocol. We do not
> think the "Fault repaired" is a time-critical message and it could be
> propagated through the regular channels. Ultimately, a policy issue could
> decide what to do about a "Fault Repaired" message. This could involve
> reactivating previously working paths, or keeping traffic on the activated
> recovery paths. Therefore, we use LMP for Fault Notification *only*.
You are correct that "Fault repaired" is not as time critical.
When you talk about "regular channels" I'm concerned you are setting
yourself up for problems. Suppose OSPF-TE is your regular channel. When a
"Fault" and "Fault repaired" are received in close succession (in either
order) you will need to determine which really applies. By using different
protocols you make this more difficult than by using a single protocol.
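To make the ordering point concrete, here is a minimal sketch (mine, not
from the draft). It assumes a single protocol carries both reports, so the
reporting node can stamp each one with a per-resource sequence number and a
receiver keeps only the newest. The class and field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LinkStateTable:
    # resource id -> (highest sequence number seen, state at that sequence)
    latest: dict = field(default_factory=dict)

    def receive(self, resource, seq, state):
        """Apply a report only if it is newer than the one already held."""
        held = self.latest.get(resource)
        if held is not None and seq <= held[0]:
            return False  # stale or duplicate report; ignore it
        self.latest[resource] = (seq, state)
        return True

    def state(self, resource):
        held = self.latest.get(resource)
        return held[1] if held else "unknown"


table = LinkStateTable()
table.receive("link-7", 2, "repaired")  # the later report arrives first
table.receive("link-7", 1, "fault")     # the earlier report arrives late
print(table.state("link-7"))            # repaired
```

If "Fault" and "Fault repaired" travel in different protocols there is no
shared counter like `seq`, and the receiver has no simple way to tell which
report is the more recent.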
> > At this point I don't see what LMP gives you that an existing
> > link-state protocol doesn't already deliver. Certainly the
> > speed of reporting of faults to every node in the network
> > will be lost once you have to prevent "thrash" of fault and
> > fault clear notifications.
>
> I think this problem exists in all situations, whether you use signaling or
> flooding. In the case of "fault clear" notifications using signaling, you
> could send the message to the egress that has already started the activation
> of the recovery path. Then the egress has to do something about it. I think
> the best way to deal with it is to complete the activation process and look
> at policy to decide what to do about a "Fault repaired" message.
As above, the issue is not what to do when receiving a "Fault repaired",
but how to achieve temporal reconciliation between "Fault" and "Fault
repaired" in an unreliable network.
> > In any case, it is not clear to me that
> > every node in the network needs rapid notification of faults -
> > only those nodes that constitute repair points for the LSPs
> > that used the failed resource need to know quickly and they
> > hear about the problem through a directed Notify message
> > that must propagate faster than any hop-by-hop protocol
> > can.
>
> I'm not sure about that. Your directed Notify message has to initiate
> signaling on the recovery path, which would have to go on the path and back,
> whereas a probably shorter route will exist between the detecting node and
> the nodes on the recovery paths, leading to faster recovery.
There are two separate issues here.
- The Notify *will* go by the shortest route. It does not follow the
path of the LSP, but is routed by the SPF. It is not processed
by each router that sees it (indeed they need not be MPLS capable).
Thus, the Notify achieves faster fault notification (to designated LSRs)
than hop-by-hop LMP flooding can.
- Signaling for transition to recovery paths is needed in all cases,
regardless of the fault notification method, unless the path is
presignaled, in which case no signaling is required.
> In addition, assuming that you will have one or few Notify messages is not
> going to work in a large mesh network:
> 1. recovery paths have to be calculated using whatever
> protection/restoration algorithm which tries to increase network efficiency
I don't believe the protection/recovery algorithm changes the way that
Notifications are propagated.
> 2. they likely will have different egress points since they will be
> protecting LSPs with differing ingress(es)/egress(es).
This is true. However, you must contrast the worst case (one Notify in each
direction for each impacted LSP) with the other worst case (flooding of LMP
fault notification to every node in the network).
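As a rough way to put numbers on the two worst cases, here is a small
sketch. The function names and the example figures are hypothetical, and it
counts Notify messages generated against nodes that must receive a flooded
report, so it is only a back-of-envelope comparison.

```python
def worst_case_notify(impacted_lsps: int) -> int:
    """One directed Notify in each direction for each impacted LSP."""
    return 2 * impacted_lsps


def worst_case_flood_recipients(nodes: int) -> int:
    """Flooding reaches every node other than the one that detected the fault."""
    return nodes - 1


# Hypothetical example: 40 LSPs cross the failed resource in a 500-node network.
print(worst_case_notify(40))             # 80
print(worst_case_flood_recipients(500))  # 499
```

Which side is larger depends entirely on how many LSPs use the failed
resource relative to the size of the network.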
> The assumption about lots of recovery paths being the same for the same
> fiber fault is restrictive.
Agreed.
> Please let me know if I understood you correctly.
Progress is being made!
Adrian