
RE: I-D ACTION:draft-soumiya-lmp-fault-notification-ext-01.txt



Hi Adrian,

Thanks for reading the draft. Please see comments inline.

> -----Original Message-----
> From: Adrian Farrel [mailto:adrian@olddog.co.uk]
> Sent: Wednesday, July 02, 2003 7:57 AM
> To: soumiya.toshio@jp.fujitsu.com; rabbat@alum.mit.edu
> Cc: ccamp@ops.ietf.org; thamada@fla.fujitsu.com; kanoh@jp.fujitsu.com;
> Vishal Sharma (E-mail 2)
> Subject: Re: I-D ACTION:draft-soumiya-lmp-fault-notification-ext-01.txt
> 
> Hi,
> 
> Two brief questions about this draft...
> 
> 1. In section 3 you say
> 
>    [Optional]: If the receiving node has activated one or more recovery
>    paths, it sends a RecoveryCompleteNotify message to either the egress
>    nodes of the recovery LSPs or to the NMS.  It continues sending
>    RecoveryCompleteNotify messages periodically until it receives a
>    RecoveryCompleteNotifyAck message or a timer to retry sending
>    expires.
> 
>    ...Isn't this a change to the way LMP operates? That is, before this
>    message, LMP is a neighbor-to-neighbor protocol.


That is true. We did use LMP in our prototype testbed for these optional
messages to make the NMS aware that the recovery is complete, but that was
only because our NMS spoke LMP.  The draft marks this message as
[Optional]; we thought it would be good to have some kind of communication
to tell the NMS that the recovery has completed, since the NMS becomes
aware of the fault at some point.
We do not advocate the use of LMP for this message and did not specify its
format in the draft for that reason.  We'll clarify that it is not an LMP
message in our next draft submission. If people think this message is not
needed, we could drop it altogether.
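
For reference, the retransmission behavior quoted above from section 3
(resend periodically until an Ack arrives or the retry timer expires) can be
sketched roughly as follows. This is only an illustration and is not taken
from the draft: the function name, the two callbacks, and the interval and
attempt-count parameters are all assumptions, and nothing here implies a
particular message format or transport.

```python
import time
from typing import Callable


def notify_until_acked(
    send: Callable[[], None],
    ack_received: Callable[[], bool],
    interval_s: float = 0.5,
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Resend a notification until it is acknowledged or attempts run out.

    All names and defaults are hypothetical. Returns True if an
    acknowledgment was seen, False if the retry limit was reached.
    """
    for _ in range(max_attempts):
        send()              # transmit (or retransmit) the notification
        sleep(interval_s)   # wait one retransmission interval
        if ack_received():  # stop as soon as the peer has acknowledged
            return True
    return False            # gave up without seeing an Ack
```

The sender would call this once per recipient (an egress node or the NMS)
and treat a False result as "recipient not informed".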


> 2. Fault repair
> 
>   I don't see anything in your draft that discusses fault repair. How
>   does the reporting node revoke its fault report?
> 
>   This would appear to be a requirement otherwise repaired
>   resources will never be made available again. I think that when
>   you add this function and add the necessary controls to ensure
>   that each node has the right state (fault or no fault) you will have
>   invented a link state protocol.
> 

If I understand your point correctly, you are concerned about sending the
state (let's call it "Fault Repaired") as part of our protocol. We do not
think "Fault Repaired" is a time-critical message, so it could be
propagated through the regular channels.  Ultimately, policy could decide
what to do about a "Fault Repaired" message. This could involve
reactivating previously working paths, or keeping traffic on the activated
recovery paths.  Therefore, we use LMP for Fault Notification *only*.

>   At this point I don't see what LMP gives you that an existing
>   link-state protocol doesn't already deliver. Certainly the
>   speed of reporting of faults to every node in the network
>   will be lost once you have to prevent "thrash" of fault and
>   fault clear notifications.

I think this problem exists in all situations, whether you use signaling or
flooding. In the case of "fault clear" notifications using signaling, you
could send the message to the egress that has already started the activation
of the recovery path. Then the egress has to do something about it. I think
the best way to deal with it is to complete the activation process and then
look at policy to decide what to do about a "Fault Repaired" message.

> In any case, it is not clear to me that
>   every node in the network needs rapid notification of faults -
>   only those nodes that constitute repair points for the LSPs
>   that used the failed resource need to know quickly and they
>   hear about the problem through a directed Notify message
>   that must propagate faster than any hop-by-hop protocol
>   can.

I'm not sure about that. Your directed Notify message has to initiate
signaling on the recovery path, and that signaling has to travel along the
path and back, whereas a shorter route will probably exist between the
detecting node and the nodes on the recovery paths, leading to faster
recovery.
In addition, assuming that you will have one or a few Notify messages is not
going to work in a large mesh network:
1. Recovery paths have to be calculated by whatever protection/restoration
algorithm is in use, and that algorithm tries to increase network
efficiency.
2. They will likely have different egress points, since they will be
protecting LSPs with differing ingresses/egresses.

The assumption that many recovery paths will be the same for a given fiber
fault is restrictive.

Please let me know if I understood you correctly.

> Adrian

Thanks,
Richard.