
RE: draft-rabbat-fault-notification-protocol-04.txt



Hi George,

My explanations are below. I will address your question about path
computation solutions in a second email.

First, it's good that we've resolved questions 1 and 4. Progress is being
made. Let's discuss the remaining points.

> Hi Richard,
> 
> Thanks for your reply - a couple of points inline (with some 
> snipping...)
> 
> 
> Richard Rabbat wrote:
> 
> > Hi George,
> >
> > You've probably had time to review Vishal's explanations by now.
> > Comments to the items you raised inline.
> >
> >
> >>-----Original Message-----
> >>..........
> 
> >
> >>2) This draft seems to address the relatively simple problem of 
> >>setting up the restoration path. It seems to completely ignore the 
> >>much harder problem of allocating resources to the shared 
> >>restoration path, and of actually locating the fault in an optical 
> >>network to a single span in a time that is useful to restoration.


> > [Richard] If I understand the comment correctly, you are referring 
> > to the problem of path computation, which is a solved problem with 
> > many proposals in the literature. It is also orthogonal to the 
> > notification problem.
> >
> [George] Precomputation of shared restoration paths is a distributed 
> computation problem. It's hard to see this being done without a new 
> protocol or significant modifications to existing protocols. I am not 
> aware of a plethora of solutions to this particular problem.

[Richard] This will be addressed in the 2nd email.

> > The fault localization problem is also different from the objective
> > of this draft. Localization of the fault has to occur and the fault
> > information transmitted to a notification mechanism. The
> > localization problem itself takes a certain amount of time as you
> > mentioned.  Feedback from our hardware experts says that it's
> > doable in the range of a few milliseconds.

> [George] I think you are confusing detecting a fault with locating the 
> fault. The first is fast and easy, while the second is somewhat more 
> interesting. To be most efficient with restoration resources, you 
> really need to know where a fault is to a single link. Whether this can 
> be done quickly is very technology dependent.

[Richard] True. The applicability statement of FNP discusses the
technologies we are focusing on initially.  For this particular topic, our
interest lies in O-E-O switches where detection then localization of the
fault happen on a span basis. Section 4.2 of
draft-rabbat-fnp-applicability-00.txt is reproduced here for your reference:
--
   FNP is designed to work in networks with OEO nodes. Its applicability to
   networks with OOO nodes (that is, fully transparent all-optical networks)
   depends on the monitoring capabilities of the OOO systems deployed, and
   is for further study.

   For a network with OEO nodes, the fault detection and correlation (which
   happens before FNP is activated, and is outside the scope of this
   document) occurs at the node closest to the fault. Once the detection
   procedure has determined that a bonafide fault has occurred, it activates
   FNP for fault notification
--

> >>It makes no mention of the
> >>inaccuracies in network planning databases, which make one wonder 
> >>whether precomputation of restoration paths will actually lead to 
> >>faster restoration times.
> >
> >
> > [Richard] Restoration path computation relies on some amount of
> > accuracy no matter when it is done, whether before or after the
> > fault. Since one is using the same database in both cases,
> > precomputation will lead to faster restoration time.
> >
> [George] Actually, the database used for precomputation of restoration 
> is the planning database, which knows about diversity, and is 
> frequently incorrect (at least according to the operators I spoke with 
> quite a while ago, while I was a strong proponent of precomputed 
> restoration). It is not the routing database, which has accurate 
> topology but no knowledge of diversity.

[Richard] Thanks for clarifying your comment. We understand that this is a
problem across any kind of restoration mechanism.  An elegant way of getting
the diversity data into the routing database is through the use of SRLGs
(Shared Risk Link Groups).  The issue of SRLG assignment is interesting but
a separate discussion.

> >>Finally, it seems to presuppose that a network
> >>operator would make such a facilities database available to route 
> >>computation at all. The suggestion in sect 6.2 that the physical 
> >>length of the fibers be available for route computation is very 
> >>unlikely in any network I have ever worked on.

> > [Richard] In the past, with no need for such information it may have
> > been irrelevant to provide it. For time-bounded shared-mesh recovery,
> > this information will be needed. It will afford the operator the
> > sophistication and bandwidth savings that shared-mesh provides.
> >
> [George] Restoration has been implemented in several technology 
> generations without the need for this degree of detail. Operators tend 
> to know how they run the network and are very reticent to make this 
> sort of change. I think you are being overly optimistic.
> 
[Richard] Agreed, shared mesh restoration has not been deployed in the past.
From our discussions, though, some carriers have expressed renewed interest
in shared mesh. I would not call that overly optimistic; it simply reflects
discussions with customers.

> >>3) .................
> >
> >
> >>An
> >>additional assumption seems to be that there is only one fault in 
> >>the network, and all bets are off if that is not true. There seem to 
> >>be problems with both these assumptions. It seems to me that there 
> >>are no mechanisms for truncating the PDU that is being sent, so 
> >>there is a finite chance that a significant extra delay is incurred. 
> >>Perhaps more serious is the assumption that all bets are off if 
> >>there are multiple faults in the network. In general, multiple 
> >>faults are those that lead to service outage. Two faults that do not 
> >>interact, in that they do not contend for the same network 
> >>resources, will be coupled by the flooding.
> >
> >
> > [Richard] Multiple faults that do not interact could be coupled if 
> > they occur in a time interval which is smaller than the delay of the 
> > flooding message across the network diameter. Even in a large 
> > network, this implies faults must occur closer than a few 100 ms
> > apart.
> >
> > In any case, please note that all bets are not off when it comes to 
> > FNP conducting the notification.  FNP will achieve the notification
> > irrespective of the number of faults.  In the case of multiple faults, 
> > the timing bound may not be guaranteed, if the common case one designs 
> > for by using FNP is for a single fault. There is no restriction in the 
> > protocol itself not to work with the assumption of multiple faults.  
> > Moreover, multiple faults may occur in less than 1% of the fault cases
> > according to a major US carrier we talked to.
> > SONET and other transport technologies only guarantee hard timing 
> > bounds in the case of single failures. Our approach affords us a better 
> > recovery procedure with proper planning.
> >
> >
> >>In addition, unsuppressed restoration requests, which occur when the 
> >>fault cannot be rapidly located to a single span, will also generate 
> >>restoration messages.
> >
> >
> > [Richard] Please refer to earlier answer about localization
> >
> [George] The "all bets are off" referred to the statement that 
> restoration times are no longer ensured when there are multiple 
> faults. This actually occurs when a large fiber cable is cut and every 
> system on that cable issues alarms. As the cable usually supports 
> independent line systems, correlation across these systems is rarely, if 
> ever, done. The result is a storm of restoration requests. If your 
> reference operator suggested otherwise, I suspect that a different 
> question was being inadvertently answered.

[Richard] I took your wording "faults that do not interact" to mean
independent faults, and the carrier response referred to independent fiber
cuts. Your example, however, describes correlated faults. In such a case we
don't ensure the restoration time unless we've computed restoration paths
with those faults in mind.

Compare this for a moment with a theoretical signaling-based solution for
time-bounded fault notification.  That solution will initiate recovery per
failed LSP.  In your example of a cable full of fibers, carrying multiple
wavelengths each, each wavelength in turn carrying a plethora of TDM
channels, one can quickly identify the scalability advantages of FNP over a
signaling-based approach. 
On another note, if you're interested in methods and techniques for merging
FNP notification messages, we could discuss that. 
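
To put rough numbers on the cable-cut comparison above, here is a minimal
sketch. All counts are hypothetical and chosen only for illustration; they
are not taken from the draft. It also assumes one LSP per TDM channel and,
on the notification side, one notification per failed fiber span with no
merging across line systems.

```python
# Hypothetical cable contents, for illustration only.
FIBERS_PER_CABLE = 48
WAVELENGTHS_PER_FIBER = 40
TDM_CHANNELS_PER_WAVELENGTH = 192  # e.g. the STS-1 channels of one OC-192


def per_lsp_recoveries(fibers: int, wavelengths: int, channels: int) -> int:
    """Recovery procedures started by a per-LSP signaling scheme.

    Assumes every TDM channel on every wavelength carries exactly one LSP.
    """
    return fibers * wavelengths * channels


def per_span_notifications(fibers: int) -> int:
    """Notifications originated if each failed fiber span raises one.

    Assumes no correlation or merging across the independent line systems.
    """
    return fibers


lsp_count = per_lsp_recoveries(
    FIBERS_PER_CABLE, WAVELENGTHS_PER_FIBER, TDM_CHANNELS_PER_WAVELENGTH
)
span_count = per_span_notifications(FIBERS_PER_CABLE)

print(f"per-LSP recoveries:     {lsp_count}")
print(f"per-span notifications: {span_count}")
print(f"ratio:                  {lsp_count // span_count}")
```

Under these assumptions the ratio is simply wavelengths times channels per
fiber, so it grows with the multiplexing depth and not with the number of
fibers in the cable. Merging notifications across fibers would reduce the
per-span count further.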

> >>5) Until the effect of network database inaccuracies on the 
> >>effectiveness of  precomputed restoration is better understood, the 
> >>problem of allocating  resources in shared mesh networks is solved, 
> >>and it is certain that all faults will be located to the correct 
> >>span in a time useful to restoration, it seems to be premature to be 
> >>proposing a solution to the final piece of the problem.
> >>
> >
> > [Richard] I believe we've answered each individual point in that
> > sentence in the previous paragraphs. Given that, none are a
> > stumbling block to the solution.
> >
> [George] I suppose it depends on what you consider a stumbling block!
> 
[Richard] The localization problem and the database inaccuracy problem are
common across all restoration methods and schemes and in my mind, that is
not reason enough to hold back work on these methods. The path computation
problem is addressed in the companion email. I hope this email clarifies the
remaining questions that you raised.

> Thanks for your thoughts - I also look forward to your reply to Deborah.
> 
[Richard] Deborah's email is a different thread that discusses other issues
and I will address those in a subsequent email once I get a bit of time.

> Regards
> 
> 	George
> 

Best,
Richard.