RE: draft-rabbat-fault-notification-protocol-04.txt
Hi Deborah,
Thanks for your thoughts.
-- I have just sent an email to George and the list, providing the
complete set of documents that systematically discuss different
aspects of this work.
Since some of the latest versions of the drafts were in the list
of "missing IDs", I think you may not have had time to see them yet.
So, please look over the note to the list, and read through the
latest versions of some of the documents, as they will clarify
several of the points.
(We have, for instance, discussed both misconnections and the issue
of network stability. The experimental draft referred to in my email
to the list will also be quite useful for future discussions.)
-- For example, it appears from your note below that there is a
considerable difference between your understanding of FNP and how
it actually works.
In reality, FNP employs a very carefully coordinated use of the
protection path. (It turns out that the coordination is largely
accomplished during path setup itself.)
This might also explain your suggestion that "FNP guarantees user
traffic will be misconnected," which quite obviously is not a
goal of FNP! :-)
Also, just to clarify again, this work is not proposing that
this be _the_ way to perform notification. Rather it is _a_ scheme to
do so, that is particularly applicable under restoration
time-constraints (because that is what it is designed for).
In this, it is very complementary to the other approaches to the
notification and activation task, including the signaling approach
highlighted by the DT.
-- Obviously, we are well aware of the importance of misconnections,
and have considered it in several of the new documents (the
requirements drafts, and the revised expedited flooding draft).
We are also contemplating a squelching functionality, based on
suggestions received from other colleagues, that would enhance
the solutions/proposals we already have for this.
Again, if you will be in Seoul, we will be glad to sit down with
you and continue our discussions from Minneapolis. Hopefully, we
will be able to explain again the operation and purpose of this
scheme, and get other inputs.
-Vishal
> -----Original Message-----
> From: owner-ccamp@ops.ietf.org [mailto:owner-ccamp@ops.ietf.org]On
> Behalf Of Brungard, Deborah A, ALABS
> Sent: Thursday, February 26, 2004 1:32 PM
> To: George Newsome; ccamp@ops.ietf.org
> Subject: RE: draft-rabbat-fault-notification-protocol-04.txt
>
>
> George,
>
> Good to see your attention is activated;-)
>
> I've been discussing privately with the authors my concerns over
> the last several meetings.
>
> The fatal negative with the FNP approach is that the use of the
> protection path is not coordinated - no handshake between the two
> ends (and intermediate nodes) for use of the protection path.
> "All nodes notified of the failure will activate the recovery
> path by performing the required hardware reconfiguration". And
> the ingress node starts sending user traffic after an elapsed
> time window. This uncoordinated use of the protection path
> guarantees user traffic will be misconnected - unacceptable for
> an operator.
>
> The key requirement in the P&R DT work was that misconnections
> are not allowed, and is why the DT's approach uses coordinated
> signaling to notify all nodes along the path. The DT's approach
> is referred to in this draft as incurring "lengthy delay" vs. FNP.
>
> Another draft for your attention is
> draft-rabbat-optical-recovery-reqs. Requirement 8 states "A
> recovery scheme SHOULD make sure that recovery actions correctly
> move traffic from failed paths to their respective recovery
> paths, such that the recovery actions do not result in long-term
> misconnections". This requirement needs to be reworded to "SHALL"
> and "long-term misconnections" to "any misconnections".
>
> Deborah
>
> -----Original Message-----
> From: owner-ccamp@ops.ietf.org [mailto:owner-ccamp@ops.ietf.org]On
> Behalf Of George Newsome
> Sent: Tuesday, February 24, 2004 8:41 PM
> To: ccamp@ops.ietf.org
> Subject: Re: draft-rabbat-fault-notification-protocol-04.txt
>
>
> All,
>
> My attention was drawn to
> draft-rabbat-fault-notification-protocol-04.txt, which provokes the
> following comments.
>
> 1) There seems to be some notion that the time taken to restore is a
> crucial element of high availability, yet overall availability is
> controlled by the failure rate of unprotected elements and by mean time
> to repair, rather than by switching time. (A 1 second switch is less
> than 1/10000 of the generally accepted MTTR of 4 hrs.)
>
> 2) This draft seems to address the relatively simple problem of setting
> up the restoration path. It seems to completely ignore the much harder
> problem of allocating resources to the shared restoration path, and of
> actually locating the fault in an optical network to a single span in a
> time that is useful to restoration. It makes no mention of the
> inaccuracies in network planning databases, which make one wonder
> whether precomputation of restoration paths will actually lead to faster
> restoration times. Finally, it seems to presuppose that a network
> operator would make such a facilities database available to route
> computation at all. The suggestion in sect 6.2 that the physical length
> of the fibers be available for route computation is very unlikely in any
> network I have ever worked on.
>
> 3) One must wonder whether a flooding approach is actually best anyway.
> The assumption seems to be that a flooding protocol PDU can be forced
> onto the front of the send queue, thereby incurring minimum delay. An
> additional assumption seems to be that there is only one fault in the
> network, and all bets are off if that is not true. There seem to be
> problems with both these assumptions. It seems to me that there are no
> mechanisms for truncating the PDU that is being sent, so there is a
> finite chance that a significant extra delay is incurred. Perhaps more
> serious is the assumption that all bets are off if there are multiple
> faults in the network. In general, multiple faults are those that lead
> to service outage. Two faults that do not interact, in that they do not
> contend for the same network resources, will be coupled by the flooding.
> In addition, unsuppressed restoration requests, which occur when the
> fault cannot be rapidly located to a single span, will also generate
> restoration messages. It also seems to me that routing changes may well
> start to be flooded at the same time scale as restoration activity is
> taking place. There is no mention of possible interactions with this.
>
> 4) Assuming that this problem is worth solving, and that a flooding
> protocol is the best solution, is it a good idea to generate yet
> another protocol that floods, and is LMP the vehicle of choice to embed
> yet another protocol? It seems to me that restoration has a strong
> interaction with routing change announcement, so it seems to me to make
> more sense to use those mechanisms rather than invent new ones.
>
> 5) Until the effect of network database inaccuracies on the
> effectiveness of precomputed restoration is better understood, the
> problem of allocating resources in shared mesh networks is solved, and
> it is certain that all faults will be located to the correct span in a
> time useful to restoration, it seems to be premature to be proposing a
> solution to the final piece of the problem.
>
>
> Regards
>
> George
>
>
>