RE: draft-rabbat-fault-notification-protocol-04.txt
Hi George,
Thanks for your interest in this work.
However, before we can have a meaningful discussion on the subject,
I think it is important that you read through the complete set
of drafts that are related to notification.
You will find that a majority of your questions are already
considered in these documents.
(Those that remain are not specific to this approach, and will
come up in any restoration scheme, so they will have to be addressed
in a general way regardless.)
I would highly recommend that you also look at the proceedings from
the Yokohama, San Francisco, Vienna, and Minneapolis IETFs, as some of the
presentations/discussions at these CCAMP meetings are very useful for
understanding this work more fully.
Once you've done that, we will be ready to discuss further details.
(If you're going to be at Seoul, we can always clarify things
in person; it's a bit easier with a pen and paper handy.)
BTW, these documents are actually the result of numerous fruitful
discussions at several IETF meetings and on-line (cf. mailing list
archives from 2003) and form a logical sequence that sets the right
context for this work and takes one systematically through it.
Just to help you (and others who are interested in this work),
here is the recommended sequence.
i) There is a basic requirements draft that highlights requirements for
control-plane based recovery of data-plane failures in optical transport
networks.
(This was the result of a request from the CCAMP Chairs, way back
in Yokohama 2002, to have a document that collates requirements. It has
actually been refined several times, and for the last two times was
republished with a different name, thus an 01 version.)
Optical Transport Network Failure Recovery Requirements
http://www.ietf.org/internet-drafts/draft-rabbat-optical-recovery-reqs-01.txt
ii) This is followed by a more in-depth document, that is implementation
agnostic, and that discusses very meticulously both signaling and
flooding solutions. It then details requirements for time-bounded
notification in optical transport networks, and the need for an expedited
flooding mechanism to do so.
Expedited Flooding for Restoration in Shared-Mesh Transport Networks
http://www.ietf.org/internet-drafts/draft-rabbat-expedited-flooding-01.txt
It also talks of multiple failures and ways to handle them.
(I think it is important to mention here that the proposal makes clear
that both flooding and signaling have a role, and does not exclude either
as a way of doing notification and recovery path activation.)
iii) There is then the _protocol-agnostic_ specification of a flooding-based
solution to the time-bounded notification problem.
Fault Notification Protocol for GMPLS-Based Recovery
http://www.ietf.org/internet-drafts/draft-rabbat-fault-notification-protocol-04.txt
Just to re-iterate, this draft only lays out protocol operation and required
messaging and formats, and does not mandate (by design) any specific
implementation of FNP.
Several protocols can be used for this, and this is reflected in the
experimental draft next, which highlights two ways of implementing flooding,
at the two network layers of interest.
iv) There is then an experimental track document, which I believe is
extremely useful, because it presents real results and experiences from
two independent testbed implementations of flooding for fault notification.
It also provides the protocol enhancements that the implementors made
to LMP and OSPF-TE to realize the flooding function, and has an excellent
discussion of many issues.
(It was the direct product of many discussions at Minneapolis, where it
was suggested that these experiences and results be shared with the IETF
community under an experimental track doc.)
Implementation and Performance of Flooding-based Fault Notification
http://www.ietf.org/internet-drafts/draft-rabbat-ccamp-perf-flooding-notification-exp-00.txt
v) Finally, there is a draft that discusses the applicability of
FNP in the context of optical transport networks. This was done to
address, in one place, questions about the network, fault, etc. model
to which FNP applies, which we addressed in many ML discussions last
year.
Observations on the Applicability of the Fault Notification Protocol
http://www.ietf.org/internet-drafts/draft-rabbat-fnp-applicability-00.txt
Thanks,
-Vishal
> -----Original Message-----
> From: owner-ccamp@ops.ietf.org [mailto:owner-ccamp@ops.ietf.org] On
> Behalf Of George Newsome
> Sent: Tuesday, February 24, 2004 5:41 PM
> To: ccamp@ops.ietf.org
> Subject: Re: draft-rabbat-fault-notification-protocol-04.txt
>
>
> All,
>
> My attention was drawn to
> draft-rabbat-fault-notification-protocol-04.txt, which provokes the
> following comments.
>
> 1) There seems to be some notion that the time taken to restore is a
> crucial element of high availability, yet overall availability is
> controlled by the failure rate of unprotected elements and by mean time to
> repair, rather than by switching time. (A 1 second switch is less than
> 1/10000 of the generally accepted MTTR of 4 hrs.)
>
> 2) This draft seems to address the relatively simple problem of setting
> up the restoration path. It seems to completely ignore the much harder
> problem of allocating resources to the shared restoration path, and of
> actually locating the fault in an optical network to a single span in a
> time that is useful to restoration. It makes no mention of the
> inaccuracies in network planning databases, which make one wonder
> whether precomputation of restoration paths will actually lead to faster
> restoration times. Finally, it seems to presuppose that a network
> operator would make such a facilities database available to route
> computation at all. The suggestion in sect 6.2 that the physical length
> of the fibers be available for route computation is very unlikely in any
> network I have ever worked on.
>
> 3) One must wonder whether a flooding approach is actually best anyway.
> The assumption seems to be that a flooding protocol PDU can be forced
> onto the front of the send queue, thereby incurring minimum delay. An
> additional assumption seems to be that there is only one fault in the
> network, and all bets are off if that is not true. There seem to be
> problems with both these assumptions. It seems to me that there are no
> mechanisms for truncating the PDU that is being sent, so there is a
> finite chance that a significant extra delay is incurred. Perhaps more
> serious is the assumption that all bets are off if there are multiple
> faults in the network. In general, multiple faults are those that lead
> to service outage. Two faults that do not interact, in that they do not
> contend for the same network resources, will be coupled by the flooding.
> In addition, unsuppressed restoration requests, which occur when the
> fault cannot be rapidly located to a single span, will also generate
> restoration messages. It also seems to me that routing changes may well
> start to be flooded at the same time scale as restoration activity is
> taking place. There is no mention of possible interactions with this.
>
> 4) Assuming that this problem is worth solving, and that a flooding
> protocol is the best solution, is it a good idea to generate yet
> another protocol that floods, and is LMP the vehicle of choice to embed
> yet another protocol? It seems to me that restoration has a strong
> interaction with routing change announcement, so it seems to me to make
> more sense to use those mechanisms rather than invent new ones.
>
> 5) Until the effect of network database inaccuracies on the
> effectiveness of precomputed restoration is better understood, the
> problem of allocating resources in shared mesh networks is solved, and
> it is certain that all faults will be located to the correct span in a
> time useful to restoration, it seems to be premature to be proposing a
> solution to the final piece of the problem.
>
>
> Regards
>
> George
>
>