
Fault Notification Protocol & Recovery Requirements



Hi all,

At the SF IETF, we discussed updates to a set of P&R drafts that we believe
are appropriate for advancement to CCAMP WG drafts.
 
At the conclusion of the discussion at SF, Kireeti encouraged us to move
this process forward on the mailing list and to help structure the
discussions. We're keen to do so, and any feedback on the discussion
below is welcome.

The feedback at the SF meeting was very interesting, and several issues were
raised (cf. CCAMP meeting minutes 
http://ops.ietf.org/lists/ccamp/ccamp.2003/msg00361.html).

i) Feedback from Alex was that OSPF flooding was a process that led to many
net meltdowns before people got it right. So we need to look carefully at
using flooding for fault notification.

ii) George had expressed the view that since LMP was considered a pt-to-pt
protocol, using it for fault notification via flooding would require changes
to implementation models. 
This assumes, we think, that LMP may be implemented only at the line cards,
and that its use in flooding would require communication, via the control
plane, between multiple LMP state machines.

iii) Kireeti had stated that we should provide a better insight into the
requirements and the problem. The main discussion point was whether we
should use LMP-WDM for flooding fault notification messages.  

We will address parts of (ii) and (iii) in a follow-up email.

I'd like to address and discuss the above in the context of the following
drafts:

1. draft-czezowski-optical-recovery-reqs-01.txt:
This draft presents the requirements for optical restoration. It was
produced in response to a request (and rightly so) from Kireeti at the
Yokohama meeting (cf. IETF 53 Meeting minutes), asking us to provide the
requirements that motivated our fault notification work.

We believe this nicely complements the work done by the P&R Design team.
The team has provided terminology, a functional specification, and an
analysis of recovery schemes, while our draft attempts to state the
requirements precisely (so far they have not been gathered in one place).

We would like to underscore the importance of well-articulated requirements,
and are happy to work with others in the WG to do this. One option would be
for the WG to adopt this draft as a WG document, so that interested members
of the WG can collaborate to complete this work.

Ron, Kireeti, would it be possible to take a vote on this?

2. draft-rabbat-fault-notification-protocol-02.txt:  
The draft addresses a problem highlighted in the P&R Design team's
terminology draft, namely that P&R under tight time constraints is a very
challenging problem.
We therefore present an implementation-independent protocol for fault
notification through flooding, which focuses on meeting a time constraint
during recovery.

To answer Alex's question, we have shown in an accompanying white paper
(http://perth.mit.edu/~richard/wp-ietf-fault-notification.pdf)
that flooding has the advantage of dealing with per-link failures (and thus
with multiple simultaneous failures) as opposed to per-LSP (or per group of
LSPs) failures, as well as the advantage of reliability. Because of the
former, the overhead of flooding can, in fact, be less than that of
point-to-point failure signaling.
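
A rough way to see the overhead argument is a toy message count. The sketch
below is not the protocol from the draft, nor the model used in the white
paper. The ring topology, the forwarding rule (forward once on every working
link except the arrival link), the choice to notify only the ingress of each
LSP, and all function names are assumptions made purely for illustration.

```python
from collections import deque


def flooding_messages(links, failed):
    """Count messages when both ends of the failed link flood one notice.

    Assumed toy rule: a node forwards the notice the first time it learns of
    the failure, on every working link except the one the notice arrived on.
    """
    working = [link for link in links if set(link) != set(failed)]
    neighbours = {}
    for a, b in working:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    seen = set(failed)  # both endpoints detect the failure locally
    queue = deque((node, None) for node in failed)
    sent = 0
    while queue:
        node, came_from = queue.popleft()
        for nxt in sorted(neighbours.get(node, ())):
            if nxt == came_from:
                continue
            sent += 1  # one message per forwarded copy
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, node))
    return sent


def per_lsp_messages(lsps, failed):
    """Count hop-by-hop messages when each affected LSP is signaled separately.

    Assumed toy rule: the node upstream of the failure sends one notice per
    affected LSP back along the LSP's path to its ingress.
    """
    sent = 0
    for path in lsps:
        for i in range(len(path) - 1):
            if {path[i], path[i + 1]} == set(failed):
                sent += i  # hops from path[i] back to the ingress path[0]
                break
    return sent


# Hypothetical 4-node ring with ten LSPs that all cross link A-B.
ring = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
lsps = [["D", "A", "B", "C"]] * 10

print("flooding:", flooding_messages(ring, ("A", "B")))
print("per-LSP: ", per_lsp_messages(lsps, ("A", "B")))
```

In this toy ring the flooding count is fixed by the topology, while the
per-LSP count grows with the number of LSPs crossing the failed link, so
which approach is cheaper depends on how many LSPs share a link.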

We would like the CCAMP WG to adopt this draft as a WG document so that it
can be advanced further.

Ron, Kireeti, is that a good course of action?

Richard.