
RE: Comparison of restoration requirements between transport and packet networks



Hi Richard....please see in-line.  regards, Neil
-----Original Message-----
From: Richard Rabbat [mailto:rabbat@fla.fujitsu.com]
Sent: 13 September 2003 00:29
To: Harrison,N,Neil,IKL2 R; ccamp@ops.ietf.org
Subject: RE: Comparison of restoration requirements between transport and packet networks

Hi Neil,

Please see comments inline

Thanks,

Richard.

 

-----Original Message-----
From: owner-ccamp@ops.ietf.org [mailto:owner-ccamp@ops.ietf.org] On Behalf Of neil.2.harrison@bt.com
Sent:
Thursday, September 11, 2003 5:06 AM
To: rabbat@fla.fujitsu.com; ccamp@ops.ietf.org
Subject: RE: Comparison of restoration requirements between transport and packet networks

 

Richard,

 

In section 1 of your paper it says:

 

"This document presents a fault notification protocol that is both technology and topology agnostic, and applies to intra-domain  protection. "

 

[Richard] What the draft meant is that we do not describe the technology-specific implementation of the flooding method described in it.  Rather, we keep the implementation separate.  I'll change the sentence to clear up the misunderstanding.

 

That being the case, wherever it is to be used one needs all the defects defined.  Defects should be detected in the data-plane (and not by control-plane proxy) at the trail termination point using the OAM functions appropriate to the mode/technology, ie cnls is different to co-ps, which is different to co-cs.  If one wants to go 'fast' (and I seriously question the sanity of those seeking to beat 50ms in SDH at higher layer networks) then only certain defects and technologies are relevant.  Further, one should take care not to invoke protection/restoration for error events which self-clear.

[Richard] I wholeheartedly agree. 50 ms is most probably not doable in shared mesh networks. 

NH=> That was not my point......50ms is *not* required by applications was my point.  I see people doing irrational things that have little real practical benefit, and this is not the worst offender.....see my brief remarks on 'complexity' in my prior mail, which I have snipped out below.

 

The general rule with protection/restoration is:

"As fast as *sensible* (ie don't trigger on error events) as close to the duct, and as slow as possible close to the application (and for sure don't trigger on error events here......noting that error events can get extended as they map upwards through layer networks)."
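The 'don't trigger on error events which self-clear' part of this rule amounts to a hold-off (soak) check: a defect must persist for some window before protection is invoked.  A minimal sketch in Python, where the class name and the window value are purely illustrative assumptions, not anything from a standard or from the draft:

```python
class HoldOffTimer:
    """Suppress protection switching on transient error events.

    A defect must persist for at least `hold_off_s` seconds before a
    protection switch is triggered; an event that self-clears inside
    the window never triggers.  Illustrative sketch only.
    """

    def __init__(self, hold_off_s):
        self.hold_off_s = hold_off_s
        self.defect_since = None  # time defect was first seen, or None

    def report(self, defect_active, now_s):
        """Return True iff protection should be triggered at time now_s."""
        if not defect_active:
            self.defect_since = None  # event self-cleared: reset the soak
            return False
        if self.defect_since is None:
            self.defect_since = now_s  # defect onset
        return (now_s - self.defect_since) >= self.hold_off_s
```

With a 50 ms window, a defect reported at t=0 that clears at t=20 ms never triggers; one still present 50 ms after onset does.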

 

The other point I was alluding to is that one requires all the defects for the mode/technology in question to have been specified in terms of entry/exit criteria and consequent actions......and for co modes the FDI consequent action is essential for the reason I gave previously (snipped here).  There are 3, and only 3, networking modes, viz cnls, co-ps and co-cs.....all technologies map to one of these.  All modes are required, as all provide different behaviours.  However, the functional components of each mode should migrate to best-of-breed......not crunched across all modes, as that makes zero technical/commercial sense.  OAM and fault detection/handling is one key functional component.  It has a different specification in the cnls, co-ps and co-cs cases.  It also assumes that the functional architecture of G.805 (co modes) and G.809 (cnls) is respected.  In the co-ps/cs case the only valid topologies are p2p and p2mp.......break the rules here (you can't in co-cs mode anyway, but can in co-ps mode) and you have created a difficult (and quite unnecessary) OAM/fault-management problem.  That was the other point wrt 'can you point to where the defects are specified, if you want this proposal to be mode/technology agnostic?'
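The 'entry/exit criteria and consequent actions' pattern can be sketched as a small state machine: declare the defect after N consecutive bad observations, clear it after M consecutive good ones, and fire a consequent action (such as inserting FDI downstream) on declaration.  The thresholds and callback names here are assumptions for illustration, not values taken from G.805/G.806:

```python
class DefectMonitor:
    """Illustrative defect state machine with entry/exit criteria.

    Entry: declare the defect after `entry_n` consecutive bad
    observations.  Exit: clear it after `exit_n` consecutive good
    ones.  Declaration/clearing fire consequent-action callbacks,
    e.g. start/stop inserting FDI downstream.  Sketch only.
    """

    def __init__(self, entry_n, exit_n, on_declare, on_clear):
        self.entry_n, self.exit_n = entry_n, exit_n
        self.on_declare, self.on_clear = on_declare, on_clear
        self.declared = False
        self.bad_run = self.good_run = 0

    def observe(self, bad):
        if bad:
            self.bad_run += 1
            self.good_run = 0
        else:
            self.good_run += 1
            self.bad_run = 0
        if not self.declared and self.bad_run >= self.entry_n:
            self.declared = True
            self.on_declare()   # consequent action, e.g. insert FDI
        elif self.declared and self.good_run >= self.exit_n:
            self.declared = False
            self.on_clear()     # e.g. stop inserting FDI
```

The point of separate entry and exit thresholds is hysteresis: an isolated errored observation neither declares nor clears the defect, which is exactly the 'don't react to error events' discipline above.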

 

 

 It may be doable in simpler configurations.  The whole point of the draft is to *guarantee* a notification time.  With all the layers doing some kind of protection and restoration at different time granularities, under escalation some layers need to wait for lower layers to recover from a fault/defect before starting their own process.  How does one define that wait time if there is no time guarantee? 

NH=> A laudable aim....but it will never be possible to set hard bounds *unless* we fix the hierarchical client/server relationships for ever.  Let me give you an example to answer, wrt the activities in the PWE3 group.  What are the protection speed requirements for SDHoverMPLS in the client (SDH) and server (MPLS) case?  And now extend this to some arbitrary nested client/server stack where one operator may not own the layers below a certain point (and of which he has no visibility).......so what is there to control 'which' and 'how many' client/server transitions exist below here?

{Aside -  I have some interesting ideas how to architecturally model/control arbitrary (and silly) client/server relationships (eg like SDHoverIP) from a performance HRX viewpoint....but that is not for discussion here.}

 

  Should we assign a random value and hope for the best, or should there be a time after which one is assured that the other layer did not accomplish its task, and so engage one's own recovery mechanism?  Assign 1 second or 200 ms or any time as being a hard bound, but make it hard.
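The hard-bound escalation being argued over reduces to a simple decision rule: the client layer waits exactly the bound after the fault, and engages its own recovery only if the server layer has not recovered by then.  A minimal sketch, with the function name and the 200 ms value chosen for illustration (not taken from the draft):

```python
def escalate(fault_time_s, server_recovered_s, hold_off_s):
    """Decide which layer performs recovery under a hard hold-off bound.

    The client layer waits `hold_off_s` seconds after the fault; if the
    server layer has not recovered by that deadline (or never recovers,
    signalled as None), the client engages its own mechanism.  Sketch
    of the hard-bound idea only.
    """
    deadline = fault_time_s + hold_off_s
    if server_recovered_s is not None and server_recovered_s <= deadline:
        return "server-layer"   # lower layer recovered within the bound
    return "client-layer"       # hard bound expired: escalate upward
```

Neil's objection above is that in an arbitrary, invisible client/server stack nothing pins down how many such deadlines are nested below you, so the bound cannot actually be made hard.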

The defects that we thought about when we wrote this draft are:

-       Fiber cut

-       Transponder failure

-       Node failure

I hope this clears up the misunderstanding. 

NH=> See my remarks above.  Wrt the defects you considered, yes, I realise you homed in on the OTN.

 

<NH snipped to end>