RE: draft-bonica-tunneltrace-02
Hi Ping....please see my response below.
regards, Neil
>
> Here is what I think on auto-detection and auto-handling:
>
> 1. The operators should have the option to turn on some "ping-like"
> function on the suspicious LSPs at ingress LSRs. These LSPs will be
> queried periodically. It's important to make sure that this function
> won't introduce too much overhead.
NH=> Well, I agree that is one option, and for some operators/applications this
may be good enough. However, it has some limitations:
- because you don't know a priori where faults will occur you either
(i) have to turn it on for all important LSPs or (ii) revert back to using
the customer as the defect detection tool;
- it can't detect/diagnose defects other than simple breaks (these
should be easy to spot anyway IMO);
- it takes no consequent actions.....like suppressing alarms in client
layers, or protecting customer traffic (i.e. squelching traffic) if
traffic leakage is observed, etc;
- there are no well-defined/agreed defects (and no agreed consequent
action as noted above), and as a consequence no well defined availability
metrics....and as a further consequence no possibility to relate any QoS
metrics gathered to available time. This has both interworking implications
and means operators/customers have no agreed base for availability/QoS SLAs;
- LSP-ping only works with RSVP-TE....we need a solution that is
control-plane agnostic (including the case of no control plane) and client-layer
agnostic (i.e. X over MPLS).
Can I ask if you have ever read Y.1710/1711? All the above is catered for
by very simple means. A trivial keepalive flow (CV) that contains the LSP
source address....and an FDI, sent upwards of failure, to tell higher layers
'failure lower down so don't raise alarms'. There is also a BDI in case you
want to tell the upstream end of a downstream problem, e.g. unidirectional
monitoring of availability of a bi-directional LSP or for invoking 1/M:1/N
protection switching. And that's about it......so Y.1711 is generic, extremely powerful
and comprehensive and yet very simple.
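To make the CV/FDI/BDI idea above concrete, here is a rough sketch of what a sink-side monitor might look like. It is illustrative only and not taken from Y.1711: the class name, the defect labels ("loss", "mismatch"), the 1-second CV period and the 3-period loss threshold are all assumptions, and the FDI/BDI "packets" are just strings appended to a log. It also simplifies by letting a CV from the expected source clear any active defect.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical values for illustration only; not taken from Y.1711.
CV_PERIOD_S = 1.0      # assumed interval between CV keepalive packets
LOSS_THRESHOLD = 3     # assumed number of missed periods before declaring a break


@dataclass
class LspSink:
    """Egress-side monitor for one LSP (illustrative sketch)."""
    expected_source: str                # LSP source address the sink expects in each CV
    last_good_cv: float = 0.0           # arrival time of the last CV from the expected source
    defect: Optional[str] = None        # current defect label, None when healthy
    sent: List[str] = field(default_factory=list)  # log of FDI/BDI indications emitted

    def on_cv(self, source: str, now: float) -> None:
        """Handle an arriving CV packet carrying the LSP source address."""
        if source == self.expected_source:
            self.last_good_cv = now
            self._set_defect(None)          # simplification: a good CV clears the defect
        else:
            # A CV from some other source suggests traffic leaking into this LSP.
            self._set_defect("mismatch")

    def on_tick(self, now: float) -> None:
        """Periodic check: declare a break if expected CVs have stopped arriving."""
        silent_for = now - self.last_good_cv
        if self.defect is None and silent_for > LOSS_THRESHOLD * CV_PERIOD_S:
            self._set_defect("loss")

    def _set_defect(self, new: Optional[str]) -> None:
        """Change defect state and emit the consequent indications once per change."""
        if new == self.defect:
            return
        self.defect = new
        if new is not None:
            self.sent.append(f"FDI:{new}")  # tell client layers: don't raise alarms
            self.sent.append(f"BDI:{new}")  # tell the upstream end of the problem


sink = LspSink(expected_source="lsr-a")
sink.on_cv("lsr-a", now=0.0)
sink.on_tick(now=3.5)       # no CV for more than 3 periods, so a break is declared
print(sink.defect, sink.sent)
```

The point of the sketch is only that carrying the source address in the keepalive lets one cheap flow detect both simple breaks and misconnection, and that the consequent actions (FDI/BDI) hang off a defined defect state rather than an ad hoc probe.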
>
> 2. If a LSP is in trouble, it would be nice to have the "traceroute"
> function kick in automatically. Eventually, the trouble spot can be
> located and reported.
NH=> Well, I think it's more than this, as noted above. Trace-Route is a 'nice
to have' ad hoc diagnostic tool. I looked at how to do it for MPLS some time
ago (and I think I have a good solution) but I put it on a back-burner
because defect detection/handling and how to measure availability needed
sorting first....I can now return to the area of enhanced ad hoc tools. I
(and my Ops people) also want to define a P-flow so we can turn-on some more
detailed QoS metric collection, so I intend to look at that also.
<snipped NH>