RE: draft-bonica-tunneltrace-02
Hi Ping, I've tried to be brief but sufficiently comprehensive. Hope this
helps you understand our perspective on this.
regards, Neil
> >
> > - a trail-tracing function is one diagnostic tool which
> is 'desirable'
> > to help operations people verify routing......it is not the
> only 'desirable'
> > diagnostic tool required. Moreover, it is not a defect
> detection/handling
> > tool, which is an 'essential' function IMO. I hardly think
> it appropriate
> > that in all the cases we want to use IP/MPLS technologies
> to expect the
> > customer to act as the defect detection mechanism. I see this as
> > particularly ironic and inconsistent given that in the
> GMPLS work it is
> > taken for granted that the defect detection/handling
> mechanisms relevant to
> > the traffic/data-plane of a L1 layer network (eg SDH) must indeed be
> > present....such as correct framing, BIP-X violations, trail-trace
> > violations, FDI to suppress higher client layer alarms, etc. ...
>
>
> Neil,
>
> Just curious, what's wrong with using IP "tools" to detect data-plane
> LSP problems? That's how we have been doing things in the
> Internet for
> years.
NH=> I assume you mean wrt MPLS rather than GMPLS?.....as the latter (eg
SDH, OTN) do have the right things specified wrt the data-plane....again
because it's something operators have been doing for years in such
bulk-transport networks. The tools one invokes depend on the service
application, however the basic network principles are generic (I'll come
back to that shortly).
Using MPLS for VPNs or supporting XoverMPLS can have quite different
customer/operational SLA expectancies than using MPLS for mass-market
BE/Internet service. I know the ping tools have been around for years and
that's why we'd like something better now....Peter Willis, who represented
BT at the last mtg/BoF if you recall, used to operate our IP networks and
has vast experience of the limitations of such tools. I am sure Peter would
be very happy to share his views with you if you'd like some time.
If you get a chance please have a read of Y.1710; the principles/requirements
therein offer good guidance on what operators want to see addressed (or at
least operators like BT, NTT, AT&T, etc...I believe a list of those
operators supporting these requirements was given at the last mtg/BoF).
But in summary:
The major drivers for operators are:
	- reduce Opex costs by providing automatic defect (i) detection, (ii)
handling, and (iii) diagnostics;
- seek to improve the customer service/experience as/where
needed....and certainly don't expect to use customers as defect detectors;
	- be able to offer robust/measurable availability and QoS SLAs to
customers as/where needed;
- provide a trigger mechanism for prot-sw as/where needed.
To address these drivers we need to (and there is an ordering requirement
here):
- 1st identify all types of defect;
- 2nd identify a mechanism(s) that will automatically detect the
defects, specify their entry/exit criteria, specify the appropriate
consequent actions (which may vary between different defects)....items for
consideration wrt consequent actions are:
	* sending FDI to higher layers to suppress alarm storms (this also
works the other way round, ie L1 FDI suppresses MPLS alarms....this is how
we require that SDH AIS be used, for example)
	* squelch traffic if there is any chance that an important
customer's traffic is being misdelivered
* send indication to head-end (for single-ended mon, prot-sw
reasons)
* raise appropriate alarms
- 3rd use persistency of defects to define unavailability entry/exit
criteria;
	- 4th use the above to determine the time over which QoS metrics are
valid, ie QoS aggregation needs suspending during unavailability, as it
makes no sense to keep collecting it against QoS SLAs.
Some simple but IMO very powerful solutions have been developed in Y.1711 to
satisfy these requirements...some people mistakenly think Y.1711 is complex,
but it's just the opposite: it has been kept simple on purpose. The
requirements also ask for the following additional characteristics:
- both continuous and on-demand (this is the distinction between auto
defect detection/handling and ad hoc diagnostic tools I referred to
previously);
- a single defect should not give rise to multiple alarms or multiple
corrective actions;
	- should cater for simple breaks (and be able to differentiate
server-layer breaks from breaks within the same layer),
swapped connections/misconfigurations, unintended replication (with or
without the offending traffic being impacted) and unintended
self-replication (eg loops, DoS attack);
- OAM functions should be backwards compatible and optional for
operators;
- one layer network's data-plane OAM should not rely on another layer
network's data-plane OAM (be that server or client relative to the layer
considered);
- data-plane defect handling should not be dependent on any
control-plane or management-plane protocols (inc the 'no control-plane'
case);
- defect detection/handling should not be dependent on customer
traffic activity levels;
	- defect detection/handling should work reliably under degraded
conditions, ie error events.
So.....if we can satisfy this lot as simply as Y.1711 can then that's great.
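As a rough sketch of the continuous, traffic-independent detection style described above (in the spirit of Y.1711's CV flow, where the source sends a connectivity-verification packet each period and the sink declares loss after a few silent periods): the period, the 3-period threshold, the defect names and the class below are all my own illustrative assumptions, not a normative reading of the Recommendation.

```python
from typing import Optional

CV_PERIOD_SECONDS = 1.0   # assumed CV emission interval
LOSS_THRESHOLD = 3        # assumed: declare loss after 3 silent periods

class CVSink:
    """Sink end of a monitored trail; checks the periodic CV flow."""

    def __init__(self, expected_ttsi: bytes):
        self.expected_ttsi = expected_ttsi  # id of the trail we expect
        self.silent_periods = 0
        self.defect: Optional[str] = None   # None, "dLOCV" or "dTTSI_Mismatch"
        self.alarm_raised = False

    def on_period_elapsed(self, received_ttsi: Optional[bytes]):
        """Call once per CV period with the received source id, or None."""
        if received_ttsi is None:
            self.silent_periods += 1
            if self.silent_periods >= LOSS_THRESHOLD:
                self._enter_defect("dLOCV")           # simple break
        elif received_ttsi != self.expected_ttsi:
            self._enter_defect("dTTSI_Mismatch")      # misconnection/mis-merge
        else:
            # the expected CV arrived: clear defect state
            self.silent_periods = 0
            self.defect = None
            self.alarm_raised = False

    def _enter_defect(self, name: str):
        self.defect = name
        if not self.alarm_raised:  # one defect must not raise repeated alarms
            self.alarm_raised = True
            # consequent actions would be triggered here: FDI to client
            # layers, indication to the head-end, squelch on misconnection
```

Note how this needs no customer traffic and no control plane to detect a break, and how the `alarm_raised` latch gives the "single defect, single alarm" behaviour asked for above.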
>
> People may forget how "ping" and "traceroute" were used in
> the Internet
> years ago: they were used to solve the similar problems that we are
> facing today in (G)MPLS networks, i.e., to detect and locate trouble
> spots. In the NSFnet days, we had always used both commands (and some
> more) to detect microcode and HDLC bugs in the network....
>
> When (G)MPLS technology becomes more mature one day, there may not be
> much need to use any tool for detecting physical layer
> problem. But now,
> there is urgent need to have the tools. Instead of working out
> philosophical issues, why don't we just solve the problem?
NH=> Done it....it's in Y.1711. Why not tell us what's wrong with it?
Note - it's not the complete answer, as it currently only addresses defect
detection/handling. The 'desirable' ad hoc diagnostic tools like
trail-trace and P-flows (ie to invoke more detailed QoS metrics) are the
next things to address.