
Re: Standards for IP stats collection? (corrected)



Neil -- your comments on the inter-relation between failure modes and QoS
are very much on target.
You are right that the issues are a Pandora's box, but I don't find the
availability analysis of the IP layer quite as hopeless. I have a few
comments below.
I would appreciate anyone's responses as to the "hopelessness index" of
this problem.

neil.2.harrison@bt.com wrote:

> Bert....I can understand your earlier comment (wrt wanting limiting WG
> coverage), but I can probably understand why Vishal wrote to several lists:
> -       IP needs its own perf management regime
> -       MPLS ditto
> -       GMPLS ditto....and there can be several layer networks involved
> here, eg SDH (which has several distinct internal layer networks) and OTN.
>
> This is unavoidable as each of the above creates layer networks that must be
> managed in their own right.  And to do a proper job it's not simply QoS Perf
> stats that are at issue for operators, it's more like this:
>
> 1       user-plane and control-plane need their own OAM once dealing with CO
> paths, eg GMPLS/MPLS-ER-LSPs, since one cannot assume user/control-plane
> congruence of function or routing....so they are independent.  In the
> user-plane one must have an initial clear view of defects (entry/exit
> criteria) and their handling (ie correct consequent actions).  For an
> operator this is an operational must-have.
>
> 2       After above, the next issue is availability.  This has to be defined
> before QoS metrics/objectives can be considered.  Why?  QoS metrics are only
> valid when a path is in the up-state, otherwise the QoS metrics have no
> viable time-base and/or can get (statistically) distorted by events which
> should be regarded as down-time ....so up/down-state transitions need to be
> clearly identified next.  Note - this issue is not clear-cut for the IP
> layer when dealing with destination-based hop-hop routing.  Indeed, there
> are very difficult issues with failures mutating into QoS hits which I have
> raised on the MPLS list before which make availability analysis of the IP
> layer impossible (IMO) for the *network* (ie as subnetwork partitioned
> allocations of a global reference connection) except on a true end-end
> basis.

I think (but am not quite sure from your wording) that you find the goal
of separating failure analysis at the physical/link layers from
availability & QoS at the IP layer unattainable.
My view is that these functions of network state are inter-related, and
to seek to make them independent is quixotic.
Here is a view:
The various layers below IP will have outage phenomena. These are measurable,
albeit imperfectly.
Both control-plane and data-plane will have outage phenomena. These are
measurable, I think.
("Outage" includes loss of route stability during convergence, for
conventional IP routing. Ex.: Vern Paxson's 1996 paper on Internet routing
behavior.)
MPLS will have separate, but commensurate, outage behavior (despite fast
reroute).
All of this feeds into the analysis of QoS at the IP layer and above.
QoS is both an IP connection availability (reachability) issue, as well as a
packet loss, delay etc. issue.
Up/down state transitions can generally cause both (a) reachability loss, and
(b) data loss.
So, *given adequate measurement and characterization* of the phenomena,
I don't see it as impossible. If this is naive, why?
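
A minimal sketch of the point above, in Python and purely illustrative:
down-state intervals set the availability figure, and packet loss is then
computed only from samples taken while the path was up. The `Sample`
record, the function names, the assumption that down intervals are already
known and do not overlap, and all the numbers in the usage note are
hypothetical, not taken from any standard or draft.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    t: float    # measurement time in seconds (hypothetical time base)
    sent: int   # packets sent during this sample
    lost: int   # packets lost during this sample


def availability(down_intervals, t_start, t_end):
    """Fraction of [t_start, t_end) in which the path was up.

    Assumes down_intervals is a list of non-overlapping (start, end) pairs.
    """
    total = t_end - t_start
    down = sum(min(e, t_end) - max(s, t_start)
               for s, e in down_intervals
               if e > t_start and s < t_end)
    return (total - down) / total


def is_up(t, down_intervals):
    """True if time t falls outside every down interval."""
    return not any(s <= t < e for s, e in down_intervals)


def loss_ratio_up_state(samples, down_intervals):
    """Loss ratio using only samples taken while the path was up."""
    up = [x for x in samples if is_up(x.t, down_intervals)]
    sent = sum(x.sent for x in up)
    lost = sum(x.lost for x in up)
    return lost / sent if sent else None
```

For example, with one outage from t=100 to t=160 in a 600-second window,
`availability([(100.0, 160.0)], 0.0, 600.0)` is 0.9. If three samples of
1000 packets each lose 1, 1000 and 3 packets, and the middle one falls
inside the outage, the up-state loss ratio is 4/2000 = 0.002, whereas
counting all three would give roughly 0.33. That is the distortion Neil
describes when down-time is not separated out first.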


>
>
> 3       Now we can consider QoS.....but be careful.  It costs money to
> measure/collect/process (in OSS) these....every technology I have been
> involved with has started with a large metric wish-list that gets whittled
> down to something more pragmatic later.  My advice is that these should be
> of 2 types of QoS metric collection: (i) ad hoc 'sw-on' function for
> trouble-shooting as needed by operational people, or the continuous
> measurement of 'important' paths, and (ii) general network population
> sampling (to get overall network trends and spot latent anomalous
> behaviour).

Yes. The problem as I see it here is that many of the measurement
standards are themselves invasive: they burden the network they measure.
RMON and general polling via SNMP are examples.


>
>
> 4       policing/re-classification actions need monitoring.
>
> 5       nodal/link utilisations need monitoring and fed to a TE function and
> the design/forecasting cycle.
>
> So one needs a framework, where these pieces relate but where one piece does
> not attempt to do the job of all the others.
>
> BTW - We (ie me/Shahram Davari/Ben Mack-Crane/Peter Willis) have just posted
> an ID for MPLS user-plane which deals with 1 and 2 above.

I would be interested in a link to your draft.

>
>
> neil
>
> > -----Original Message-----
> > From: Sambasiva R. Mantha [mailto:sambu.mantha@usa.alcatel.com]
> > Sent: 02 March 2001 20:50
> > To: terry martin
> > Cc: ellanti@home.com; Bora Akyol; Vishal Sharma; 'te-wg@uu.net';
> > 'mpls@uu.net'; ccamp@ops.ietf.org
> > Subject: Re: Standards for IP stats collection? (corrected)
> >
> >
> > I would like to correct a little bit regarding GR-253. The GR-253
> > requires 8 hours of 15-minute registers and 7 one-day registers and not
> > 3-5 days. This means that a SONET must have 33 15-minute registers (32
> > registers for previous 8 hours and 1 for current 15 minutes) and 7 1-day
> > (previous day) registers.
> > I would agree with Manohar that these PM registers are an over-kill for
> > LSPs as they really do not contribute anything to the traffic flowing
> > through the NE.
> >
> > Sambu
> >
> >
> > terry martin wrote:
> >
> > > RMON is the only protocol standard that dictates information
> > > collection requirements.  That is the only IP service that is
> > > structured to collect trend stats.  I think it is also 15 minute
> > > intervals for 3-5 days depending on how much memory you get in the
> > > unit.
> > >
> > > There are structure requirements for collection, type of traffic
> > > collected and how it is presented.
> > >
> > > Need sources- let me know
> > >
> > > Terry Martin MS Telecommunication Engineering
> > > Senior Consultant
> > > tmartin@gvnw.com
> > > 503-612-4422
> > >
> > > ----- Original Message -----
> > > From: "Manohar Naidu Ellanti" <ellanti@home.com>
> > > To: "Bora Akyol" <akyol@pluris.com>; "Vishal Sharma"
> > > <vishal@JasmineNetworks.com>
> > > Cc: "'te-wg@uu.net'" <te-wg@UU.NET>; "'mpls@uu.net'" <mpls@UU.NET>;
> > > <ccamp@ops.ietf.org>
> > > Sent: Thursday, March 01, 2001 9:14 PM
> > > Subject: RE: Standards for IP stats collection? (corrected)
> > >
> > > > It will be interesting to see if there is a need for TDM world
> > > > operational features to be carried into MPLS world.
> > > >
> > > > I think the reason for 15 minutes etc PM counters for things like
> > > > Severely Errored Seconds etc was to deduce the quality of
> > > > transmission line and use this to feed into link cost. For instance
> > > > reliability metric could be based on such information for the link.
> > > >
> > > > For LSPs does it really make sense to have TDM world operational
> > > > features? There is not even CRC or any header information to
> > > > determine if MPLS packet was received correctly. It is more at lower
> > > > layers. It would be nice to see some useful features carried forward
> > > > and avoid unnecessary requirements from GR-XXX.
> > > >
> > > > -Manohar
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: owner-mpls@UU.NET [mailto:owner-mpls@UU.NET] On Behalf Of Bora Akyol
> > > > Sent: Thursday, March 01, 2001 8:57 PM
> > > > To: Vishal Sharma
> > > > Cc: 'te-wg@uu.net'; 'mpls@uu.net'; 'ccamp@ops.ietf.org'
> > > > Subject: Re: Standards for IP stats collection? (corrected)
> > > >
> > > >
> > > > Vishal
> > > >
> > > > I don't think that there are such standards for routers. I know that
> > > > some routers store such data on flash cards for later retrieval and
> > > > some on hard disks.
> > > >
> > > > I would be curious to see how people are storing this data and for
> > > > how long?
> > > >
> > > > Bora
> > > >
> > > >
> > > > On Thu, 1 Mar 2001, Vishal Sharma wrote:
> > > >
> > > > > Hello All,
> > > > >
> > > > > For the TDM world, GR-253 lays out strict standards for
> > > > > the length of time that a carrier-class box should collect
> > > > > and store statistics on-board, for retrieval later. The
> > > > > number is something like 15-min intervals for 3 days.
> > > > > The purpose supposedly is that if the connection to the EMS
> > > > > dies, the box at least should allow the provider to recover
> > > > > statistics data from it.
> > > > >
> > > > > My question is: what are similar standards (or existing
> > > > > best practices) in the IP carrier community today? How much
> > > > > statistics-related information do carriers like to have from
> > > > > IP boxes?
> > > > > What would carriers like to have?
> > > > >
> > > > > > (The only reference I could find on this was Blain Christian's
> > > > > > draft
> > > > > > http://search.ietf.org/internet-drafts/draft-christian-tewg-measurement-00.txt)
> > > > > >
> > > > > > Are there others?
> > > > > > Do people (read carriers) have any thoughts or suggestions or
> > > > > > pointers?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > -Vishal
> > > >
> > >
> > >
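
The register arithmetic in Sambu's correction quoted above (8 hours of
15-minute registers, i.e. 32 previous plus 1 current, 33 in total) can be
sketched as follows. This is an illustrative Python sketch only: the
`register_count` helper and the `PMRegisters` class are hypothetical names,
and nothing here is claimed to be how GR-253 or any particular NE
structures its storage.

```python
from collections import deque


def register_count(history_hours, interval_minutes):
    """Registers needed: one per past interval, plus one current register."""
    previous = history_hours * 60 // interval_minutes
    return previous + 1


class PMRegisters:
    """One current counter plus a fixed-depth history of closed intervals."""

    def __init__(self, previous_slots):
        self.current = 0
        # deque(maxlen=...) silently drops the oldest entry once full.
        self.previous = deque(maxlen=previous_slots)

    def count(self, n=1):
        """Accumulate n events into the current interval."""
        self.current += n

    def roll(self):
        """Close the current interval and start a fresh one."""
        self.previous.appendleft(self.current)
        self.current = 0
```

`register_count(8, 15)` gives 33, matching the figure in the quoted
message. A `PMRegisters(32)` instance rolled every 15 minutes keeps the
most recent 8 hours in `previous`, newest first, which is the on-board
history a provider could recover after losing the connection to the EMS.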

--
http://www.makesystems.com

Dr. Richard Tibbs
rtibbs@ieee.org