
RE: draft-ietf-tewg-measure-05.txt comments



	A few comments inline.


> -----Original Message-----
> From: owner-te-wg@ops.ietf.org 
> [mailto:owner-te-wg@ops.ietf.org] On Behalf Of Jim Boyle
> Sent: Monday, May 05, 2003 11:27 PM
> To: te-wg@ops.ietf.org
> Subject: draft-ietf-tewg-measure-05.txt comments
> 
> 
> 
> At a high level the document is broken up into the following outline
> 
> 1-3 Introduction
> 4   Definitions
> 5   Rationale / Uses
> 6   Time scales
> 7   Readout / sampling / summarization
> 8   Bases  (e.g. node, link, path, node-pair)
> 9   Entities (e.g. traffic volume, delay, ...)
> 10  Types (Permute Bases x Entities - define what's valid) 
> 11-14 Some blah blah (see specific comments too :)
> 15  Recommendations and Conclusion
> 
> At a high level I think we can ditch section 4 and move these into
> sections 8 and 9 as appropriate.  Sections 10-14 are interesting, but
> I don't think they are directly necessary in specifying
> recommendations on *measurement* (maybe what to do with the
> measurements, but no bearing on the measurement itself).  Section 10
> or 15 would be good places for the CRISP recommendations.
> 
> Here are general comments:
> 
> - there is no discussion of the form of a measurement, e.g. should
>   traffic volume be accumulated or a rate, should time (e.g. delay)
>   be in ms, ns, or seconds.  These twists could lead to
>   inconsistency - where is that to be defined?
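	On the form question: the accumulated-bytes form is
easy to turn into a rate offline, which argues for counters
over rates on the box.  A rough sketch (the 64-bit counter
size and the function name are my assumptions, not from the
draft):

```python
# Sketch: deriving a rate offline from two accumulated byte-counter
# readouts.  Assumes a 64-bit counter and at most one wrap between reads.
COUNTER_MAX = 2**64

def rate_bps(bytes_t0, bytes_t1, interval_s):
    """Bits per second between two readouts taken interval_s apart."""
    delta = (bytes_t1 - bytes_t0) % COUNTER_MAX  # handles a single wrap
    return delta * 8 / interval_s

# two readouts 60 s apart: 75 MB moved -> 10 Mbit/s
print(rate_bps(1_000_000, 76_000_000, 60))
```

The reverse (recovering accumulated volume from rates) needs
assumptions about what happened between samples, which is one
reason to prefer counters.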
> 
> - "hold-time" of an LSP (aka uptime).  What about an RSVP session with
>   signalled make-before-break?  If it were initially signalled at t=0,
>   then at t=5 it changed its path, and let's say at t=10 it increased
>   its bandwidth, if at t=15 we ask how long it has been up, what is
>   the answer?
>   - also t=15, is that minutes - or hours? or do we not care as long
>     as it is specified :)
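	The ambiguity is easy to see if you write the two
interpretations down.  A sketch of the t=0/5/10/15 example
(units deliberately unspecified, as you note; the variable
names are mine):

```python
# Sketch of the make-before-break uptime ambiguity from the example
# above: session signalled at t=0, rerouted at t=5, resized at t=10.
events = [(0, "initial setup"), (5, "reroute (MBB)"), (10, "bw change (MBB)")]
now = 15

session_uptime = now - events[0][0]       # per-RSVP-session view: 15
current_lsp_uptime = now - events[-1][0]  # current-LSP-instance view: 5
print(session_uptime, current_lsp_uptime)
```

Either answer is defensible, which is exactly why the draft
should pick one.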
> 
> - Statistical measures such as "variance" and other second order
>   measurements.  I don't think this is a raw measurement as much as a
>   calculated value.  Should nodes calculate this?  Over how many
>   samples?  Why not just calculate offline based on the raw
>   measurement?
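	Agreed - given the raw per-interval samples, the
second-order statistics fall out trivially offline.  A sketch
(the sample values are made up for illustration):

```python
# Sketch: computing mean and sample variance offline from raw
# per-interval measurements, instead of asking the node to do it.
def mean_and_variance(samples):
    n = len(samples)
    m = sum(samples) / n
    var = sum((x - m) ** 2 for x in samples) / (n - 1)  # sample variance
    return m, var

m, v = mean_and_variance([10.0, 12.0, 11.0, 13.0])  # m = 11.5, v ~= 1.67
```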
> 
> - maybe expand on what you mean by per service class
> 
> - Flow measurement is a deep-hole, we should either defer to other
>   standardization efforts on this, or we need to be a lot more
>   concrete on what we need to know here.	

	Defer to the IPFIX WG, which is tackling this
problem right now.


> 
> - Node Pair -v- Path.  The Node-pair could replicate a lot of
>   information that is available on a path basis.  Do we want that?
>   The advantage might be some persistence in the measurement.  I think
>   every router should have a counter for bytes switched to BGP
>   nexthops, maybe something similar for a Node's known MPLS
>   destinations (e.g. egresses for *my* LSPs)
> 
> Here are specific comments:
> 
> Section 3, first sentence - the goal is not to have a "framework",
> the goal is to foster consistent measurements across implementations
> for traffic engineering purposes.
> 
> "To achieve multi-vendor interoperability..."  Not sure how
> measurements on different systems can fail to be interoperable;
> maybe "multi-vendor consistency".

	I think he might have been referring to the management
interfaces (i.e. SNMP, CLI, XML, etc.), but you are right
that consistency is important as well (or at least being
well-understood).  For example, it is useful to know the
sampling rates/update times for statistics updates from
line cards, as they may be cached for some seconds.  If
you are collecting stats from two different boxes with
two different intervals and your collection period is
small enough, this might make a difference.
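	To put a bound on that: if a line card refreshes its
exported counters only every few seconds, a short-interval
rate computed from two readouts can be off by roughly the
cache age over the poll interval.  A back-of-the-envelope
sketch (my simplification, not from the draft):

```python
# Sketch: worst-case fractional error in a computed rate when a
# counter readout may be up to cache_s seconds stale.
def worst_case_rate_error(poll_interval_s, cache_s):
    return cache_s / poll_interval_s

# polling every 30 s from a box that caches stats for 10 s:
print(worst_case_rate_error(30, 10))  # up to roughly a third off
```

Which is why publishing the update interval matters as much
as publishing the counter itself.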

> "Other principles such as concise representation" - we should be
> focusing on more than principles.  Maybe we should identify
> guidelines for measurements.  E.g. not that we should have "accurate"
> measurements, or even "traffic volume" - as much as that volume
> should be represented in accumulated bytes.
> 
> "average hold-time" - you can only measure hold-time (more precisely,
> up-time of an LSP or Path).  You can calculate average hold-time
> offline.  Are you suggesting that it is important that the node also
> calculate this value?
> 
> Throughput in section 4.3 suggests a possible sustainable rate, e.g.
> the throughput of an OC192 is 10 Gb/s (or something close to that);
> in section 9.1 it looks like it is the amount of traffic which
> "passes".  I think 4.3 is more correct; is throughput actually
> needed in 9.1?
> 
> Section 5.1 second paragraph (the one that is not bulletized).  I see
> no value in this and suggest striking it.
> 
> Section 5.3.... hmmm...  Where is this applied?
> 
> Section 7.1 (data reduction), are we saying we want the node to store
> data for periodic retrieval (e.g. record retrieval) - or do we prefer
> near real time polling techniques like SNMP (or both?).

	I think that both types are needed depending on the
type of TE stats being gathered and what they are 
going to be used for later.

> Unless we are saying exactly where we want record retrieval
> capability, I suggest removing section 7.1.  Similar argument on
> section 7.3 (summarization) - unless we say what we want summarized
> on the node, and how, I suggest removing section 7.3.
> 
> Ditto on section 7.4 (sampling).  Unless we say what should be
> sampled, and how, why discuss?
> 
> Section 8.2 (interface / link base), on bundled links.  Bundled links
> are links, so why call them out as special unless we have some
> concrete way to handle them?  For instance, do we want to somehow
> stipulate that there should be measurement visibility on the bundle
> and the component links, and that they should somehow be tied
> together?  If so, what is the recommendation?
> 
> Section 8.4 (path-based base).  In the first sentence, is it the
> "route-pinning" that gives MPLS the means to develop path-based
> measurement?  I think it's more the ability to tie an edge-to-edge
> FEC into a specific LSP and then to have that FEC not visible (or
> tunnelled) through transit nodes.  Thus ingress, transit and egress
> nodes have the ability to distinguish and count on a per macro-flow
> basis, and they all know what role they play relative to a particular
> LSP.
> 
> Section 9.1 (entities).  It may be best to move the definitions into
> here.  Currently you have "entities, measurement unit, class" and a
> bunch of notes.  Some concise definitions might firm these up.
> 
> Delay - what if a node cannot truly measure delay?  Should we say
> there needs to be a way to state this?  Do we recommend active
> measurement devices for this?
> 
> Packet Loss - It is said that it should be monitored, but nowhere
> have we stated that we want to monitor the offered load, the accepted
> load and the delivered load.  Nor where we measure this (e.g. if we
> measure delivered load at the head-end, that implies some way to
> propagate the information).  If we have accepted load at the head-end
> and delivered load at the tail-end we can infer the packet loss - is
> this the approach?  Are we missing a policed load on an LSP at the
> head-end (does anyone care?)
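	If the draft does settle on accepted load at the
head-end plus delivered load at the tail-end, the inference
itself is just a subtraction.  A sketch (it assumes the two
counters cover the same interval, which is a correlation
problem the draft would also need to address):

```python
# Sketch: inferring LSP loss from head-end accepted and tail-end
# delivered packet counts taken over the same interval.
def inferred_loss(accepted_pkts, delivered_pkts):
    if accepted_pkts == 0:
        return 0.0
    return (accepted_pkts - delivered_pkts) / accepted_pkts

print(inferred_loss(1_000_000, 998_500))  # 0.0015, i.e. 0.15% loss
```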
> 
> Section 9.2 - "To characterize paths ... the following entities may
> possibly be defined" (either say they need definition, or don't
> mention them).
> 
>   - path setup / release delay - an interesting measurement,
>     currently not readily available.
> 
>   - path setup denial / error / etc. probability - an offline
>     calculation, so nix?
> 
>   - path restoration time - what about FRR?  would this be measured at
>     transit node? communicated to head-end?
> 
>   At a node base, you may want to track setup attempts, failures,
>   preempted sessions, optimization checks, maybe even average LSP
>   uptime...
> 
> Section 10.1 Types.
> 
> No "X" on Interface / Throughput, should there be?
> 
> For Delay of a node-pair, what if you have multiple paths 
> with different delays.  Maybe "X" on node-pair delay is not 
> good, as it can be readily calculated from path delays for 
> that node-pair.
> 
> Section 10.2 - does anyone really care how much control traffic is
> consuming on their network?  For BGP traffic, was the intent traffic
> sourced by the node, or anything that transits the node/link (and do
> we distinguish in-my-net -v- across-my-net?)  I suggest removing
> section 10.2 unless folks care and can show where it is applied in a
> recommendation (e.g. a type).
> 
> Sections 10.2 through 14 were interesting to read, but did not
> provide any direct bearing on what measurements are needed.  I
> suggest that they be removed.
> 
> Section 15 -
> 
> "a standardized mechanism to detect ... label binding changes 
> for LDP ..."  Why?

	I agree. I have no idea why LDP bindings affect
the TE matrix.  This should probably be removed.

> 
> "Need for uniform measurement definitions across vendors and
> operators" - that's the crux! that should be in the first sentence
> of section 3 :)
> 
> " Need for higher order statistics... "  push it to the offline hosts
> 
> "Need for packet-sampled..."  "Need for offline bulk file 
> transfer..." The need needs to be better justified or removed.
> 

	I asked to have the offline stuff added.  For the
collection of a traffic matrix, as you know, this entails
collecting a lot of data, especially on big core routers.
It is preferable to use offline bulk file transfer
mechanisms such as NetFlow (being standardized in the
IPFIX WG now) or offline SNMP file transfers of
statistics snapshots.  This reduces the overhead of the
box having to service network management requests, and
also reduces the network management overhead needed to
fetch the same amount of data, since you collect it as
one big batch at some longer periodic interval.

	--Tom

