
RE: [Fwd: Re: [RMONMIB] I-D ACTION:draft-ietf-rmonmib-raqmon-pdu-08.txt]



At 02:09 PM 1/14/2005, David B Harrington wrote:
> 
>
>> -----Original Message-----
>> <soapbox>
>> Why does it seem like every couple years the RMON WG pushes
>> the SNMP envelope, and keeps running into "CLR roadblocks"?
>> The standards are supposed to serve users, not the other
>> way around.  It seems to me that any effort spent
>> devising detailed rules around SMI usage (to prevent users
>> from "hurting themselves") is totally pointless, especially
>> in the absence of any real evidence of a problem to solve.
>> Here's a litmus test: What operational problems are being
>> solved by preventing somebody from defining a table of
>> accessible-for-notify objects?  Can't think of any?  Then 
>> lose the CLR!
>> </soapbox>
>
>Overall, I really hate discussion about CLRs atop CLRs atop CLRs. I
>feel like we're navel gazing rather than doing something productive to
>make SNMP a more useable protocol for operators.
>
>However...
>
>I have a concern over whether tables full of accessible-for-notify
>objects obscure the fundamental trap-directed-polling philosophy of
>SNMP.
>I think doing this is bad practice.

I don't think so.  I also don't agree that RAQMON is somehow
a signalling protocol.  It is a reporting mechanism, not
signalling.  If a network device has management information
to report, and is willing to live with the "best effort"
nature of SNMP notifications, then why should the standard
preclude this usage?  Allowing for manager polling
means the agent has to store all this data to be retrieved
arbitrarily by managers.  This is in direct conflict
with the goals of RAQMON -- lots of simple devices that
report into a collector.  The collector has the resources
to be a complete SNMP agent.  Polling a collector is scalable.
Polling 10,000 end-devices is not.

Personally, I wouldn't mind dropping the SNMP notification
transport of RAQMON and just keep the TCP encoding.  The SNMP
approach is quite unattractive in comparison, and so far,
every vendor interested in RAQMON has expressed a total
lack of interest in putting SNMP notification sender support
in their products.

Andy


>I do not like what the RAQMON MIB does; it should send a simple single
>notification to the manager saying it has some information for the
>manager, and then let the manager poll for the rest of the data. Dan's
>argument is that the devices are very limited and sending the
>notification is simple; Marshall's Simple Book seems to disagree that
>an event-driven approach is simple. The reason SNMP is used for RAQMON
>is because it is already on the device. Well, if it's on the device
>already, it probably supports polling already, so using the polling
>approach should not be detrimental. If the goal is real-time reporting
>of events, I don't feel comfortable that using SNMP this way is a wise
>choice.
>
>In a RAQMON system of many IP phones, all sending large notifications
>to a collector, will the collector be able to keep up? How many phones
>can one collector handle before becoming swamped? If trap-directed
>polling were used, the collector would only need to process a simple
>trap and to queue up the request to poll for more information; it can
>choose its timing rather than constantly being forced to stop
>everything to handle the interrupt. With traps, the OS pre-emptively
>takes control from the application; with trap-directed polling, the
>application retains better control over the context switching. SNMP is
>not really well-designed for real-time event-driven management; a
>stream-based, session-based protocol like LFAP (in the IPFIX WG) would
>seem a much better approach.
>
>Dan tells me that Bert, Steve Waldbusser and Andy all have accepted
>this approach for RAQMON. So be it. I don't care enough about the
>RAQMON case to go to the RMON WG and challenge it.
>
>But until RAQMON becomes widely deployed in real-world networks, with
>real-world applications handling this load, I would not like to change
>the guidelines to recommend, or even imply a recommendation, for such
>an approach. Real world experience argues that this approach may not
>be scalable.
>
>Adding text to the guidelines saying "this is how to build tables of
>accessible-for-notify objects" implies this is acceptable practice.
>If this is ever published as a BCP, that implies it is a BEST current
>practice.
>I really feel uncomfortable with anything that encourages this
>practice.
>I would prefer to not make such a change at all, and to generally
>discourage the practice.
>Part of my reluctance comes from experience with Spectrum, a full-blown
>platform normally capable of managing tens of thousands of SNMP
>agents, where one customer decided to use SNMP traps for event-driven
>management and totally overwhelmed the application with notifications.
>
>Rate limiting a la RFC 3413 might have prevented the problem for
>Spectrum, if only one device needed to be rate limited. But the
>problem wasn't one agent sending too many traps. Spectrum's customer
>configured the network to send lots of traps to Spectrum, from
>multiple agents they designed themselves with large varbind lists.
>
>Each notification interrupted Spectrum processing, as expected. Each
>trap caused the creation of a thread to process the trap.
>This worked fine in a normal SNMP environment with tens of thousands
>of devices sending small notifications to direct polling activities.
>The problem is that the agents, not being aware of the impact they
>would have on the network and the application, together sent hundreds
>of traps per second, trying to report events in real-time. Traps came
>in so fast and each trap required so much processing time to handle
>the large list of varbinds that the threads kept being interrupted;
>the system was so bogged down responding to interrupts and creating
>new threads for new traps it never had time to actually finish
>processing the traps already received. Ultimately it ran out of thread
>space and stopped creating new threads, but still could never get back
>to processing the already-received traps because it was constantly
>being interrupted. 
>
>What was needed was to educate the customer that SNMP is not designed
>to be used that way, and to have them use trap-directed polling
>instead. This solved the problem.
>
>Maybe this is not really a problem any longer, but the fact that
>Marshall discusses this in his book makes me believe that SNMP was
>designed to use trap-directed polling for a very good reason.
>
>We should recognize that SNMP was designed to use trap-directed
>polling, and changing SNMP to be event-driven could be a serious
>design issue. If the majority of MIB Doctors, especially those with
>manager-side experience and not just agent-side experience, believe
>this is not a problem and event-driven management using tables of
>notifications is scalable, then I'll shut up. But I think it is
>important to discuss real world experience with this approach rather
>than remaining quiet just so the RMON WG won't feel we're constructing
>CLR roadblocks.
>
>David Harrington
>dbharrington@comcast.net
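
For readers following the thread, the trap-directed polling pattern
described in the quoted message can be sketched roughly as below. This
is only an illustration under stated assumptions, not code from RAQMON,
Spectrum, or any SNMP library: the class name TrapDirectedPoller, the
fetch_details callable, and the max_pending bound are all hypothetical.
The point it shows is that receiving a trap only records "this agent has
something to report", and the manager retrieves the details later, at a
pace it chooses.

```python
from collections import deque


class TrapDirectedPoller:
    """Hypothetical manager-side sketch of trap-directed polling.

    A trap does no heavy work; it only schedules a later poll.
    """

    def __init__(self, fetch_details, max_pending=1000):
        # fetch_details is an assumed callable: agent_id -> polled data.
        self.fetch_details = fetch_details
        self.max_pending = max_pending  # assumed bound on queued polls
        self.pending = deque()          # agents waiting to be polled
        self.queued = set()             # agents already in the queue
        self.dropped = 0                # traps discarded under overload

    def on_trap(self, agent_id):
        """Cheap trap handler: note the agent and return immediately."""
        if agent_id in self.queued:
            return  # a poll is already scheduled for this agent
        if len(self.pending) >= self.max_pending:
            self.dropped += 1  # shed load instead of piling up work
            return
        self.pending.append(agent_id)
        self.queued.add(agent_id)

    def poll_some(self, budget):
        """Poll at most `budget` agents, at a time the manager chooses."""
        results = {}
        while self.pending and budget > 0:
            agent_id = self.pending.popleft()
            self.queued.discard(agent_id)
            results[agent_id] = self.fetch_details(agent_id)
            budget -= 1
        return results
```

In this sketch, repeated traps from the same agent collapse into one
scheduled poll, and the bounded queue caps how much work a burst of
traps can create. The cost is the one raised earlier in the thread:
each agent must hold its data until the manager comes to fetch it.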