Here are the minutes from the NETCONF WG meeting at IETF 59 in Seoul last month. We have to submit them for the IETF proceedings by Friday 2 April, 1700 EST, so if you find any errors, please tell me as soon as possible. The minutes are also on the NETCONF WG Web site: http://www.ops.ietf.org/netconf/59/minutes.html

Regards,
-- Simon.
OPS Area
Network Configuration WG (NETCONF)
Meeting Minutes IETF #59
Note takers: Sharon Chisholm, Chris Elliott, Simon Leinen
Chairs: Andy Bierman, Simon Leinen
Simon Leinen presented information about the Netconf effort at the January RIPE meeting, in the "EOF" (European Operators Forum) section. This presentation was an attempt to ensure that operators are involved in the development of Netconf.
Simon's presentation was reasonably well attended, with a few people providing most of the feedback. The audience consisted primarily of ISPs, with little participation from enterprise network operators.
In discussions about operational practices in this venue, Simon didn't find anyone who was using elaborate tools for provisioning or configuration. Almost everyone used scripts, mostly written in Perl. The concern was raised that Netconf needs hooks into scripting languages such as Perl. Other concerns raised include the ability to deal with huge configurations and large access lists. There were also concerns about locking and overlapping operations. One specific piece of feedback concerned the present issue of multiple transports and the fact that we have not yet selected a mandatory substrate. After the meeting someone came up to Simon and said that deciding on SSH as a [mandatory] substrate looked like a no-brainer, since this is what operators prefer.
Slides from Simon's presentation can be found on the RIPE47 Presentations Archive. An archive of the audio/video webcast is available from Sessions Archive.
Eliot Lear gave a brief presentation with a primary message being that Netconf has been cut down to focus on the basic features that need to be delivered ("Netconf got a haircut").
Eliot posed two questions to the operators: The first question was about their interest in managing services and devices that are behind NATs or firewalls, which raises the issue of the direction in which the Netconf connection can be established. To his surprise, operators were not particularly interested in these issues.
The second question was which port to use for SSH: port 22 or another port. Consensus among the operators was that we should use a port other than 22. They didn't care about access list issues, but rather just wanted another port.
There was discussion within the room that indicated that trying to manage through a firewall is an issue that does come up in the field, and shows up as calls to technical support (Chris Elliott). Tony Hain also noted that running SSH on a port other than 22 may cause problems in getting through NATs and firewalls. It's possible that these issues aren't felt as strongly by the backbone-oriented NANOG audience.
Eliot's slides and an archived audio/video stream can be found on the NANOG30 pages.
Sharon Chisholm summarized a discussion that was held on Monday about data modeling issues. A mailing list for discussing data modeling for Netconf can be found on http://standards.nortelnetworks.com/netconf. The participants would like to hold a BOF at IETF60 in San Diego. Andy agreed to help with an Internet-Draft about "SMI" for Netconf.
It was noted that the Data Modeling discussion is not just about the data models, but also includes other issues such as conformance to standards.
Rob Enns presented the changes in the latest version of the Netconf protocol draft.
It was asked why we want to remove the notifications from the protocol, since there is interest in this ability.
Rob Enns responded: The focus of Netconf is on configuration. Most of the operators we spoke to are currently using SNMP or syslog to get their configuration-related notifications. Many of the proposed substrate solutions for Netconf can't easily support notifications. Thus, this feature did not meet a cost/benefit evaluation for inclusion in the initial version of the Netconf protocol. The one potential differentiator would be notifications that were specific to a particular Netconf session, as opposed to the device in general. Notifications could always be added at a later date.
Chris Lonvick notes that the Syslog WG will be doing work related to this area, so if the Netconf working group doesn't want to take this on, we could potentially do it in syslog.
Randy Presuhn noted that there are issues with the data model behind the notifications that exist regardless of whether the notifications are carried by reliable syslog, bundled with Netconf, or sent by a standalone notification protocol. Concerns along those lines should be brought to the data model mailing list that Sharon mentioned.
Andy agreed. The data model work can certainly look at notifications.
The editors of the Netconf over SOAP Document were not present at the meeting.
Andy: Sections were added on which SOAP fields need to be supported. Must-understand attributes cannot be ignored. Only one, apparently. Details about using HTTP as a substrate...maybe others. Comments?
Eliot: The BEEP draft needs to be updated based on Rob's updates to the protocol specification. I didn't get a chance to do that before the IETF, but I'll do it within the next 2-4 weeks. It would be nice to receive comments on the BEEP draft between now and then.
It was noted that the SSH draft needs to be updated to remove notifications. Margaret will do the update once things settle down a bit.
The issues list is available on a web page (http://www.nextbeacon.com/netconf/).
A single host could act both as a manager and as an agent, but over a single Netconf connection it can only act as one.
It needs to be clarified that the syntax check is the minimum. Future work on the data model will handle requirements for referential integrity, both within an object and across objects.
Randy: I have a comment on the second sub-bullet. I would request that the editors be careful with the wording, because what Andy verbally communicated is not what is written in the slides.
Andy: I just meant a syntax check. We should just take out the second bullet.
Randy: Just make sure the language in the RFC does not prevent people from doing a serious implementation.
Andy: To me a syntax check means the same as XSD validation. I may not want to make it that strong of a requirement. The new revision is that "at maximum" be removed ...
take out second sub-bullet - only the "minimal" syntax check will be defined.
This is for the SSH mapping only. There has been lots of discussion on the list lately. The concern is that there should be Netconf-specific framing for SSH, but the marker must be something that cannot appear in CDATA since there are security issues. Not sure if <?eom?> was the final answer. It was suggested as being a string.
Rob: I believe the design team agreed that this would be what was used ...
]]>]]> shall be used as an end-of-message marker.
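The resolution above can be illustrated with a minimal sketch of a receiver that splits a byte stream on the marker. The function name `split_messages` and the sample payloads are assumptions made for illustration; they are not taken from the SSH mapping draft.

```python
EOM = b"]]>]]>"  # end-of-message marker from the resolution above


def split_messages(buffer: bytes):
    """Split a receive buffer into complete messages and a leftover tail.

    Returns (messages, remainder). The remainder holds the bytes of a
    message whose end-of-message marker has not arrived yet.
    """
    parts = buffer.split(EOM)
    # Every piece before the last marker is complete; the final piece is partial.
    return parts[:-1], parts[-1]


if __name__ == "__main__":
    # Hypothetical payloads, only to show the framing.
    data = b"<rpc>one</rpc>]]>]]><rpc>two</rpc>]]>]]><rpc>thr"
    messages, rest = split_messages(data)
    print(messages)
    print(rest)
```

A caller would prepend the remainder to the next chunk read from the SSH channel, which also covers the case where the marker itself is split across two reads.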
Proposed resolution matches what Eliot said: we want to get a new port assignment. We will tell vendors that it SHOULD be configurable. We are not going to do any work on how to configure this.
We originally said we would wait for experience before selecting a mandatory transport, but operators are telling us to use SSH. We propose to select SSH as the mandatory transport, giving more weight to the operators' preference than the implementors'. Note that the differences for implementations don't seem that large.
Margaret Wasserman: If the operators prefer SSH and we make it mandatory they will get it. If the vendors prefer something else they can do that as well.
Eliot: With respect to 'should support beep', I think the agent SHOULD support beep, but I'm not sure if the manager SHOULD. This is too much to ask for simple, specific task tools.
Sharon: So you have two ends of a pipe and only one end speaks Beep?
Margaret: I agree that for the agent it should be MAY. If I write a real manager application .. it is going to have to know the specific machines, so I will know if it supports BEEP.
Bert Wijnen: First, it should be that an agent implementation MUST implement it. If not enabled by the deployment, it is still there. I think mandatory SSH means that it needs to be on both sides. The goal is to ensure that every manager can talk to every agent.
Andy: Manager and agent MUST do SSH. Agent SHOULD do BEEP?
Randy: Look at RFC2119 and the appropriate use of MUST. If there would be no interoperability unless you do it this way, then it is a MUST. The SHOULD is supposed to be used where, if you don't support it, something may not go well. One needs a really good reason for not doing it. I don't think we have fulfilled this for BEEP. I therefore argue it should be a MAY.
Andy: Note that we had discussed initiating the session from the device.
Randy: That requirement seems to be fading.
Randy: Manager and agents MAY implement BEEP or SOAP
Eliot: Even though I went to NANOG and reported back what I said, yeah the MUST can be SSH, but I think the firewall issue is going to come back to bite us. We will need the implementation of BEEP in the agent to get around this.
Andy: This is more relevant for large numbers of small devices than for small numbers of large devices.
Bert: When I read that the agent 'MAY', it sounds odd. It is usually used for something that you were not meant to do but you can.
Sharon: Is this going to be captured in the netconf protocol spec? Is there a more appropriate place so we don't have to rev the protocol spec when we realize that beep is a must?
Andy: Can be added to the application mapping doc. Will have to rev some document, though... Comments? Doesn't matter to Andy as long as it's only in one place.
Background: There had been a proposal at the interim meeting to take out all operations and put the operations in the data model instead.
Initial proposal: Leave the protocol set as is. We don't redo it in a more object oriented way.
Tim Stoddard: I spoke up on the issue. get-all was a dump and there were no filtering capabilities. At least get-state provides a filter. get-all could result in a huge amount of data on a large device.
Andy explains that a start point can be specified in your data model. There is a scoping mechanism for picking a subtree in the containment model. It is correct that there isn't a filter for just getting state. Operators want to be able to just get configuration, though.
Tim: The history of this was <get-state>. If this is <get-state> I have no issue. If it is <get-all> I have an issue. <get-state> was better.
Andy: There were issues with get-state since it required the implementation to pull out state information. One doesn't want to make two separate requests to get operator and administrative status. The clincher for this decision was that the operator requirement was not to have a special command to get state, but rather to get configuration. I understand your concern that we don't have a command with filtering, but we are leaving that for future work. I don't think meta data belongs here.
Tim: Getting an interface list will be intensive. The lack of filtering will cause issues on large devices with many interfaces. Get with target seems to make sense.
Andy: It is not relevant to the protocol how the data model is constructed. Yes, the get command is broken for large devices. Cisco had done some proprietary things to help here, but does not want to bring them up here because of complexity. The ability to retrieve just names, for example.
Tim: The fact that those things are not on the table means that this is broken.
Andy: Broken is too strong. A large box with lots of interfaces should be able to stream back the data. This is an outlier case. Most of the time the box's resources match the sort of retrieval it is being asked to do. What do you recommend?
Tim: Get instances by name; limit size of the return. We need to make this more controllable. This should be discussed on the list.
Andy: Sense of the room?
Show of hands indicates some interest that something should be done here, with no-one objecting... look into it further.
Andy: Two closely related proposals: limiting the number of nesting levels or just returning instance names. The latter is probably cleaner as the level filtering might not always help.
Randy: The kind of data models vendors will develop will be influenced by the type of protocol we come up with. For example, logging related to resources - if you don't have control over what instances are returned you would want to ensure the logs are elsewhere so you don't bump into them.
Andy: We shall take it to the list. I know the Cisco developers did the work with just being able to return the instance names. We will capture that.
Retrieve instance names is the preferred choice?
Andy: The full blown solution would be to use XPath. It has been deemed too complex for now, but we may still end up going there (future work).
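The two proposals discussed above (returning only instance names, or limiting the number of nesting levels) can be compared with a small sketch. The nested dictionary stands in for a containment hierarchy; the function names and the sample interface data are hypothetical and do not come from any Netconf draft.

```python
def instance_names(subtree: dict) -> list:
    """Return only the names of the instances directly under a subtree."""
    return sorted(subtree)


def limit_depth(subtree, max_depth: int):
    """Return a copy of the subtree truncated below max_depth nesting levels."""
    if not isinstance(subtree, dict):
        return subtree  # leaf value, nothing to truncate
    if max_depth == 0:
        return {}  # container kept as an empty placeholder
    return {key: limit_depth(value, max_depth - 1) for key, value in subtree.items()}


if __name__ == "__main__":
    # Hypothetical data model with two interface instances.
    interfaces = {
        "eth0": {"mtu": 1500, "counters": {"in": 10, "out": 20}},
        "eth1": {"mtu": 9000, "counters": {"in": 0, "out": 0}},
    }
    print(instance_names(interfaces))
    print(limit_depth(interfaces, 2))
```

In this toy model the name-only reply stays small no matter how deep each instance is, whereas a depth limit still returns every leaf above the cut-off, which is one way to read the remark that level filtering might not always help.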
Rob took out the text about exec commands, but he intended to expand that section by saying that Netconf can support exec commands, but we are not putting them in at the moment.
Andy: issues with access control are not resolved. Access control for exec commands isn't necessarily same as config. Left for further work.
Tim Stoddard: If we are removing this at this time, is it planned to pick this up and to finish rollback so that it can be completed? Can we poll operators to find out? What is the effort to put it back? If we ask operators, we may find out this is important.
Andy: This kind of feature makes application development easier: I want this blob of configuration either to take in full or not to take at all. If it is a list of 10 commands and only 7 of the 10 worked, then from what I've heard from customers that is not what they want.
Tim: What is the impact of not completing rollback now?
Andy: We've heard that if people don't use locking, things can get hosed. They can commit someone else's changes if they rolled back. They don't want to roll back all changes. My answer is that this is what locking is for. I had originally said there would be specific constraints on rollback. Rob, do you have any comments?
Tim: We need rollback supported; it is an important feature.
Rob: I don't have any objection. We were concerned as to what rollback meant when you had a candidate configuration and when you didn't. Rather than dig it up, Phil suggested getting rid of it. If we think we can get it to work then we should take it to the mailing list.
Andy: Originally we were thinking of a full blown rollback, which was seen as too complicated. I think the concern is that there is a slight mismatch between the candidate and running.
Rob: I agree that operators like this a lot and it would be a good selling point for Netconf 1.0.
Andy: rollback on error is interesting for running config, is it interesting for candidate config?
Randy: yes.
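The all-or-nothing behaviour discussed above (a batch of changes either takes in full or not at all) can be sketched as follows. This is only an illustration of rollback on error against an in-memory dictionary; the function name `apply_all_or_nothing` and the representation of edits as callables are assumptions, not part of the protocol draft.

```python
import copy


def apply_all_or_nothing(config: dict, edits) -> bool:
    """Apply edits to config; on any failure restore the original contents.

    Each edit is a callable that mutates the config and raises on error.
    Returns True if every edit was applied, False if the batch was rolled back.
    """
    snapshot = copy.deepcopy(config)  # state to return to on error
    try:
        for edit in edits:
            edit(config)
    except Exception:
        # Roll back: discard partial changes and restore the snapshot in place.
        config.clear()
        config.update(snapshot)
        return False
    return True


if __name__ == "__main__":
    running = {"hostname": "r1"}
    ok = apply_all_or_nothing(
        running,
        [
            lambda c: c.update(mtu=1500),  # succeeds
            lambda c: c["missing"]["x"],   # fails with KeyError
        ],
    )
    print(ok, running)
```

The sketch says nothing about the open question from the discussion, namely what rollback should mean when a candidate configuration exists alongside the running one.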
Looking for comments from people familiar with RFC3553. Almost no one has read it.
Andy: In a tool like XML spy it will try to retrieve the schema like it is a URL (from a URI). This is convenient.
Rob: Juergen says we should just use URNs.
Scott Hollenbeck: RFC3688 describes registering XML parameters and may be more relevant here.
Eliot: I don't see any problems with this. There are issues with URNs - will the name survive forever? So long as they do I'm comfortable with this.
(Subsumed by previous discussion on 10.3.2)
This is the biggest part of the document that is still TBD
Andy: I had sent out an email indicating that I would like a small, constrained set of error codes as opposed to something open-ended. We can't check what a string is without standard semantics. We can't have any automated response to error codes that way, and that is the end goal.
Andy: We need layered errors: protocol errors and application or data model-specific errors. We also need to be able to include multiple error indications in one reply if there is more than one error. How to associate a given chunk of configuration error with a particular RPC error is another issue. The document suggests using edit-path. I don't think there is an example. It needs to provide a hint, or you could just reproduce the containment hierarchy and stuff the error in where it goes.
Consensus of the room: To clarify the document with respect to error codes
Example: an <rpc-error> for a failed locking operation could include information about the current lock holder (e.g. user, session).
No comments on proposal - text needed!
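Since text for this proposal is still needed, the following is only a sketch of what an <rpc-error> reply carrying lock holder information might look like. Apart from <rpc-error> itself, every element name and the error code value are hypothetical placeholders, not names from the protocol draft.

```python
import xml.etree.ElementTree as ET


def lock_denied_error(user: str, session_id: str) -> str:
    """Build an illustrative <rpc-error> reply for a failed lock request."""
    error = ET.Element("rpc-error")
    # Hypothetical protocol-level error code.
    ET.SubElement(error, "error-code").text = "lock-denied"
    # Hypothetical container for information about the current lock holder.
    holder = ET.SubElement(error, "lock-holder")
    ET.SubElement(holder, "user").text = user
    ET.SubElement(holder, "session").text = session_id
    return ET.tostring(error, encoding="unicode")


if __name__ == "__main__":
    print(lock_denied_error("alice", "42"))
```

Keeping the code in its own element and the holder details in a separate container is one way a manager could react to the error automatically while still showing the extra detail to an operator.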
Andy had previously asked the design team to come up with a proposal.
Andy had been looking at Alarm MIB for IanaItuProbableCause and wonders why we would want to reinvent every error code that could happen when we could just leverage this?
Gregory Lebovitz: There was some work done in the OPSEC WG on standardizing logging of security. Based on that experience, if you try to do it in this document you will drown. There is too much to do.
Andy: No... this is just about error replies for a specific request.
Randy: Following up on that, I think there are three kinds of error codes. The first level is at the level of the RPC error itself - the Netconf protocol level - that will be a very limited set of errors. The second level will be an elaboration that is specific to the data model, which will be specified as part of the data model. The value of this is out of scope of the protocol work. The third level is an implementation-specific problem that has been noted - during validation a referential constraint was violated - this would be valuable to communicate to the operator.
Andy: That is what I was proposing - in the protocol putting in the container. I am then just throwing out the idea that this IANA probable cause is worth putting out there. I agree it could balloon.
Randy: In my view, the IANA probable cause might be useful for data model or application level. We don't need to nail them down in this working group. We nail down the RPC level and probable cause may not be the best choice for those.
We should stick to defining some core error codes for protocol errors, with a place holder for the rest.
There were no objections to the operation set.
Andy: The text that says you can't mix different values in an operations request sounds like a new CLR (Consistency Language Rule aka "Crappy Little Rule") - a rule that either adds no value or gets in the way, but that we are stuck with for historical reasons. I think we will probably regret that later.
Sharon: Then we should either remove it or make the restriction discoverable via a #capability.
Andy doesn't want more capabilities. And maybe some parts of the data model may support mixed operations, while other parts don't - they may be possible when configuring interfaces, but not when configuring BGP, for example.
Rob: If we do take this up then we need to look at all the evil things that people can do if they mix things up, and come up with CLRs to handle those. Someone is going to write a conformance tool, for example. Rob agrees that we are probably going to be more restrictive.
Andy: If taking it out means the agent MUST support arbitrary combinations then I don't want it. We need to wordsmith something.
Randy: We need to consider management applications, which need a way to predict agent behavior. Permitting the agent to beg off on these things with an unspecific "complexity violation" error isn't good. Without any capability and therefore the ability to know in advance, the prudent management application is not going to take advantage of this feature, and will instead always break things down into the specific requests since that will always work.
Andy: We need to try to come up with some better text and propose it on the mailing list. Perhaps for the same instance information it must be the same operation. This becomes important with respect to rollback.
This needs more discussion on the list.
Locking applies to all access mechanisms (SNMP, CLI, NETCONF etc.)
Implementations may choose to hide locking from CLI users, but the CLI sub-systems must still use the locking mechanism
The discard-changes parameter added to the lock operation should be removed. The agent will then always discard changes to the candidate if they are abandoned by the session.
No objections.
Eliot Lear notes that he still owes Rob Enns some text on the <lock> security issue for the security considerations section.
The originally planned second slot for the Netconf WG on Thursday is canceled.
The goal is to wrap the current set of working group documents up and issue Last Calls for publication as Proposed Standard around the time of the next meeting.
Andy would like to see implementations get started and to plan for a Netconf bakeoff event - hopefully in 3-6 months for the fundamental features.