Dear Mr. Farrel,

Thank you for the response to the Q14/15 liaison about the CCAMP
crankback draft. We appreciate the opportunity to provide further
input to the work. Q14/15 will address the LS at its upcoming meeting.

Regards,
Kam Lam, Q14/15 Rapporteur
To: Mr. Kam Lam, Rapporteur Q14/15
From: Adrian Farrel and Kireeti Kompella, IETF CCAMP co-chairs
Cc: Alex Zinin and Bill Fenner, IETF Routing Area Directors
    Scott Bradner, IETF liaison to ITU-T
Subject: Crankback in GMPLS Systems
For: Information
Dear Kam,
Thank you for your liaison
concerning draft-ietf-ccamp-crankback-03. It is useful to have additional
review input from a wide audience. Please convey our special thanks to
Stephen Shew and Marco Carugi for their detailed review of the draft in
Geneva.
We would like to urge Q14/15 to continue to consider this draft
as further work is carried out on crankback within the context of
G.7713.
In response to the specific points that were raised in the
liaison...
> 1. Semantics of the term "node". Due to the GMPLS principle of
> maintaining separation of control and transport (data/bearer) planes,
> there are two meanings for the term "node". First, an instance of a
> signalling protocol (and/or routing protocol) that has some transport
> resources in its scope. Second, a transport plane resource such as a
> cross connect. Using the first meaning, a node is not the context for
> the interface identifiers that are passed in crankback TLVs.
> Throughout the document the particular meaning can be determined
> by the context of the term. Examples are:
>
> - Section 5.2, the sentence "Otherwise, multiple nodes might attempt
>   to repair the LSP." means the control functions of signalling and
>   routing.
>
> - Section 7.1 "As described above, full crankback information SHOULD
>   indicate the node, link and other resources, which have been
>   attempted." refers to the transport resource.
It is correct to observe that historically there has been
poor separation of controllers and transport devices within GMPLS, with
much of this issue arising from the historic collocation of controllers and
data switches in MPLS networks. This persists because of the (eminently
sensible) tendency to optimize for the majority case.
However, in
the case of crankback, and specifically in the case of this draft, the
emphasis in providing 'full crankback information' is on the addresses of
transport links and nodes and not controllers. We will revisit the draft to
ensure that where control plane function is implied, the "node" that takes
action is clearly identified as the control plane node.
> There are some occasions where the use of the term appear to be
> ambiguous and clarity would be appreciated. In particular TLV
> types 10 and 32. If type 10 represents a routing and signalling
> function, then what TLV describes the "transport plane node"
> (e.g., cross connect or Network Element)? If type 32 means
> "transport plane nodes", then a different TLV may be needed
> to identify the "routing/signalling nodes" that have already
> participated in crankback attempts.
>
> Having a clearer distinction between control plane functions
> and transport plane resources would be helpful.
As
indicated above, the intention of crankback is to apply a process to the
path determination for an LSP. The path is determined using transport plane
links and nodes, and although there may be some interesting aggregation
available by converting this information to control plane nodes, the
conversion is not necessarily simple. Thus, these TLVs all refer to
transport plane quantities, and we will make this clearer in
the draft.
Again, of course, in the majority case we can make
considerable optimizations by knowing that control plane and transport
plane "nodes" are related in a 1:1 ratio and are usually
collocated.
> 2. When crankback information is received at a "routing/signalling
> node", can it be used by the routing path computation function for
> other LSP requests than the LSP whose signalling caused the
> crankback action?
It is generally
out-of-scope for the IETF to dictate how individual implementations
operate. It is quite conceivable that such an action would be taken, but it
is also clear that there is a potentially dangerous interaction with the TE
flooding process (i.e. the IGP). Thus we would say that the crankback
information MAY be used to inform other path computations.
We would also caution that crankback is not intended to supplement or
replace the normal operation of the TE flooding mechanism provided by
the TE extensions to the IGP, except during the establishment of a
single LSP. If the IGP is found to be deficient as a flooding
mechanism, we would expect to look first at ways to address the
problems through IGP extensions before utilizing a signaling
mechanism.
We will look at how to add some of
this information to the draft.
> 3. Section 6.1 "Segment-based Re-routing" option. It is not clear
> what this means. Can multiple "routing/signalling nodes" perform
> crankback on the same LSP at the same time if this flag is set?
Since the intention is to establish only
one LSP, there must be only one active sequence of LSP setup messages
(RSVP-TE Path messages) at any time. Thus only one LSR may attempt
re-routing at any one time.
If you consider the processes by which Path messages are attempted and
crankback information is returned on PathErr messages, this will be
clear. That is, when an LSR receives a crankback PathErr, it may
attempt to re-route or it may forward the PathErr back upstream.
It might help if we reworded the draft to say "Any node may
attempt rerouting after it receives an error report and before it passes
the error report further upstream."
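As an illustration only (the function and resource names below are
hypothetical, not taken from the draft), the per-LSR rule "re-route if
you can, otherwise forward the error report upstream" can be sketched
as:

```python
def handle_crankback(history, failed, candidate_paths, retries_left):
    """Sketch of an LSR's reaction to a crankback PathErr.

    history: set of resources already known to have failed for this LSP.
    failed: the resource reported in the incoming crankback PathErr.
    candidate_paths: alternate downstream paths, each a list of resources.
    Returns ("reroute", path) or ("forward_upstream", None).
    """
    history.add(failed)  # record the newly reported failure
    if retries_left > 0:
        for path in candidate_paths:
            # Only one active setup attempt exists at a time, so the LSR
            # simply picks the first path avoiding all known-bad resources.
            if not set(path) & history:
                return ("reroute", path)
    # No local repair possible: pass the error report further upstream.
    return ("forward_upstream", None)
```

Note the single-threaded structure: the LSR either issues one new Path
message or propagates the PathErr, never both, which is why only one
LSR may attempt re-routing at any one time.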
> 4. Section 4.3 History persistence. If a repair point (a
> "routing/signalling node") is unsuccessful in a crankback attempt,
> is it possible for it to be not involved when another repair point
> (e.g., closer to the source) succeeds in a crankback attempt. If so,
> how does the first repair point know to clear its history?
Note that the purpose of the history table as
described in section 4.3 is to correlate information when repeated retry
attempts are made by the same LSR. Suppose an attempt is made to route from
A through B, and the signalling controller for B returns a failure with
crankback information. An attempt may be made to route from A through C,
and this may also fail with the return of crankback information. The next
attempt SHOULD NOT be to route from A through B, and this is achieved by
use of the history table.
The history table can be discarded by the
signaling controller for A if the LSP is successfully established through
A. The history table MAY be retained after the signaling controller for A
sends an error upstream, however it is questionable what value this
provides since a future retry as a result of crankback rerouting should not
attempt to route through A (such is the nature of crankback). If the
history information is retained for a longer period it SHOULD be discarded
after a local timeout has expired, and that timer MUST be shorter than the
timer used by the ingress to re-attempt a failed service (note that
re-attempting a failed service is not the same as making a re-route attempt
after failure).
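The A-through-B, A-through-C example above can be summarized in a small
sketch of the history table kept by the signalling controller for A.
This is an illustration under our own naming, not the draft's normative
data structure:

```python
class CrankbackHistory:
    """Per-LSP history table at one repair point (here, node A).

    Its only job is to correlate repeated retry attempts made by the
    same LSR, so the same failed next hop is not tried twice.
    """

    def __init__(self):
        self.failed = set()

    def record_failure(self, next_hop):
        # Crankback information returned on a PathErr names the failure.
        self.failed.add(next_hop)

    def next_attempt(self, candidates):
        """First candidate next hop not already tried; None when exhausted
        (at which point A sends the error upstream)."""
        for hop in candidates:
            if hop not in self.failed:
                return hop
        return None

    def discard(self):
        """Invoked when the LSP is established through A, or on a local
        timeout shorter than the ingress's service-retry timer."""
        self.failed.clear()
```

The timer relationship matters: discarding on a timeout longer than the
ingress retry timer could let stale history suppress a next-hop that a
fresh service attempt is entitled to try.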
As mentioned for point 2, the crankback information MAY
be used to enhance future routing attempts for any LSP, but this is not
what section 4.3 is describing.
We will try to clarify this in the
draft.
> 5. Section 4.5 Retries. Some guidance on setting the number of
> retries may be helpful as this is a distributed parameter. Is it set
> to be the same value at all points that can perform crankback within
> one network?
The view of CCAMP at the moment is that although it is
technically possible to allow the number of retries to be set for each LSP,
this probably represents too much configuration and too fine a level
of control. It seems likely that initial deployments will wish to set
the number of retries per node through a network-wide configuration
constant (that is, all LSRs capable of retrying will apply the same count)
with the possibility of configuring specific LSRs to have greater or lower
counts. Note that configuring an LSR not to be able to perform retries
is equivalent to configuring the retry count to be zero for that
LSR.
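The configuration model described above (a network-wide constant with
optional per-LSR overrides, where "cannot retry" is simply a count of
zero) amounts to a lookup of this shape; all names and values here are
illustrative, not recommendations:

```python
# Network-wide retry count applied by every LSR capable of retrying.
NETWORK_RETRY_COUNT = 3

# Optional per-LSR overrides; an override of 0 disables retries at
# that LSR. The LSR names and values are purely hypothetical.
PER_LSR_OVERRIDE = {"boundary-1": 5, "core-7": 0}

def retry_limit(lsr_name):
    """Retry count in effect at a given LSR: the override if one is
    configured, otherwise the network-wide constant."""
    return PER_LSR_OVERRIDE.get(lsr_name, NETWORK_RETRY_COUNT)
```

As the letter notes, the actual values are a deployment matter,
constrained by the topology and nature of the network.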
It is also probable that initial deployments will significantly
restrict the number of LSRs within the network that can perform
crankback rerouting. This would probably be limited to "boundary"
nodes.
In the event that implementations and deployments wish to
control the number of retries on a per LSP basis, we would revisit the
signaling specification and add the relevant information to the Path and
PathErr messages.
The actual value to set for a retry threshold is
entirely a deployment issue. It will be constrained by the topology and
nature of the network. It would be inappropriate to suggest a figure in
this draft since there are no hard and fast rules.
In review of
section 4.5 of the draft, we see that there is some old text describing
more flexibility in the control of retries than we intend to provide. Thank
you for drawing our attention to this; we will clean it up.
Thank
you once again for your feedback on this draft. If you have further
comments, we would certainly like to hear them. The easiest way for
individuals to contribute to the discussion of this topic is by sending
mail to the CCAMP mailing list. Details of how to subscribe to this list
can be found at http://www.ietf.org/html.charters/ccamp-charter.html

Yours sincerely,
Adrian Farrel and Kireeti Kompella