Last call review of draft-ietf-ccamp-gmpls-recovery-e2e-signaling-02.txt



Hi,
I've got a couple of comments in what seems like quite a long draft.
Adrian
 
Abstract:
   This document describes protocol specific procedures and extensions
   for Generalized Multi-Protocol Label Switching (GMPLS) Resource
   ReserVation Protocol - Traffic Engineering (RSVP-TE) signaling to
   support end-to-end Label Switched Path (LSP) recovery that denotes
   protection and restoration.
Not sure what is meant by "denotes" in this context.
Perhaps "is used to provide" ?
===
Section 2
OLD
   In addition, the reader is assumed to be familiar with the
   terminology used in [RFC3945], [RFC3471], [RFC3473] and referenced
   as well as [TERM] and [FUNCT].
NEW
   In addition, the reader is assumed to be familiar with the
   terminology used in [RFC3945], [RFC3471], [RFC3473],
   as well as in [TERM] and [FUNCT].
===
Section 2
   Checklog List from revision v01.txt: 
Please either remove this text or mark it so that the RFC editor will remove it.
===
Section 3
Although the second paragraph defines "end-to-end protection", I would like to see this pulled out into its own paragraph for emphasis, with a little more clarity added to the definition. For example, your four types of e2e protection all appear to have the protecting and protected LSPs disjoint in some way. This makes it look like a property of e2e protection in general, and you should state this if it is true.
===
Section 3 para 5
   In 1:N (N =< 1) protection with extra-traffic,
Hopefully you mean N >= 1 
===
Section 3 para 5
I note that you do not distinguish between 1:N and 1:N-with-extra-traffic.
If there is a reason for this perhaps you could add a note to the text.
===
Section 3
OLD
   working one. Here, the recovery resources for the protecting LSP are
   pre-reserved and explicit action is required to activate (i.e.
NEW
   working one. Here, the recovery resources for the protecting LSP are
   pre-reserved but explicit action is required to activate (i.e. 
                ^^^
===
Section 3
OLD
   requirements by allowing multiple protecting LSPs to share common
   link and node resources. The recovery resources are pre-reserved and
NEW
   requirements by allowing multiple protecting LSPs to share common
   link and node resources. The recovery resources are pre-reserved but
                                                                    ^^^
===
Section 3
   Note that in both
   cases, any lower priority LSP that would use the pre-reserved
   resources for the protecting LSP(s) MUST be preempted during the
   activation of the protecting LSP.
This sentence comes out of the blue. The whole of the paragraph up to then has not even mentioned extra traffic. I suggest you insert a paragraph break and a sentence explaining how the pre-reserved resources may be used to support an extra-traffic LSP...
Also, delete "would".
===
Section 3
   Full LSP re-routing (or restoration) switches normal traffic to an
   alternate LSP that is fully established only after working LSP
   failure occurs.
This text does not read well in English. In fact, the same is true of pre-planned. What you mean is, I think, as follows...
   Full LSP re-routing (or restoration) switches normal traffic to an
   alternate LSP that is not even partially established until after
   the working LSP failure occurs.
===
Section 3
   Note that crankback signaling (see [CRANK]) and LSP segment recovery
   are further detailed in dedicated companion documents. Also, there
Need to add a citation for LSP segment recovery.
===
Section 4.2
OLD
   The recovery attributes includes all the parameters that determine 
NEW
   The recovery attributes include all the parameters that determine
===
Section 4.2.1
   - S (Secondary) bit: enables distinction between primary and 
     secondary LSPs. A primary LSP is a fully established LSP for 
     which the resource allocation has been committed at the data plane 
     (i.e. full cross-connection has been performed). Both working and 
     protecting LSPs can be primary LSPs. A secondary LSP is an LSP 
I am uneasy about this definition of "primary". In [TERM] the only mention of "primary" is in section 2...
   Recovery typically involves the activation of a recovery (or
   alternate) LSP when a failure is encountered in the working (or
   primary) LSP.
This implies that "primary" is a synonym for "working".
Further, RFC3471 has a subtly different meaning for "secondary" in section 7.
   Protection Information also indicates if the LSP is a primary or
   secondary LSP.  A secondary LSP is a backup to a primary LSP.  The
   resources of a secondary LSP are not used until the primary LSP
   fails.  The resources allocated for a secondary LSP MAY be used by
   other LSPs until the primary LSP fails over to the secondary LSP.  At
   that point, any LSP that is using the resources for the secondary LSP
   MUST be preempted.
Can we please not modify the interpretation of the S-bit.
If you need to flag a new piece of information (to distinguish between resource allocated and not) then please introduce a new flag.
Note that the P-bit appears to be slightly orthogonal because the text seems to describe the *current* role of the LSP. (The S-bit in RFC3471 describes the role at the time the LSP is set up, I think).
===
Section 4.3
OLD
   When used for the working LSP signaling, the Association ID of the
NEW
   When used for signaling the working LSP, the Association ID of the
===
Section 4.3
OLD
   When used for the protecting LSP signaling, this field identifies
NEW
   When used for signaling the protecting LSP, this field identifies
===
Section 5
   When a failure occurs (say at node B) and is detected at end-node D,
   the receiver at D selects the normal traffic from the other LSP.
   From this perspective, 1+1 unidirectional protection can be seen as
   an uncoordinated protection switching mechanism acting independently
   at both end-points. Also, for the protected LSP under failure
   condition, the Path_State_Removed Flag of the ERROR_SPEC object (see
   [RFC3473]) SHOULD NOT be set upon PathErr message generation.
So, what you are saying is that in 1+1 protection the network may *never* know that the error is so bad that the LSP is dead, but MUST leave that choice to the ingress. While this is the operational practice in many transport networks, I don't see why you make this as strong as a SHOULD NOT.
===
Section 5
   Note: one should assume that both paths are SRLG disjoint otherwise,
   a failure would impact both working and protecting LSPs.
What is this supposed to tell the reader? That he should make the assumption or that he should ensure SRLG diversity? ;-)
Actually, I think you want to say that the quality of 1+1 protection may vary, allowing link diverse, node diverse or SRLG diverse 1+1 protection.
(ditto section 6 and 7)
===
Section 5.1
   Since both LSPs belong to the same session, the SESSION object MUST
   be the same for both LSPs.
An indisputable conclusion drawn from an unproven premise.
Why must both LSPs belong to the same session? A one line explanation would start the section off nicely.
===
Section 5.1
   A new PROTECTION object is included in the Path message. This object
What is the implication of "new"? I guess you mean the new type defined in this draft.
===
Section 5.1
   A new PROTECTION object is included in the Path message. This object
   carries the desired end-to-end LSP Protection Type (in this case,
   "1+1 Unidirectional"). This LSP Protection Type value is applicable
   to both uni- and bi-directional LSPs.
This is unclear. In section 14.1 you have
                0x08    1+1 Unidirectional Protection 
                0x10    1+1 Bi-directional Protection
===
Section 5.1
Your description of the use of the P-bit for 1+1 protection isn't clear. You mean to say that the P-bit indicates which LSP the ingress would *prefer* to be the protecting LSP if all other things are equal, but your text (and the description of the P-bit in sections 4.2.1 and 14) doesn't make this clear.
===
Section 6.2
   directions. This is done using the Notify message with a new Error 
   Code indicating "Working LSP Failure (Switchover Request)". The
I don't see this in the IANA section, and I wonder if you also mean Error Value?
===
Section 6.2
   directions. This is done using the Notify message with a new Error
   Code indicating "Working LSP Failure (Switchover Request)". The
   Notify Ack message MUST be sent to confirm the reception of the
   Notify message (see [RFC3473], Section 4.3).   
I see no definition of a "Notify Ack message" in RFC3473 (in any section).
I am worried that you are confusing the Ack message with a new procedure requiring a handshake of Notify messages.
===
Section 6.2
        1. If an end-node (A or D) detects the failure of the working 
           LSP (or a degradation of signal quality over the working 
           LSP) or receives a Notify message including its SESSION 
           object within the <upstream/downstream session list> (see 
           [RFC3473]), it MUST begin receiving on the protecting LSP 
Note that the sender descriptor or flow descriptor is also present in the Notify and this will considerably help resolve ambiguities and race conditions since it identifies the LSP.
===
Section 6.2
        1. If an end-node (A or D) detects the failure of the working 
           LSP (or a degradation of signal quality over the working 
           LSP) or receives a Notify message including its SESSION 
           object within the <upstream/downstream session list> (see 
           [RFC3473]), it MUST begin receiving on the protecting LSP 
I don't think the receipt of a Notify message is sufficient, per se. I think the error code and value need to indicate a problem with the LSP.
===
Section 6.2
        1. If an end-node (A or D) detects the failure of the working 
           LSP (or a degradation of signal quality over the working 
           LSP) or receives a Notify message including its SESSION 
           object within the <upstream/downstream session list> (see 
           [RFC3473]), it MUST begin receiving on the protecting LSP 
           and send a Notify message reliably to the other end-node (D 
           or A, respectively).
"...send a Notify message reliably" will certainly be misunderstood.
You presumably mean "...send a Notify message including the Message_ID object".
===
Section 6.2
        2. Upon receipt of the switchover message, the end-node 
           (D or A, respectively) MUST begin receiving from the 
           protection LSP and send a (Notify) Ack message to the other 
           end-node (A or D, respectively) using reliable message 
           delivery (see [RFC2961]).
While this clarifies the use of Ack rather than Notify Ack (not sure why you need to include "(Notify)") it is now confused about the delivery of the Ack message. How do we achieve reliable delivery of an Ack message?!
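
To illustrate the point about reliable delivery, here is a minimal Python sketch of the [RFC2961] mechanism: only the message that carries a Message_ID (with an acknowledgment requested) is retransmitted, with a growing interval, until a MESSAGE_ID_ACK arrives. The Ack itself is never retransmitted; a lost Ack is repaired by the next retransmission of the original message. The function name, the callback and the default timer values are assumptions made for illustration, not values taken from the draft or from [RFC2961].

```python
def send_with_retransmit(transmit, max_attempts=3, first_interval=0.5, backoff=2.0):
    """Send a message carrying a Message_ID until it is acknowledged.

    transmit(attempt) is a hypothetical callback: it sends the message and
    returns True if a MESSAGE_ID_ACK came back before the timer expired.
    Returns (attempt that succeeded or None, list of intervals waited).
    The default values are illustrative only.
    """
    interval = first_interval
    waited = []
    for attempt in range(1, max_attempts + 1):
        if transmit(attempt):
            return attempt, waited
        # No acknowledgment yet: wait, then retransmit with a longer interval.
        waited.append(interval)
        interval *= backoff
    return None, waited
```

In this model the receiver of the Notify has nothing to retransmit: it answers each copy it receives with an Ack, which is why "reliable delivery of an Ack message" does not fit the mechanism.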
===
Section 7
   Although the resources for the protecting LSP are pre-allocated,
   preemptable traffic may be carried end-to-end using this LSP (i.e.
   the protecting LSP is capable of carrying extra-traffic) with the
   caveat that this traffic will be preempted if the working LSP fails.
Do you mean that the extra traffic is carried "using this LSP" or "using some or all of the resources assigned to this LSP"?
===
Section 7
   Also, if extra-traffic is carried over the protecting LSP, the
   corresponding end-nodes may be notified of the failure in order to
   complete the switchover. 
I think this is "end-nodes may need to be notified"
===
Section 7.2
   To co-ordinate the switchover between end-points, an end-to-end
   switchover request is needed such that the affected LSP(s) are moved
   to the protecting LSP.
In what way may there be more than one affected LSP moved to a single protecting LSP?
===
Section 7.2
   This operation may be done using a Notify message exchange with a
   new Error Code indicating "(Working) LSP Failure (Switchover
   Request)". The Notify Ack message MUST be sent to confirm the
   reception of the Notify message. 
All of the same comments as for section 6.2.
Also:
- Why do you say "may be done"?
- Is this the same error code as in 6.2? (the text is slightly different)
=== 
Section 7.3
OLD
   provisioned protecting LSP is resource-disjoint LSP from the N
NEW
   provisioned protecting LSP is resource-disjoint from the N
===
Section 7.3
Can you highlight that the N working LSPs are all between the same pair of end points?
===
Section 8
OLD
   this does not mean that the corresponding resources can not used by
NEW
   this does not mean that the corresponding resources can not be used by
===
Section 8
   To make bandwidth pre-reserved for a protecting (but not activated)
   LSP, available for extra traffic this bandwidth could be included in
   the advertised Unreserved Bandwidth at priority lower (means
   numerically higher) than the Setup Priority of the protecting LSP.
This feels like it should be the Holding Priority. That is, the Setup Priority was only important for how it could displace pre-existing LSPs.
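
A small Python sketch of how the choice of priority changes what is advertised. It assumes eight priority levels with 0 the highest, and it models only the pre-reserved bandwidth of one non-activated protecting LSP; the function name and the numbers are made up for illustration. It shows the difference between the two readings and does not settle which one the draft should use.

```python
NUM_PRIORITIES = 8  # assumption: priority 0 is the highest, 7 the lowest


def advertised_extra_bandwidth(pre_reserved_bw, threshold_priority):
    """Per-priority share of pre-reserved bandwidth offered to extra traffic.

    The bandwidth is counted as unreserved only at priorities that are
    numerically higher (i.e. lower priority) than threshold_priority.
    """
    return [
        pre_reserved_bw if priority > threshold_priority else 0.0
        for priority in range(NUM_PRIORITIES)
    ]


# Hypothetical protecting LSP: Setup Priority 4, Holding Priority 2, 10 units.
by_setup = advertised_extra_bandwidth(10.0, threshold_priority=4)
by_holding = advertised_extra_bandwidth(10.0, threshold_priority=2)
```

With the Setup Priority as the threshold the bandwidth is offered at priorities 5 to 7; with the Holding Priority it is also offered at priorities 3 and 4.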
===
Section 8.3
OLD
   From [GMPLS-ARCH], the secondary LSP is setup with resource pre-
NEW
   From [RFC3945], the secondary LSP is setup with resource pre-
===
Section 9
OLD
   plane) a specific protecting LSP instantiated during the (pre-
   )provisioning phase. This requires restoration signaling along the 
NEW
   plane) a specific protecting LSP instantiated during the (pre-)
   provisioning phase. This requires restoration signaling along the
===
Section 9
   resource sharing), the LSPs must have the same Session Ids, but the
   Session Id includes the target (egress) IP address. These addresses
2xs/Id/ID/
Suggest a search for "id"
===
Section 9.3
OLD
   From [GMPLS-ARCH], the secondary LSP is setup with resource pre-
NEW
   From [RFC3945], the secondary LSP is setup with resource pre-
===
Section 10
OLD
   activated. Additional condition raises from mis-connection avoidance
NEW
   activated. An additional condition arises from mis-connection avoidance
===
Section 10
OLD
   Note that step 1 may cause alarms to be raised for the pre-empted
   LSP. If alarm suppression is desired the pre-empting node MAY expand
   before applying step 1 act as follows.
NEW
   Note that step 1 may cause alarms to be raised for the pre-empted
   LSP. If alarm suppression is desired the pre-empting node MAY insert
   the following steps before step 1.
===
Section 10
   At the downstream node (with respect to the pre-empting LSP) the
   processing is RECOMMENDED to be as follows:
   
   1. Receive PathTear (and/or PathErr) message for the pre-empted 
      LSP(s).
   
   2a.Release the resources associated with the LSP on the interface
      to the pre-empting LSP, remove any cross-connection and release 
      all other resources associated with the pre-empted LSP.
   2b.Forward the PathTear (and/or PathErr) message per [RFC 3473].
   
   C. Receive the Path message for the pre-empting LSP and process as 
      normal, forwarding it to the downstream node.
   
   D. Receive the Resv for the pre-empting LSP and process as normal,
      forwarding it to the upstream node.
Cool numbering scheme :-)
Any chance of settling on something more conventional?
===
Section 11.2
   Note: when the end-to-end LSP Protection Type is set to
   "Unprotected", both S and P bit MUST be set to 0 and the LSP SHOULD
   NOT be re-routed at the head-end node after failure occurrence. The
   Association_ID value MUST be set to the LSP_ID value of the signaled
   LSP.
Please explain the difference between an attempt to "re-route" and an attempt to "re-establish". Presumably it could involve:
- a time difference
- the use of make-before-break for failed LSPs.
- the use of the ASSOCIATION object.
I would like to make sure that you are not applying "SHOULD NOT" to LSP re-establishment.
===
Section 12
OLD
   allocated to the LSP that was originally routed over it even after a
NEW
   allocated to the LSP that was originally routed over them even after a
===
Section 12
   - then, apply the reverse 1-phase APS switchover request/response 
     (or 2-phase APS) described in Section 6.2 (or Section 7.2, 
This is the first mention of APS
===
Section 13
I think this section is going to give us grief during IESG review :-(
Why do we need to tie this so closely with NMS etc.? And why describe it as external?
Can't we simply describe the function by:
- dropping the first para
- in C, D and E drop "externally"
- in D and E replace "manual" with "requested"
===
Section 13 TWICE
OLD
   Recovery signaling operation is initiated externally that switches
NEW
   Recovery signaling is initiated externally that switches
===
Section 13 (A and B)
   is set to either 0x04, or 0x08 or 0x10.
I would prefer you to use the meanings rather than the values.
===
Section 13 (D and E)
   This, unless a fault condition exists on
? "This is allowed"? "This is possible"? "This is successful"?
===
Section 14
OLD
   use so that the object can be included in the Notify message to act
   a switchover request for 1+1 bi-directional and 1:1 protection. 
NEW
   use so that the object can be included in the Notify message to act
   as a switchover request for 1+1 bi-directional and 1:1 protection. 
===
Section 14.1
I believe we have had this discussion before.
We don't introduce reserved fields for future extensibility. We only do it for padding.
If you are certain that we need to extend in the future then please use sub-objects or TLVs.
This means that you can:
a. Remove the last four bytes of the Protection object.
b. Retain the C-Type from RFC3473
===
Section 15
   This object MUST be
   present in the Path message (for the pre-provisioning of the
   secondary protecting LSP) if and only if the LSP Protection Type
   value is set to "0x02".
"MUST if and only if" is not really in RFC2119.
Can we have two statements? One with "MUST" and one with "MUST NOT".
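
As a sketch of what the split looks like to an implementer, the following Python check treats the two statements as two independent rules. The value 0x02 is the one quoted from Section 15; the function name and the error strings are illustrative assumptions.

```python
PPRO_PROTECTION_TYPE = 0x02  # LSP Protection Type value quoted in Section 15


def check_ppro_presence(protection_type, has_ppro):
    """Return the list of violated rules for a received Path message."""
    errors = []
    # Rule 1 (MUST): the object is present when the type is 0x02.
    if protection_type == PPRO_PROTECTION_TYPE and not has_ppro:
        errors.append("PRIMARY_PATH_ROUTE object MUST be present")
    # Rule 2 (MUST NOT): the object is absent for any other type.
    if protection_type != PPRO_PROTECTION_TYPE and has_ppro:
        errors.append("PRIMARY_PATH_ROUTE object MUST NOT be present")
    return errors
```

Writing the rules separately also makes it easy to relax one of them without touching the other.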
===
Section 15
In the case where my protecting LSP protects only one working LSP and where the full path of the protecting LSP is known by the ingress (strict and explicit) and there is no resource sharing between the protected and protecting LSP, I can't see why I must include a PPRO.
In other words, PPRO is an enabler of function (as stated in section 15.4 "The PPRO enables of sharing recovery resources between a given secondary protecting LSP and one or more secondary protecting LSPs if their corresponding primary working LSPs have mutually (link/node/SRLG)disjoint paths."), but that does not make its presence mandatory.
===
Section 15.1
   The contents of a PRIMARY_PATH_ROUTE object are a series of
   variable-length data items called subobjects. The subobjects are
   identical to those that can constitute an EXPLICIT/RECORD ROUTE
   object as defined in [RFC3209], [RFC3473] and [RFC3477].
This seems in contradiction with section 15.3
===
Section 15.4
OLD
   The PPRO enables of sharing recovery resources between a given
NEW
   The PPRO enables sharing of recovery resources between a given
===
Section 16
   The ASSOCIATION object is used to associate LSPs with each other. In
   the context of end-to-end LSP recovery, the association MUST only
   identify LSPs that support the same Tunnel ID.
Hmmm. Presumably same source and destination is relatively important too.
===
Section 16
   The ASSOCIATION object is used to associate LSPs with each other.  
You already said this.
===
Section 16.1
      Association ID: 16 bits
   
        A value that when combined with Association Type and 
        Association Source uniquely identifies an association. 
It would be helpful to state who assigns this value.
===
Section 16.1
      Association Source: 4 or 16 bytes
   
        The IP address of the node that originated the association.
"The IP address"?
Question. Are two associations with the same Association ID equivalent if the Association Source addresses are different but identify the same node?
Answer (it transpires) is "no".
You need to make this much clearer here.
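
A short Python sketch of the comparison this implies: the association is identified by the triple of Association Type, Association Source and Association ID, and the source is compared as an address, not as a node. The class name, the type value 1 and the documentation-range addresses are assumptions for illustration.

```python
from ipaddress import ip_address
from typing import NamedTuple


class AssociationKey(NamedTuple):
    association_type: int
    association_source: object  # an IPv4Address or IPv6Address
    association_id: int


# Two hypothetical addresses that belong to the same node.
key_a = AssociationKey(1, ip_address("192.0.2.1"), 7)
key_b = AssociationKey(1, ip_address("192.0.2.2"), 7)

# Same Association ID, same node, different source address:
# the keys differ, so these are two different associations.
same_association = key_a == key_b
```

So a node that originates an association has to keep using the same source address for every LSP it wants associated.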
===
Section 17
Isn't Notify modified as well?
And I thought Resv was, but I may have been sleeping.
===
Section 18
This is a bit poor.
If you don't modify the "external commands" section, you'll certainly have to discuss security for them. After all, a forced failover can be pretty disruptive.
But I think you need to discuss misconnection here. In particular when there is mesh protection going on.
===
Section 19
The IANA section needs some gardening to make it really easy for IANA to implement.
- Break it up into clearer subsections.
- Make sure you have included all of the information needed in the registry
- Point back at the defining sections of the draft
- Only have suggested values in one place in the document
- Be consistent in using TBD or TBA in the document
===
Section 19
Should the IANA section also cover the bits in the ADMIN STATUS object?
===
Section 21
Missing references [CRANK], [RFC2205]. Suspect you need to check them all.
Will need to add a reference for LSP segment recovery.
===
Section 21.1
This seems a very long list of normative references. I hope you can split this so that most of the references are informational.
===
Section 22
You might change this to "Editors' Addresses"