
Re: Last call review of draft-ietf-ccamp-gmpls-recovery-e2e-signaling-02.txt

To: Adrian Farrel <adrian@olddog.co.uk>
Subject: Re: Last call review of draft-ietf-ccamp-gmpls-recovery-e2e-signaling-02.txt
From: dimitri papadimitriou <dpapadimitriou@psg.com>
Date: Thu, 17 Mar 2005 15:35:34 +0100
Cc: ccamp@ops.ietf.org
In-reply-to: <01f501c529c0$198d56c0$dccb2bd4@Puppy>
References: <01f501c529c0$198d56c0$dccb2bd4@Puppy>
Reply-to: dpapadimitriou@psg.com, dimitri.papadimitriou@alcatel.be
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.5) Gecko/20041217

hi adrian

thanks for commenting - in-line responses to your questions below

Adrian Farrel wrote:

Hi,
I've got a couple of comments in what seems like quite a long draft.
Adrian
Abstract:
   This document describes protocol specific procedures and extensions
   for Generalized Multi-Protocol Label Switching (GMPLS) Resource
   ReserVation Protocol - Traffic Engineering (RSVP-TE) signaling to
   support end-to-end Label Switched Path (LSP) recovery that denotes
   protection and restoration.
Not sure what is meant by "denotes" in this context. Perhaps "is used to provide"?


-> was meant to say "that means..." - so any better wording ?

===
Section 2
OLD
   In addition, the reader is assumed to be familiar with the
   terminology used in [RFC3945], [RFC3471], [RFC3473] and referenced
   as well as [TERM] and [FUNCT].
NEW
   In addition, the reader is assumed to be familiar with the
   terminology used in [RFC3945], [RFC3471], [RFC3473], as well as in
   [TERM] and [FUNCT].


-> ok

===
Section 2
   Checklog List from revision v01.txt:
Please either remove this text or mark it so that the RFC editor will remove it.


-> ok

===
Section 3
Although the second paragraph defines "end-to-end protection" I would like to see this pulled out into its own paragraph for emphasis, and also a little more clarity added to the definition. For example, it would appear that your four types of e2e protection all have the protecting and protected LSPs disjoint in some way - this makes it appear that this is a property of e2e protection in general and you should state this if it is true.

-> so you would like to see as part of the introduction a statement about end-to-end protection concept - i don't think there is any specific issue in adding this

===
Section 3 para 5
   In 1:N (N =< 1) protection with extra-traffic,
Hopefully you mean N >= 1


-> yes

===
Section 3 para 5
I note that you do not distinguish between 1:N and 1:N-with-extra-traffic.
If there is a reason for this perhaps you could add a note to the text.


-> indeed

===
Section 3
OLD
   working one. Here, the recovery resources for the protecting LSP
   are pre-reserved and explicit action is required to activate (i.e.
NEW
   working one. Here, the recovery resources for the protecting LSP
   are pre-reserved but explicit action is required to activate (i.e.
                    ^^^


-> ok

===
Section 3
OLD
   requirements by allowing multiple protecting LSPs to share common
   link and node resources. The recovery resources are pre-reserved and
NEW
   requirements by allowing multiple protecting LSPs to share common
   link and node resources. The recovery resources are pre-reserved but
                                                                    ^^^


-> ok

===
Section 3
   Note that in both cases, any lower priority LSP that would use the
   pre-reserved resources for the protecting LSP(s) MUST be preempted
   during the activation of the protecting LSP.
This sentence comes out of the blue. The whole of the paragraph up to then has not even mentioned extra traffic.

-> don't understand your comment as it is mentioned at the beginning of the paragraph

I suggest you insert a paragraph break and a sentence explaining how the pre-reserved resources may be used to support an extra-traffic LSP...
Also, delete "would".


-> ok - this could indeed ease reading

===
Section 3
   Full LSP re-routing (or restoration) switches normal traffic to an
   alternate LSP that is fully established only after working LSP
   failure occurs.
This text does not read well in English. In fact, the same is true of pre-planned.
What you mean is, I think, as follows...
   Full LSP re-routing (or restoration) switches normal traffic to an
   alternate LSP that is not even partially established until after
   the working LSP failure occurs.


-> indeed, as no pre-provisioning actions are performed for the alternate LSP

===
Section 3
   Note that crankback signaling (see [CRANK]) and LSP segment recovery
   are further detailed in dedicated companion documents. Also, there
Need to add a citation for LSP segment recovery.


-> ok

===
Section 4.2
OLD
   The recovery attributes includes all the parameters that determine
NEW
   The recovery attributes include all the parameters that determine


-> ok

===
Section 4.2.1
   - S (Secondary) bit: enables distinction between primary and
     secondary LSPs. A primary LSP is a fully established LSP for
     which the resource allocation has been committed at the data
     plane (i.e. full cross-connection has been performed). Both
     working and protecting LSPs can be primary LSPs. A secondary
     LSP is an LSP
I am uneasy about this definition of "primary".
In [TERM] the only mention of "primary" is in section 2...
   Recovery typically involves the activation of a recovery (or
   alternate) LSP when a failure is encountered in the working (or
   primary) LSP.

-> this definition in [TERM] needs to be updated by removing "primary" from the second parenthesis - it comes from an initial phase where no solutions were yet started

This implies that "primary" is a synonym for "working".
Further, RFC3471 has a subtly different meaning for "secondary" in section 7.
   Protection Information also indicates if the LSP is a primary or
   secondary LSP.  A secondary LSP is a backup to a primary LSP.  The
   resources of a secondary LSP are not used until the primary LSP
   fails.  The resources allocated for a secondary LSP MAY be used by
   other LSPs until the primary LSP fails over to the secondary LSP.  At
   that point, any LSP that is using the resources for the secondary LSP
   MUST be preempted.
Can we please not modify the interpretation of the S-bit.


-> this is what the draft does as also explained in

"   In this document, the PROTECTION object uses as a basis the
   PROTECTION object defined in [RFC3471] and [RFC3473] and defines
   additional fields within it. The fields defined in [RFC3471] and
   [RFC3473] are unchanged by this memo. "

If you need to flag a new piece of information (to distinguish between resource allocated and not) then please introduce a new flag.
Note that the P-bit appears to be slightly orthogonal because the text seems to describe the *current* role of the LSP. (The S-bit in RFC3471 describes the role at the time the LSP is set up, I think).

-> note this has already been explained to you - we came to the conclusion then that, for clarity, the above mentioned paragraph should be added

===
Section 4.3
OLD
   When used for the working LSP signaling, the Association ID of the
NEW
   When used for signaling the working LSP, the Association ID of the


-> ok

===
Section 4.3
OLD
   When used for the protecting LSP signaling, this field identifies
NEW
   When used for signaling the protecting LSP, this field identifies


-> ok

===
Section 5
   When a failure occurs (say at node B) and is detected at end-node D,
   the receiver at D selects the normal traffic from the other LSP.
   From this perspective, 1+1 unidirectional protection can be seen as
   an uncoordinated protection switching mechanism acting independently
   at both end-points. Also, for the protected LSP under failure
   condition, the Path_State_Removed Flag of the ERROR_SPEC object (see
   [RFC3473]) SHOULD NOT be set upon PathErr message generation.
So, what you are saying is that in 1+1 protection the network may *never* know that the error is so bad that the LSP is dead, but MUST leave that choice to the ingress. While this is the operational practice in many transport networks, I don't see why you make this as strong as a SHOULD NOT.

-> what i mean is that this LSP in a 1+1 mode should not be released after failure (which is the common usage of such a scheme)

===
Section 5
   Note: one should assume that both paths are SRLG disjoint otherwise,
   a failure would impact both working and protecting LSPs.
What is this supposed to tell the reader? That he should make the assumption or that he should ensure SRLG diversity? ;-)
Actually, I think you want to say that the quality of 1+1 protection may vary. Allowing link diverse, node diverse or SRLG diverse 1+1 protection.
(ditto section 6 and 7)

-> indeed i can phrase this in a more prescriptive way (rather than the descriptive currently proposed)

===
Section 5.1
   Since both LSPs belong to the same session, the SESSION object MUST
   be the same for both LSPs.
An undisputable conclusion drawn from an unproven premise. Why must both LSPs belong to the same session? A one line explanation would start the section off nicely.


-> ok

===
Section 5.1
   A new PROTECTION object is included in the Path message. This object
What is the implication of "new"? I guess you mean the new type defined in this draft.


-> yes

===
Section 5.1
   A new PROTECTION object is included in the Path message. This object
   carries the desired end-to-end LSP Protection Type (in this case,
   "1+1 Unidirectional"). This LSP Protection Type value is applicable
   to both uni- and bi-directional LSPs.
This is unclear. In section 14.1 you have
   0x08  1+1 Unidirectional Protection
   0x10  1+1 Bi-directional Protection


-> in 5.1 "This LSP Protection Type value is applicable
   to both uni- and bi-directional LSPs. "

-> in 6.1 "This LSP Protection Type
   value is only applicable to bi-directional LSPs. "
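
The distinction drawn between 5.1 and 6.1 can be illustrated with a minimal Python sketch. Only the two code points quoted above from section 14.1 are used; the enum and function names are illustrative assumptions, not taken from the draft.

```python
from enum import IntEnum


class LspProtectionType(IntEnum):
    # Only the two values quoted in this thread (section 14.1 of the draft).
    ONE_PLUS_ONE_UNIDIRECTIONAL = 0x08
    ONE_PLUS_ONE_BIDIRECTIONAL = 0x10


def protection_type_applicable(ptype: LspProtectionType,
                               lsp_is_bidirectional: bool) -> bool:
    """Hypothetical helper: may this LSP Protection Type be signaled
    for an LSP of the given directionality?"""
    if ptype == LspProtectionType.ONE_PLUS_ONE_BIDIRECTIONAL:
        # Per 6.1: only applicable to bi-directional LSPs.
        return lsp_is_bidirectional
    # Per 5.1: applicable to both uni- and bi-directional LSPs.
    return True
```

Under this reading, "unidirectional" in 0x08 describes the protection switching behaviour rather than the directionality of the LSP itself.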

===
Section 5.1
Your description of the use of the P-bit for 1+1 protection isn't clear. You mean to say that the P-bit indicates which LSP the ingress would *prefer* to be the protecting LSP if all other things are equal, but your text (and the description of the P-bit in sections 4.2.1 and 14) don't make this clear.

-> ok instead of stating it is desirable i will provide a more deterministic statement

===
Section 6.2
   directions. This is done using the Notify message with a new Error
   Code indicating "Working LSP Failure (Switchover Request)". The
I don't see this in the IANA section, and I wonder if you also mean Error Value?

-> it corresponds to the entry "Notify Error/LSP Failure" (suggested value = 6); "Error Value" is thus the appropriate term

===
Section 6.2
   directions. This is done using the Notify message with a new Error
   Code indicating "Working LSP Failure (Switchover Request)". The
   Notify Ack message MUST be sent to confirm the reception of the
   Notify message (see [RFC3473], Section 4.3).
I see no definition of a "Notify Ack message" in RFC3473 (in any section). I am worried that you are confusing the Ack message with a new procedure requiring a handshake of Notify messages.


-> ok i will rephrase that one

===
Section 6.2
   1. If an end-node (A or D) detects the failure of the working LSP
      (or a degradation of signal quality over the working LSP) or
      receives a Notify message including its SESSION object within
      the <upstream/downstream session list> (see [RFC3473]), it MUST
      begin receiving on the protecting LSP
Note that the sender descriptor or flow descriptor is also present in the Notify and this will considerably help resolve ambiguities and race conditions since it identifies the LSP.


-> ok i will add a statement along these lines

===
Section 6.2
   1. If an end-node (A or D) detects the failure of the working LSP
      (or a degradation of signal quality over the working LSP) or
      receives a Notify message including its SESSION object within
      the <upstream/downstream session list> (see [RFC3473]), it MUST
      begin receiving on the protecting LSP
I don't think the receipt of a Notify message is sufficient, per se. I think the error code and value need to indicate a problem with the LSP.

-> this is explained in the next paragraph, but i will add a statement with respect to this error code/value processing

"           Note: in this case, the IF_ID ERROR_SPEC replaces the
           ERROR_SPEC in the Notify message, otherwise the
           corresponding (data plane) information SHOULD be received
           in the PathErr/ResvErr message. "

===
Section 6.2
   1. If an end-node (A or D) detects the failure of the working LSP
      (or a degradation of signal quality over the working LSP) or
      receives a Notify message including its SESSION object within
      the <upstream/downstream session list> (see [RFC3473]), it MUST
      begin receiving on the protecting LSP and send a Notify message
      reliably to the other end-node (D or A, respectively).
"...send a Notify message reliably" will certainly be misunderstood. You presumably mean "...send a Notify message including the Message_ID object".


-> yes

===
Section 6.2
   2. Upon receipt of the switchover message, the end-node (D or A,
      respectively) MUST begin receiving from the protection LSP and
      send a (Notify) Ack message to the other end-node (A or D,
      respectively) using reliable message delivery (see [RFC2961]).
While this clarifies the use of Ack rather than Notify Ack (not sure why you need to include "(Notify)") it is now confused about the delivery of the Ack message. How do we achieve reliable delivery of an Ack message?!

-> needs to be rephrased, as the only way to achieve a reliable bi-directional exchange is by ensuring a three way handshake (Notify ->, Notify + Ack <-, Ack ->)
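
A minimal Python sketch of the three way exchange (Notify ->, Notify + Ack <-, Ack ->) may help fix ideas. It is one possible reading of the one-line description above, not the procedure of the draft; the class, the function and the message tuples are hypothetical names chosen for illustration, and no retransmission or timers are modelled.

```python
from dataclasses import dataclass


@dataclass
class EndNode:
    name: str
    receiving_on: str = "working"  # which LSP the receiver currently selects


def three_way_switchover(initiator: EndNode, responder: EndNode):
    """Return the ordered (sender, receiver, contents) exchange."""
    exchange = []
    # 1. The node detecting the failure starts receiving on the protecting
    #    LSP and sends a Notify (switchover request).
    initiator.receiving_on = "protecting"
    exchange.append((initiator.name, responder.name, ("Notify",)))
    # 2. The peer switches as well and answers with its own Notify,
    #    which also carries the Ack for the Notify it received.
    responder.receiving_on = "protecting"
    exchange.append((responder.name, initiator.name, ("Notify", "Ack")))
    # 3. The initiator acknowledges the peer's Notify, closing the handshake.
    exchange.append((initiator.name, responder.name, ("Ack",)))
    return exchange
```

For end-nodes A and D this yields A->D Notify, D->A Notify + Ack, A->D Ack, so each Notify is acknowledged and no acknowledgment of an Ack is needed.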

===
Section 7
   Although the resources for the protecting LSP are pre-allocated,
   preemptable traffic may be carried end-to-end using this LSP (i.e.
   the protecting LSP is capable of carrying extra-traffic) with the
   caveat that this traffic will be preempted if the working LSP fails.
Do you mean that the extra traffic is carried "using this LSP" or "using some or all of the resources assigned to this LSP"?


-> the first "using this LSP"

===
Section 7
   Also, if extra-traffic is carried over the protecting LSP, the
   corresponding end-nodes may be notified of the failure in order to
   complete the switchover.
I think this is "end-nodes may need to be notified"


-> ok

===
Section 7.2
   To co-ordinate the switchover between end-points, an end-to-end
   switchover request is needed such that the affected LSP(s) are moved
   to the protecting LSP.
In what way may there be more than one affected LSP moved to a single protecting LSP?

-> if someone uses the multiplier field of the SONET/SDH TSPEC, but there is a typo in the last LSP (missed a "s")

===
Section 7.2
   This operation may be done using a Notify message exchange with a
   new Error Code indicating "(Working) LSP Failure (Switchover
   Request)". The Notify Ack message MUST be sent to confirm the
   reception of the Notify message.
All of the same comments as for section 6.2.
Also:
- Why do you say "may be done"?


-> "initiated" or "performed"

- Is this the same error code as in 6.2? (the text is slightly different)

-> i will adapt error value per IANA section - the same for the error codes as part of section 6.2

===
Section 7.3
OLD
   provisioned protecting LSP is resource-disjoint LSP from the N
NEW
   provisioned protecting LSP is resource-disjoint from the N


-> ok

===
Section 7.3
Can you highlight that the N working LSPs are all between the same pair of end points.


-> ok

===
Section 8
OLD
   this does not mean that the corresponding resources can not used by
NEW
   this does not mean that the corresponding resources can not be used by


-> ok

===
Section 8
   To make bandwidth pre-reserved for a protecting (but not activated)
   LSP, available for extra traffic this bandwidth could be included in
   the advertised Unreserved Bandwidth at priority lower (means
   numerically higher) than the Setup Priority of the protecting LSP.
This feels like it should be the Holding Priority. That is, the Setup Priority was only important for how it could displace pre-existing LSPs.

-> ok (it will clarify) - but there was an implicit statement here that both are actually equal
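
The "lower priority means numerically higher" convention is easy to get backwards, so here is a minimal Python sketch of it. It assumes the usual RSVP-TE priority range 0..7 with 0 the highest priority; the function name is hypothetical, and the sketch only lists the priority levels at which the pre-reserved bandwidth could be advertised, not how the advertisement itself is built.

```python
LOWEST_PRIORITY = 7  # assumption: RSVP-TE priorities run from 0 (highest) to 7 (lowest)


def extra_traffic_priorities(protecting_holding_priority: int) -> list:
    """Priority levels strictly lower (numerically higher) than the
    Holding Priority of the not-yet-activated protecting LSP."""
    if not 0 <= protecting_holding_priority <= LOWEST_PRIORITY:
        raise ValueError("priority must be in 0..7")
    return list(range(protecting_holding_priority + 1, LOWEST_PRIORITY + 1))
```

With a Holding Priority of 3, the bandwidth could be offered to extra traffic at priorities 4 to 7; with a Holding Priority of 7 there is no lower level left, so no extra traffic could use it under this rule.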

===
Section 8.3
OLD
   From [GMPLS-ARCH], the secondary LSP is setup with resource pre-
NEW
   From [RFC3945], the secondary LSP is setup with resource pre-


-> ok

===
Section 9
OLD
   plane) a specific protecting LSP instantiated during the (pre-
   )provisioning phase. This requires restoration signaling along the
NEW
   plane) a specific protecting LSP instantiated during the (pre-)
   provisioning phase. This requires restoration signaling along the


-> ok

===
Section 9
   resource sharing), the LSPs must have the same Session Ids, but the
   Session Id includes the target (egress) IP address. These addresses
2xs/Id/ID/
Suggest a search for "id"


-> will use "ID"

===
Section 9.3
OLD
   From [GMPLS-ARCH], the secondary LSP is setup with resource pre-
NEW
   From [RFC3945], the secondary LSP is setup with resource pre-


-> ok

===
Section 10
OLD
   activated. Additional condition raises from mis-connection avoidance
NEW
   activated. An additional condition arises from mis-connection avoidance


-> ok

===
Section 10
OLD
   Note that step 1 may cause alarms to be raised for the pre-empted
   LSP. If alarm suppression is desired the pre-empting node MAY
   expand before applying step 1 act as follows.
NEW
   Note that step 1 may cause alarms to be raised for the pre-empted
   LSP. If alarm suppression is desired the pre-empting node MAY
   insert the following steps before step 1.


-> ok

===
Section 10
   At the downstream node (with respect to the pre-empting LSP) the
   processing is RECOMMENDED to be as follows:
   1. Receive PathTear (and/or PathErr) message for the pre-empted
      LSP(s).
   2a.Release the resources associated with the LSP on the interface
      to the pre-empting LSP, remove any cross-connection and release
      all other resources associated with the pre-empted LSP.
   2b.Forward the PathTear (and/or PathErr) message per [RFC 3473].
   C. Receive the Path message for the pre-empting LSP and process as
      normal, forwarding it to the downstream node.
   D. Receive the Resv for the pre-empting LSP and process as normal,
      forwarding it to the upstream node.
Cool numbering scheme :-) Any chance of settling on something more conventional?


-> ok - will use 1, 2a, 2b, 3, 4

===
Section 11.2
   Note: when the end-to-end LSP Protection Type is set to
   "Unprotected", both S and P bit MUST be set to 0 and the LSP SHOULD
   NOT be re-routed at the head-end node after failure occurrence. The
   Association_ID value MUST be set to the LSP_ID value of the signaled
   LSP.
Please explain the difference between an attempt to "re-route" and an attempt to "re-establish". presumably it could involve:
- a time difference
- the use of make-before-break for failed LSPs.
- the use of the ASSOCIATION object.
I would like to make sure that you are not applying "SHOULD NOT" to LSP re-establishment.

-> it means do not apply re-routing as specified in this document, i will add a paragraph along these lines

===
Section 12
OLD
   allocated to the LSP that was originally routed over it even after a
NEW
   allocated to the LSP that was originally routed over them even after a


-> ok

===
Section 12
   - then, apply the reverse 1-phase APS switchover request/response
     (or 2-phase APS) described in Section 6.2 (or Section 7.2,
This is the first mention of APS

-> see section 7.2; in this section 12 i will use the term "protection switching signaling" instead

===
Section 13
I think this section is going to give us grief during IESG review :-(
Why do we need to tie this so closely with NMS etc. And why describe it as external?
Can't we simply describe the function by:
- dropping the first para
- in C, D and E drop "externally"
- in D and E replace "manual" with "requested"


-> ok

===
Section 13
TWICE
OLD
   Recovery signaling operation is initiated externally that switches
NEW
   Recovery signaling is initiated externally that switches

-> ok

===
Section 13 (A and B)
   is set to either 0x04, or 0x08 or 0x10.
I would prefer you to use the meanings rather than the values.


-> ok will add the description

===
Section 13 (D and E)
   This, unless a fault condition exists on
? "This is allowed"? "This is possible"? "This is successful"?


-> would you clarify what you mean here

===
Section 14
OLD
   use so that the object can be included in the Notify message to act
   a switchover request for 1+1 bi-directional and 1:1 protection.
NEW
   use so that the object can be included in the Notify message to act
   as a switchover request for 1+1 bi-directional and 1:1 protection.


-> ok

===
Section 14.1
I believe we have had this discussion before.
We don't introduce reserved fields for future extensibility. We only do it for padding.
If you are certain that we need to extend in the future then please use sub-objects or TLVs.
This means that you can:
a. Remove the last four bytes of the Protection object.
b. Retain the C-Type from RFC3473

-> will add a reference to segment recovery (that does not make use of TLVs) - see section 6.1 of SEG-REC

===
Section 15
   This object MUST be present in the Path message (for the
   pre-provisioning of the secondary protecting LSP) if and only if the
   LSP Protection Type value is set to "0x02".
"MUST if and only if" is not really in RFC2119. Can we have two statements? One with "MUST" and one with "MUST NOT".


-> ok (but i don't understand why this logical statement is not allowed)
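
For what it is worth, the split into one "MUST" and one "MUST NOT" statement is logically equivalent to the "if and only if" wording, as this minimal Python sketch shows. The value 0x02 is the one quoted above; the constant and function names are hypothetical, and whether the presence should be mandatory at all is a separate question.

```python
PPRO_PROTECTION_TYPE = 0x02  # LSP Protection Type value quoted from Section 15


def ppro_violations(protection_type: int, ppro_present: bool) -> list:
    """Check the two RFC 2119 style statements separately."""
    violations = []
    # Statement 1 (MUST): the object is present when the type is 0x02.
    if protection_type == PPRO_PROTECTION_TYPE and not ppro_present:
        violations.append("PPRO MUST be present")
    # Statement 2 (MUST NOT): the object is absent for any other type.
    if protection_type != PPRO_PROTECTION_TYPE and ppro_present:
        violations.append("PPRO MUST NOT be present")
    return violations


def iff_holds(protection_type: int, ppro_present: bool) -> bool:
    """The original 'if and only if' wording as a single predicate."""
    return (protection_type == PPRO_PROTECTION_TYPE) == ppro_present
```

For every combination of inputs, `ppro_violations` returns an empty list exactly when `iff_holds` is true; the two-statement form simply tells the implementer which rule was broken.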

===
Section 15
In the case where my protecting LSP protects only one working LSP and where the full path of the protecting LSP is known by the ingress (strict and explicit) and there is no resource sharing between the protected and protecting LSP, I can't see why I must include a PPRO.
In other words, PPRO is an enabler of function (as stated in section 15.4 "The PPRO enables of sharing recovery resources between a given secondary protecting LSP and one or more secondary protecting LSPs if their corresponding primary working LSPs have mutually (link/node/SRLG)disjoint paths."), but that does not make its presence mandatory.

===
Section 15.1
   The contents of a PRIMARY_PATH_ROUTE object are a series of
   variable-length data items called subobjects. The subobjects are
   identical to those that can constitute an EXPLICIT/RECORD ROUTE
   object as defined in [RFC3209], [RFC3473] and [RFC3477].
This seems in contradiction with section 15.3


-> identical in terms of content definition

===
Section 15.4
OLD
   The PPRO enables of sharing recovery resources between a given
NEW
   The PPRO enables sharing of recovery resources between a given


-> ok

===
Section 16
   The ASSOCIATION object is used to associate LSPs with each other.
   In the context of end-to-end LSP recovery, the association MUST only
   identify LSPs that support the same Tunnel ID.
Hmmm. presumably same source and destination is relatively important too.

-> ok i will add a statement here even if it is already part of the next sentence

===
Section 16
   The ASSOCIATION object is used to associate LSPs with each other.
You already said this.


-> i will remove it

===
Section 16.1
   Association ID: 16 bits
      A value that when combined with Association Type and Association
      Source uniquely identifies an association.
It would be helpful to state who assigns this value.


-> it is the sender (i will add this)

===
Section 16.1
   Association Source: 4 or 16 bytes
      The IP address of the node that originated the association.
"The IP address"?
Question. Are two associations with the same Association ID equivalent if the Association Source addresses are different but identify the same node?
Answer (it transpires) is "no".
You need to make this much clearer here.

-> ok i will add a statement along these lines, note this statement will be added such that relationship between values is clearer
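
The relationship between the three values can be pictured as a composite key, as in this minimal Python sketch. The field widths follow the text quoted above (16-bit Association ID, IPv4 or IPv6 Association Source); the class name and the Association Type value used in the example are hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class AssociationKey:
    """An association is identified by the triple (type, ID, source)."""
    association_type: int
    association_id: int  # 16 bits, assigned by the sender
    association_source: Union[ipaddress.IPv4Address, ipaddress.IPv6Address]

    def __post_init__(self):
        if not 0 <= self.association_id <= 0xFFFF:
            raise ValueError("Association ID is a 16-bit value")


def make_key(assoc_type: int, assoc_id: int, source: str) -> AssociationKey:
    # ip_address() accepts both 4-byte (IPv4) and 16-byte (IPv6) sources.
    return AssociationKey(assoc_type, assoc_id, ipaddress.ip_address(source))
```

Two keys built with the same type and ID but with two different addresses of the same node compare unequal, which matches the "no" answer above: the comparison is on the address value, not on node identity.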

===
Section 17
Isn't Notify modified as well?
And I thought Resv was, but I may have been sleeping.


-> Resv needs to be added (Notify is not modified)

===
Section 18
This is a bit poor.
If you don't modify the "external commands" section, you'll certainly have to discuss security for them. After all, a forced failover can be pretty disruptive.
But I think you need to discuss misconnection here. In particular when there is mesh protection going on.

-> i will include specifics in line with the text provided in the functional specification

===
Section 19
The IANA section needs some gardening to make it really easy for IANA to implement.
- Break it up into clearer subsections.
- Make sure you have included all of the information needed in the registry
- Point back at the defining sections of the draft
- Only have suggested values in one place in the document
- Be consistent in using TBD or TBA in the document

-> ok (though i do think that putting the proposed values both in the text and in the IANA section eases reading)

===
Section 19
Should the IANA section also cover the bits in the ADMIN STATUS object?

-> we did not register these bits; if you think we need to register them as part of this section just let me know

===
Section 21
Missing references [CRANK], [RFC2205]. Suspect you need to check them all.
Will need to add a reference for LSP segment recovery.


-> ok

===
Section 21.1
This seems a very long list of normative references. I hope you can split this so that most of the references are informational.

-> ok

===
Section 22
You might change this to "Editors' Addresses"


-> ok

