Re: Working Group Last Call draft-ietf-ccamp-loose-path-reopt-

Here are my comments on this draft.

It is a great shame to see an I-D put up for last call when it fails the idnits script. In particular, since this concerns how the IPR statements are interpreted, it is crucial that these issues are resolved before the draft can move forward. Additionally it is a matter of courtesy to the readers to resolve the formatting issues.

It seems to me that this draft is applicable to a strict ERO where one of the hops is a non-specific abstract node such as an AS number. This is made clear in section 2, but the Abstract and Introduction (yeah, and also the title and draft name) do not adequately expose this fact. But, further, the Introduction talks only about reoptimization without any mention of loose hops or abstract nodes. Thus the draft is schizoid to the third degree - is this loose path reoptimization, reoptimization of loose and non-specific abstract nodes, or general reoptimization? The draft needs to be consistent and clear.

The title contains acronyms which need to be spelled out (MPLS and LSP).

The Abstract is too long. It needs to be about half the length. You can move some of the material into the Introduction, which is currently rather short (shorter than the Abstract!). Same comment about acronyms (MPLS, GMPLS, TE, LSP, ERO): make sure they are expanded on first use.

Conventions used in this document. Contains unresolved citation.

Section 2

s/Ipv4/IPv4/

Section 2 states that an ERO expansion is either up to the next loose hop or to the destination. But, in fact, the ERO expansion may also be any partial fragment towards either of these targets (including next hop resolution). I suggest re-wording this paragraph to list (as bullets) what an ERO might contain, and in a separate list, what the computation might produce.

Section 2

s/The path an/The path of an/

Section 2.

s/head-End/head-end/

In section 2 I don't like the distinction between "CSPF or any other PCE-based path computation method". Part of the implication here is that PCE might not perform CSPF! I suspect you are trying to highlight that the computation may be performed locally or remotely. I don't think this is relevant to this I-D; you should simply say "invokes path computation".

Section 2

[RSVP-TE] unresolved normative reference.

Section 4.1

s/but instead just signal/but instead just signals/

In section 4.1 you add a note about the selection of component links from within a bundle. While this is true, it is unclear why you pick this case out but don't describe the selection of alternate resources (e.g. lambdas). This is associated with the new error values defined in section 4.2. How would you report a component link going oos? How would you report a link resource (e.g. a lambda) going oos? If you use "local link maintenance required" won't the computing node believe that the whole link is unusable? If your answer here is that the recomputation will ignore the error value and will perform a recomputation based on the new TED (see [GR-SHUT]) then why do you need to distinguish between link maintenance required and node maintenance required? If you actually need to report the component link or resource as a separate quantity, I suggest you refer to the crankback draft.

Section 4.1

I'm not comfortable with the Session Attributes toggling like this. This type of function is what the Admin Status object was invented for.

Section 5.3.1

        This bit is then cleared in subsequent RSVP path messages sent
        downstream.

This implies that a Path refresh *never* carries this bit set (which makes it a trigger when it comes after a Path with the bit set).

Thus we may lose the request (either through a lost Path message, or through a refresh catching up with a trigger Path message). I think we discussed this before. You need to make it clear in the draft that these requests can be lost.

I think it is also worth considering how to prevent the toggling off of the bit from appearing as a trigger message.
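
To illustrate the two concerns above, here is a minimal Python sketch of a toy model, not of real RSVP-TE processing. All names (PathMsg, Downstream, reeval_bit, run) are hypothetical. The model assumes that a downstream LSR acts on the request only when it receives a Path message with the bit set, and that it treats any Path message differing from its stored state as a trigger message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathMsg:
    """Toy Path message; only the hypothetical re-evaluation bit is modelled."""
    reeval_bit: bool


class Downstream:
    """Toy downstream LSR (assumed behaviour, not taken from the draft)."""

    def __init__(self):
        self.stored = PathMsg(reeval_bit=False)  # state from the last Path seen
        self.requests_seen = 0
        self.triggers_seen = 0

    def receive(self, msg):
        # Assumption: any change relative to stored state counts as a trigger.
        if msg != self.stored:
            self.triggers_seen += 1
        if msg.reeval_bit:
            self.requests_seen += 1
        self.stored = msg


def run(messages, lost):
    """Deliver messages in order, dropping those whose index is in `lost`."""
    lsr = Downstream()
    for i, msg in enumerate(messages):
        if i not in lost:
            lsr.receive(msg)
    return lsr


# The head-end sets the bit once, then clears it in every later Path message.
sequence = [PathMsg(True), PathMsg(False), PathMsg(False)]

ok = run(sequence, lost=set())
dropped = run(sequence, lost={0})

print(ok.requests_seen, ok.triggers_seen)            # 1 2
print(dropped.requests_seen, dropped.triggers_seen)  # 0 0
```

In the lossless run the request is seen once, but clearing the bit registers as a second trigger. When the first message is lost, the request is never seen at all.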

Section 5.3.1

        At this point, the LSR
        MAY decide to clear the "Path re-evaluation request" bit of the
        SESSION-ATTRIBUTE object in subsequent RSVP Path messages sent
        downstream: this mode is the RECOMMENDED mode for the reasons
        described below.
It really isn't a matter of clearing the bit, so much as not propagating it. That is, it is not necessary to send a new Path message at this point.

Section 5.3.1

[PATH-COMP] is required as a normative reference.

In section 5.3.2

        - The link (sub-code=7) or the node (sub-code=8) MUST be
        locally registered for further reference (the TE database must
        be updated)
What does "the TE database must be updated" mean? Are you saying that the TED is now built from information flooded by the IGP *and* by information fed back from signaling? If so (and I don't approve!) then you must define what happens when you receive a new LSA for the specific link that contradicts the information signaled. There is a strong argument that says that *the* method we use for building the TED is IGP flooding - if this mechanism doesn't provide you with the information you need, then you should propose extensions to the IGP, not hook the information onto signaling.

OTOH it may be that all you mean is that the Session state should be updated to indicate the link or node that is being shut down so that later recomputation can avoid this link. In this case, I suggest you refer to the CCAMP crankback draft.

In section 5.3.2

        - ... Note that in the case of TE LSP
        spanning multiple administrative domains, it may be desirable
        for the boundary LSR to modify the RSVP PathError message and
        insert its own address for confidentiality reason.
Yes. Good point, but doesn't the error code also need to change? Otherwise it will appear that the border node is the node being taken oos.

Section 5.3.3. suggests the use of a timer. You must, therefore, suggest a default time value. I suspect that you want to suggest some basic multiple of the path computation time or of the IGP refresh period.

Section 6

Need to describe the processing by an LSR that does not understand the new flag (rather than understand it but not support it). Note that you cannot define the behavior of legacy LSRs in this draft, so you must reference behavior defined in some other document.

Ditto the new error code.

Section 7

This technique has implications for the trust model between domains. In particular, one domain may cause another to perform additional (excess or unnecessary) work simply to ease its own task or for malicious reasons. Similarly, a headend domain might choose to ignore the requests for re-optimization issued by another domain. I think you need to point out that the peering agreements between domains need to include a definition of how this technique is supported.

Section 10

"Normative references" and "Informative references" need section numbers

Full Copyright Statement

Unnecessary quote marks.

Question...

How does the process of unsolicited notification (of a potential better path rather than of a link going oos) avoid thrashing races? As a very simple example, consider the following n/w.

<-A1-> <--A0-> <-A2->
A-----B       C-----D
      |       |
E-----F---G---H-----I

Set up two LSPs AI and ED using EROs {A,B(L),H(L),I} and {E,F(L),C(L),D} producing paths ABFGHI and EFGHCD.

Now install a *low* bandwidth link BC capable of carrying either but not both LSPs. Both B and F will notice that the LSPs entering A0 through them can be re-optimized and will report the fact to A and E respectively. Both A and E will attempt mb4b, but (of course) only one will succeed. In a small network, this is not a big deal, but in a large network with a lot of LSPs this is clearly a waste of processing and will cause a degree of network thrash, maybe only in the control plane, but maybe also in the data plane if a lower priority LSP is re-routed first. In fact, this scenario can cause significant disruption in the data plane as the re-routed LSP will be preempted and could have been successfully left in its original place.
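
The race can be sketched with a toy model in Python. Everything here is an assumption for illustration: the bandwidth figures, the Link class and the make_before_break helper are hypothetical, and admission control is reduced to a single capacity check on the new link BC.

```python
class Link:
    """Toy link with a simple capacity check (illustrative only)."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self.reserved = 0

    def admit(self, bandwidth):
        """Reserve bandwidth if it fits; report whether it did."""
        if self.reserved + bandwidth > self.capacity:
            return False
        self.reserved += bandwidth
        return True


def make_before_break(lsp, bandwidth, new_link, log):
    """Try to signal the new path first; keep the old path on failure."""
    log.append(f"{lsp}: recompute and signal via {new_link.name}")
    if new_link.admit(bandwidth):
        log.append(f"{lsp}: moved to new path")
        return True
    log.append(f"{lsp}: admission failed, stays on old path")
    return False


# Assumed numbers: BC can carry either LSP, but not both.
bc = Link("BC", capacity=10)
log = []

# B and F both notify their head-ends, so both A and E attempt reoptimization.
results = {
    "AI": make_before_break("AI", 8, bc, log),
    "ED": make_before_break("ED", 8, bc, log),
}

wasted = [lsp for lsp, moved in results.items() if not moved]
print(results)  # {'AI': True, 'ED': False}
print(wasted)   # ['ED']
```

Which LSP wins depends only on arrival order here. The sketch does not model priorities or preemption, which is where the data-plane disruption described above would come from.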

It seems that a considerably sophisticated policy is required for any domain, but particularly core domains like A0. In effect, the domain needs to evaluate the new link by examining all LSPs in the system and selecting which one(s) should be re-optimized. This type of processing is non-trivial and uses information stores that are not generally available (i.e. LSP maps).

Thus I would suggest removing the unsolicited notification of reoptimization opportunities (while retaining the unsolicited notification of links going oos) or requiring that the policy be timer-based not event triggered.

Adrian