
Re: Working Group Last Call draft-ietf-ccamp-loose-path-reopt-



Hi Adrian,

Comments we agree on have been removed. See inline,

On Jan 14, 2005, at 6:43 AM, Adrian Farrel wrote:

<x-tad-smaller>It seems to me that this draft is applicable to a strict ERO where one of the hops is a non-specific abstract node such as an AS number. This is made clear in section 2, but the Abstract and Introduction (yeah, and also the title and draft name) do not adequately expose this fact. But, further, the Introduction talks only about reoptimization without any mention of loose hops or abstract nodes. Thus the draft is schizoid to the third degree - is this loose path reoptimization, reoptimization of loose and non-specific abstract nodes, or general reoptimization? The draft needs to be consistent and clear.</x-tad-smaller>

Agree, the following definition has been adopted throughout the document: "A loosely routed LSP is defined as an LSP that follows a path that contains at least one loose hop or a strict (abstract node) hop"

I guess that the document title can remain unchanged considering that a loose path also includes the case of a path where at least one hop is an abstract node.

 
<x-tad-smaller>The title contains acronyms which need to be spelled out (MPLS and LSP).</x-tad-smaller>
 

Added

<x-tad-smaller>The Abstract is too long. Need it about half the length. You can move some of the material into the introduction which is currently rather short (shorter than the abstract!) Same comment about acronyms (MPLS, GMPLS, TE, LSP, ERO) - make sure they are expanded for their first usage.</x-tad-smaller>
 

ok, sections expanded, others shortened ... overall same length ;-)

<x-tad-smaller>Section 2 states that an ERO expansion is either up to the next loose hop or to the destination. But, in fact, the ERO expansion may also be any partial fragment towards either of these targets (including next hop resolution). I suggest re-wording this paragraph to list (as bullets) what an ERO might contain, and in a separate list, what the computation might produce.</x-tad-smaller>

We listed in this paragraph the most usual case of ERO expansion. If you're ok with this, elaborating further on ERO expansion is out of the scope of this document.

<x-tad-smaller>In section 4.1 you add a note about the selection of component links from within a bundle. While this is true, it is unclear why you pick this case out but don't describe the selection of alternate resources (e.g. lambdas). This is associated to the new error values defined in section 4.2. How would you report a component link going oos? How would you report a link resource (e.g. a lambda) going oos? If you use "local link maintenance required" won't the computing node believe that the whole link is unusable?
</x-tad-smaller>

Indeed, this is why the node in charge of this link going oos should make the appropriate local decision on whether to report it, should an equivalent link not be found.

<x-tad-smaller>If your answer here is that the recomputation will ignore the error value and will perform a recomputation based on the new TED (see [GR-SHUT]) then why do you need to distinguish between link maintenance required and node maintenance required? If you actually need to report the component link or resource as a separate quantity, I suggest you refer to the crankback draft.</x-tad-smaller>
 
<x-tad-smaller>Section 4.1</x-tad-smaller>
<x-tad-smaller>I'm not comfortable with the Session Attributes toggling like this. This type of function is what the Admin Status object was invented for.</x-tad-smaller>
 
<x-tad-smaller>Section 5.3.1</x-tad-smaller>
<x-tad-smaller>   This </x-tad-smaller>
<x-tad-smaller>   bit is then cleared in subsequent RSVP path messages sent downstream.</x-tad-smaller>
<x-tad-smaller>This implies that a Path refresh *never* carries this bit set (which makes it a trigger when it comes after a Path with the bit set).</x-tad-smaller>
<x-tad-smaller>Thus we may lose the request (either through a lost Path message, or through a refresh catching up with a trigger Path message). I think we discussed this before. You need to make it clear in the draft that these requests can be lost.</x-tad-smaller>
<x-tad-smaller>I think it is also worth considering how to prevent the toggling off of the bit from appearing as a trigger message.</x-tad-smaller>

Case of a lost Path message: in order to cover this case, *the* solution consists of using reliable messaging as defined in RFC2961 (note that any change in the objects of a Path message would indeed not be detected if the Path message is lost; this applies to any other LSP attribute). Note also that resending the same Path message N times (which is an option that we did investigate) would not really solve the issue, since the Path message would be resent after a delay that depends on the refresh period (which can itself be very long). This in turn causes a race condition with the reoptimization timer running on the head-end LSR, which may lead to undesirable conditions. Thus, a Path message may indeed be lost and the solution consists of using reliable messaging, should the operator consider the loss of such a Path message to be unacceptable. I will add some text there. Thanks.

 
<x-tad-smaller>In section 5.3.2</x-tad-smaller>
<x-tad-smaller>        - The link (sub-code=7) or the node (sub-code=8) MUST be </x-tad-smaller>
<x-tad-smaller>         locally registered for further reference (the TE database must </x-tad-smaller>
<x-tad-smaller>        be updated)</x-tad-smaller>
<x-tad-smaller>What does "the TE database must be updated" mean? Are you saying that the TED is now built from information flooded by the IGP *and* by information fed back from signaling? If so (and I don't approve!) then you must define what happens when you receive a new LSA for the specific link that contradicts the information signaled. There is a strong argument that says that *the* method we use for building the TED is IGP flooding - if this mechanism doesn't provide you with the information you need, then you should propose extensions to the IGP, not hook the information onto signaling.</x-tad-smaller>

Let me slightly disagree here. I'm fine with not mentioning this since it may be implementation specific. That said, I do think that this is highly desirable (in combination with a timer-based mechanism) so as to speed up convergence. Typically, upon receiving a PathErr message it does make sense to first update your TED; otherwise the head-end will keep trying the same path until an LSA/LSP is received. In many networks, such an optimization is definitely required to speed up TE LSP rerouting. Note that such behavior is implemented in commercial products.
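To illustrate the point, here is a minimal sketch (in Python, not taken from the draft or from any implementation) of a head-end that excludes the link reported in a PathErr before recomputing, so that it does not retry the same path while waiting for the IGP. The topology, the costs, the `shortest_path` helper and the `excluded_links` set are all hypothetical. The sketch keeps the exclusion separate from the IGP-learned TED, which is only one possible design, and it leaves out the timer that would eventually expire the exclusion.

```python
import heapq

def shortest_path(ted, src, dst, excluded_links=frozenset()):
    """Dijkstra over ted = {node: {neighbor: cost}}.

    Links listed in excluded_links (as frozenset({u, v})) are skipped.
    Returns the path as a list of nodes, or None if dst is unreachable.
    """
    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, cost in ted.get(u, {}).items():
            if frozenset((u, v)) in excluded_links:
                continue  # link reported as going out of service
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]

# Hypothetical four-node topology with symmetric IGP costs.
ted = {
    "A": {"B": 1, "C": 2},
    "B": {"A": 1, "D": 1},
    "C": {"A": 2, "D": 2},
    "D": {"B": 1, "C": 2},
}

before = shortest_path(ted, "A", "D")

# Assume a PathErr reported link B-D: record it locally, then recompute.
excluded = frozenset({frozenset(("B", "D"))})
after = shortest_path(ted, "A", "D", excluded)
```

Without the local exclusion, the second computation would return the same path as the first until the IGP update arrives.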

<x-tad-smaller>OTOH it may be that all you mean is that the Session state should be updated to indicate the link or node that is being shut down so that later recomputation can avoid this link. In this case, I suggest you refer to the CCAMP crankback draft.</x-tad-smaller>

Still, such an update may be beneficial to other TE LSPs and is orthogonal to the use of crankback?

 
<x-tad-smaller>In section 5.3.2</x-tad-smaller>
<x-tad-smaller>        - ... Note that in the case of TE LSP </x-tad-smaller>
<x-tad-smaller>        spanning multiple administrative domains, it may be desirable</x-tad-smaller>
<x-tad-smaller>         for the boundary LSR to modify the RSVP PathError message and </x-tad-smaller>
<x-tad-smaller>        insert its own address for confidentiality reason. </x-tad-smaller>
<x-tad-smaller>Yes. Good point, but doesn't the error code also need to change? Otherwise it will appear that the border node is the node being taken oos.</x-tad-smaller>

If you agree with this argument I would vote for keeping the same error code since this would not change the action taken by the head-end.

 
<x-tad-smaller>Section 5.3.3. suggests the use of a timer. You must, therefore, suggest a default time value. I suspect that you want to suggest some basic multiple of the path computation time or of the IGP refresh period.</x-tad-smaller>

Right, good point. Default of 5s has been suggested (not really correlated to the path computation time or the IGP refresh period since it relates more to the number of LSPs and level of network "dynamicity" which includes a number of parameters).

 
<x-tad-smaller>Section 6</x-tad-smaller>
<x-tad-smaller>Need to describe the processing by an LSR that does not understand the new flag (rather than understand it but not support it). note that you cannot define the behavior of legacy LSRs in this draft, so you must reference behavior defined in some other document.</x-tad-smaller>
<x-tad-smaller>Ditto the new error code.</x-tad-smaller>

Unfortunately, I do not think that RFC3209 specifies the behavior of a node receiving a SESSION ATTRIBUTE flag that it does not understand ... An implementation should then just ignore such a flag if it does not understand it.

 
<x-tad-smaller>Section 7</x-tad-smaller>
<x-tad-smaller>This technique has implications for the trust model between domains. In particular, one domain may cause another to perform additional (excess or unnecessary) work simply to ease its own task or for malicious reasons. Similarly, a headend domain might choose to ignore the requests for re-optimization issued by another domain. I think you need to point out that the peering agreements between domains need to include a definition of how this technique is supported.</x-tad-smaller>

Agree Adrian and this is in line with the Inter-area req doc. Will add some text:

"Furthermore, a head-end LSR may decide to ignore explicit notification coming from a mid-point residing in another domain. Similarly, an LSR may decide to ignore (or accept but up to a pre-defined rate) path re-evaluation requests originated by a head-end LSR of another domain. "

thanks.
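As an illustration of "accept but up to a pre-defined rate", a token bucket is one possible way for an LSR to bound the path re-evaluation requests it accepts from another domain. This is only a sketch: the class name, the rate and the burst size are hypothetical, and the draft does not mandate any particular mechanism.

```python
class ReevalRateLimiter:
    """Token bucket bounding accepted path re-evaluation requests."""

    def __init__(self, rate_per_s, burst):
        self.rate = float(rate_per_s)   # tokens added per second
        self.capacity = float(burst)    # maximum stored tokens
        self.tokens = float(burst)      # start with a full bucket
        self.last = None                # time of the previous request

    def allow(self, now_s):
        """Return True if a request arriving at time now_s is accepted."""
        if self.last is not None:
            elapsed = max(0.0, now_s - self.last)
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now_s
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # over the configured rate: ignore the request

# Hypothetical policy: 1 request per second, bursts of 2.
limiter = ReevalRateLimiter(rate_per_s=1.0, burst=2)
decisions = [limiter.allow(t) for t in (0.0, 0.0, 0.0, 1.0, 1.0)]
```

Requests refused by the limiter would simply be ignored, which matches the "may decide to ignore" wording above.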

<x-tad-smaller>Question...</x-tad-smaller>
<x-tad-smaller> </x-tad-smaller>
<x-tad-smaller>How does the process of unsolicited notification (of a potential better path rather than of a link going oos) avoid thrashing races? As a very simple example, consider the following n/w.</x-tad-smaller>
 
<x-tad-smaller><-A1-> <--A0-> <-A2-></x-tad-smaller>
<x-tad-smaller>A-----B       C-----D</x-tad-smaller>
<x-tad-smaller>      |       |</x-tad-smaller>
<x-tad-smaller>      |       |</x-tad-smaller>
<x-tad-smaller>E-----F---G---H-----I</x-tad-smaller>
 
<x-tad-smaller>Set up two LSPs AI and ED using EROs {A,B(L),H(L),I} and {E,F(L),C(L),D} producing paths ABFGHI and EFGHCD.</x-tad-smaller>
 
<x-tad-smaller>Now install a *low* bandwidth link BC capable of carrying either but not both LSPs. Both B and F will notice that the LSPs entering A0 through them can be re-optimized and will report the fact to A and E respectively.
</x-tad-smaller>

Note that if the link is a "low" bw link, it is unlikely that B and F will report a better path, but yes, that could happen depending on the IGP link costs.

<x-tad-smaller>Both A and E will attempt mb4b, but (of course) only one will succeed. In a small network, this is not a big deal, but in a large network with a lot of LSPs this is clearly a waste of processing and will cause a degree of network thrash maybe only in the control plane, but maybe in the data plane if a lower priority LSP is re-routed first. In fact, this scenario can cause significant disruption in the data plane as the re-routed LSP will be preempted and could have been successfully left in its original place.</x-tad-smaller>

Indeed, but this is no different than any other reoptimization scenario in a single area. If, for example, a link is restored within an area and could offer a potentially more optimal path to a large number of TE LSPs, there could indeed be race conditions. This is usually sorted out by using jittered reoptimization at the head-end.
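For what it is worth, jittered reoptimization can be sketched as follows. The function names, the uniform distribution and the jitter fraction are assumptions made for illustration only, not something specified in the draft.

```python
import random

def jittered_delay(base_s, jitter_fraction, rng):
    """Return base_s perturbed uniformly by +/- jitter_fraction."""
    if not 0.0 <= jitter_fraction <= 1.0:
        raise ValueError("jitter_fraction must be within [0, 1]")
    low = base_s * (1.0 - jitter_fraction)
    high = base_s * (1.0 + jitter_fraction)
    return rng.uniform(low, high)

def schedule_reoptimizations(lsp_names, base_s, jitter_fraction, seed=None):
    """Give each TE LSP its own randomized delay before it reoptimizes."""
    rng = random.Random(seed)
    return {name: jittered_delay(base_s, jitter_fraction, rng)
            for name in lsp_names}

# Hypothetical head-end with two TE LSPs, a 5 s base delay and 50% jitter.
delays = schedule_reoptimizations(["lsp-1", "lsp-2"], 5.0, 0.5, seed=1)
```

Spreading the delays makes it less likely that many TE LSPs attempt make-before-break on the same restored link at the same instant. It reduces the races but does not eliminate them.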

 
<x-tad-smaller>It seems that a considerably sophisticated policy is required for any domain, but particularly core domains like A0. In effect, the domain needs to evaluate the new link by examining all LSPs in the system and selecting which one(s) should be re-optimized. This type of processing is non-trivial and uses information stores that are not generally available (i.e. LSP maps).</x-tad-smaller>
<x-tad-smaller> </x-tad-smaller>
 
<x-tad-smaller>Thus I would suggest removing the unsolicited notification of reoptimization opportunities (while retaining the unsolicited notification of links going oos) or requiring that the policy be timer-based not event triggered.</x-tad-smaller>

We would definitely prefer to keep this mode. An implementation could just activate the function for *some* sensitive LSPs, combined with jittered reoptimization, but such notification is desirable to quickly take advantage of a newly restored link.

NOTE that link dampening (IGP) can also be used here; this had been discussed in the FRR document.

Thanks for your comments !

JP.

 
 
<x-tad-smaller>Adrian</x-tad-smaller>