
Re: Working group last call on draft-ietf-ccamp-rsvp-restart-ext-02.txt



Adrian,

Thanks for the comments. Will update the I-D and respin.

Thanks,
Arun
===============================================================
Adrian Farrel wrote:
Hi,

I have some comments on this draft.

Please fix the boilerplate. The Secretariat are now policing this tightly
and will fail any future submissions. You would do well to run idnits over
any future version before submitting it.

Please indicate at the head of the I-D what the intended category of the
draft is. Presumably Standards Track.

Do you believe that this I-D updates RFC3473? If so, please put this in
the header of the draft.
Ditto RFC2961.

There are some instances of "should" / "should not" / "must" /
"recommended" in lower case. Please be careful that you mean this.
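
A quick way to find these instances is a case-sensitive scan of the draft
text. The following is only a sketch: the function name and the sample text
are made up for illustration, the keyword list is limited to the four words
mentioned above, and a phrase split across a line break will be missed.

```python
import re

# Case-sensitive on purpose: upper-case SHOULD / MUST are not reported.
# "should not" is listed before "should" so the longer phrase wins.
LOWERCASE_KEYWORDS = re.compile(r"\b(should not|should|must|recommended)\b")


def find_lowercase_keywords(text):
    """Return (line number, keyword) pairs for lower-case requirement words."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in LOWERCASE_KEYWORDS.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


# Hypothetical sample text, not taken from the draft.
sample = (
    "The node should spread the messages.\n"
    "The node SHOULD NOT wait.\n"
    "A larger interval is recommended.\n"
)

for lineno, word in find_lowercase_keywords(sample):
    print(f"line {lineno}: {word}")
```

Each hit still needs a human decision on whether the lower-case form is
intended.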

Abstract uses two acronyms (ERO and LSP) without expanding them.

Introduction
I feel that it would be helpful to state why we are working on graceful
restart. Probably by expanding the first sentence to say very briefly what
graceful restart actually is. Something like "to synchronize the control
plane state of a restarting control plane component with the data plane
state that has been preserved, carrying user traffic during the control
plane component downtime."

Introduction (and throughout)
You talk about a restarting "node", but I think you really mean a
restarting control plane signaling component (often called a signaling
controller). It is helpful to distinguish the control plane from the data
plane for graceful restart, and I think that the term "node" does not
help.
I appreciate that RFC3473 uses the term "nodal fault".
Maybe you can make a terminology statement at the top of the I-D to the
effect that where you say "node" you are generally referring to the
signaling controller for the data plane switch.

Section 2.3 Transmission of RecoveryPath messages
There is a slight contradiction or confusion here.
First, the text says that the downstream neighbor "should spread the
messages across 1/2 the Recovery Time interval." But then it points out
that retransmission may be required with up to three retransmissions, and
says... "the period between re-send attempts SHOULD be such that at least
3 attempts are completed before the expiry of 1/2 the Recovery Time
interval."
This implies that (if the Message ID mechanism is not in use) the
downstream neighbor SHOULD spread the messages across 1/6 the Recovery
Time interval.
But see also section 3.3.1 where the Srefreshes are sent for the first
time over 1/4 of the interval and may also be retransmitted (presumably up
to 3 times).
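
The arithmetic behind the 1/6 figure can be sketched as follows. This is
only an illustration under simplifying assumptions: the three attempts are
treated as three equal-length passes over the messages, any backoff between
retransmissions is ignored, and the function name and the 60-second
Recovery Time are made up for the example.

```python
from fractions import Fraction


def spread_window(recovery_time_ms, budget_fraction, attempts):
    """Time available for one full pass over the messages.

    Assumes `attempts` equal-length passes must all complete within
    `budget_fraction` of the Recovery Time (a simplification).
    """
    return Fraction(recovery_time_ms) * budget_fraction / attempts


recovery_time_ms = 60_000  # hypothetical Recovery Time
window = spread_window(recovery_time_ms, Fraction(1, 2), 3)

print(window)                     # 10000 (milliseconds per pass)
print(window / recovery_time_ms)  # 1/6 of the Recovery Time
```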

Sections 2.4.1, 2.4.2, 3.3.2 (twice)
s/triggered Path/trigger Path/

Section 2.4.3
Twice in this section you give "SHOULD" actions regarding the discard of
partial state. What are the implications of not following this advice? Why
is it not "MUST"?

Section 3.2.2
"There are no compatibility issues introduced in this section."
You probably mean "in the procedures defined in section 3.2.1."

Section 3.3, 3.3.1 (including title), and 3.3.2 (including title)
Several instances of "RecoveryPath-related Srefresh message" can be
reduced to "RecoveryPath Srefresh message".

Section 3.3.1
   "A neighbor of a restarting node generates one or more"
Is this "MAY", "SHOULD" or "MUST"?

Section 3.3.1
Please be very careful to say "RecoveryPath Srefresh" every time you mean
it, because normal Srefresh processing may continue in parallel. Otherwise
your text will be interpreted to mean that normal Srefresh is broken. In
fact, you should make clear that these procedures do not impact normal
Srefresh processing.

Section 3.3.2
See the timing comments for section 2.3.
Here you say...
      The restarted node MAY spread the transmission of
      such triggered Path messages across 1/2 of the remaining Recovery
      Period to allow the downstream RSVP neighbor sufficient processing
      time.
Do you mean, up until the end of the 3rd quarter, or do you mean for each
Path message, spread over half of the remaining time to the end of the
whole interval?
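
The two readings give different deadlines, as this small sketch shows. It
assumes, purely for illustration, a Recovery Period of 120 seconds and that
the trigger Path messages start at the halfway point; the function name is
made up.

```python
def half_of_remaining(now, end):
    """Deadline that lies halfway between `now` and the end of the period."""
    return now + (end - now) / 2.0


recovery_period = 120.0        # hypothetical, in seconds
start = recovery_period / 2.0  # assumption: sending starts at the midpoint

# Reading 1: compute the deadline once, when sending starts.
# With the assumptions above this is the end of the 3rd quarter.
single_deadline = half_of_remaining(start, recovery_period)
print(single_deadline)  # 90.0

# Reading 2: recompute for each Path message from the current time,
# so the deadline keeps moving toward the end of the whole period.
rolling = [half_of_remaining(t, recovery_period) for t in (60.0, 90.0, 105.0)]
print(rolling)  # [90.0, 105.0, 112.5]
```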

Section 5
If an outside influence is able to cause the control plane of a node to go
down and come up again, this will obviously impact the controllability of
the LSPs that transit that node. Your procedures restore the ability to
control those LSPs and so they have a positive influence on security, and
you should say so.
On the other hand, your procedures impact the neighboring upstream nodes
and cause them to perform increased processing, possibly to the detriment
of their handling of other control plane messages. Thus, by toggling one
node I can impact LSPs that do not transit that node. You need to flag
this and point out that it is handled by setting the Recovery Time
sufficiently large.

Section 6
This section is very weak!
Could you please name the objects/messages and point to the sections where
they are defined.
The object is not "of form 10bbbbbb". That is the form of the C-Num.
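
To make the distinction concrete: the bit pattern is a property of the
Class-Num octet, whose high-order bits tell a receiver how to treat an
object class it does not recognize (RFC 2205). A minimal sketch, with a
made-up function name and arbitrary example values that are not real
assignments:

```python
def cnum_form(c_num):
    """Classify an RSVP Class-Num octet by its high-order bits."""
    if not 0 <= c_num <= 0xFF:
        raise ValueError("Class-Num is a single octet")
    if c_num & 0x80 == 0:
        return "0bbbbbbb"   # unknown class: reject the message with an error
    if c_num & 0xC0 == 0x80:
        return "10bbbbbb"   # unknown class: silently ignore the object
    return "11bbbbbb"       # unknown class: ignore the object but forward it


# Arbitrary example values, not real Class-Num assignments.
for value in (0x14, 0x85, 0xC1):
    print(f"{value:#04x} -> {cnum_form(value)}")
```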

Section 6
Do you think there is a need for IANA to track the bit allocations in the
Flags field of the Message_ID List object?

Section 7.2
You might want to consider removing this reference since the I-D is, I
think, expired without any intention of being regenerated.

Section 11
The year is now 2005 :-)

Thanks,
Adrian