[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

comments on draft-ietf-shim6-failure-detection-06.txt {4}



I reviewed the failure detection draft today and have the following
comments.

Clarity issues
==============
1. The state machine could be clarified a bit.  There is an
"Operational" state but associated with this state are two timers, and
depending on which timers are active at a given time, there are actually
three substates, which one might call "Idle" (no timers running),
"Operational-In" (KeepAlive running), and "Operational-Out" (Send
running).  The draft implies that Send and Keepalive are not running
simultaneously ("Upon the reception of a payload packet in the
Operational state, the node starts the Keepalive timer if it is not yet
running, and stops the Send timer if it was running."); this implies
that a bidirectional data flow will (conceptually) be toggling
continuously between Operational-In and Operational-Out.  It could be
better clarified that if a host moves to a quiescent state upon sending
a single keepalive message, unless woken up by either its own send event
or the reception of another peer data packet.     

It might be worth considering to break out the Operational state into
substates to see what the implications are for the state machine, and
only then decide whether one wants to collapse the substates into a
single state.  Perhaps this has already been considered and rejected
(I'm new to this discussion). 

2.  The linkage between these failure detection mechanisms and return
routability checks is fuzzy.  If REAP detects through probing that a
locator pair works, does this suffice for reachability, or does separate
shim6 process need to send Update Request?  The last paragraph of 3.3
seems to imply that shim6 must separately verify return routability.
But the shim6 draft says that the Probe message exchange handles this
routability check.    This really ought to be clarified somewhere in
this draft.

3. There seems to be a mismatch with the shim6-proto draft, where it
says that there are "Probe Option", "Reachability Option", and "Payload
Reception Report Option" TLVs defined in this draft.  I couldn't find
any such Option definitions.

4.   There could be more clarity in the terminology for address pairs.
The protocol draft talks of ULID-pairs and current locator pair, while
this draft talks about primary address pair and current address pair.
Second, it probably needs to be distinguished that there is an inbound
address pair and an outbound address pair that may be the same or
different.  I suggest to consistently use the terms "ULID-pair",
"outbound locator pair", "inbound locator pair" and possibly to clarify
that when "locator pair" is used without reference to inbound or
outbound, it refers to the case where they are the same pair.

5.  In general, sections 6 and 7 seem drafty to me.  Section 7
(examples) probably should be an appendix, and section 6 should be
expanded into more detailed elements of procedure, such as how to fill
in parameter values and what are the legal values, that do not require
reading section 7 to find the details.

6.  In section 7, examples 4 and 5, where are the keepalives preceding
the 10 second timeout.  Why won't peer A be the one to first have a Send
timeout in example 4?

Suggested technical change
==========================
I disagree with fixing the KeepaliveTimeout to 10 seconds and making the
possible negotiation an area for future study.  I don't see why each
node couldn't set a parameter in the initial exchange that directed a
peer to use a particular timeout value.   There doesn't have to be any
negotiation; each side can configure the KeepaliveTimeout that the peer
must use for setting Keepalive timer, and optionally include the
parameter in the I2, R2, and possibly I2-bis packets.  Send timer can be
aligned with the Keepalive timer interval given to the peer.  The
default (if no such parameter is present) could be 10 seconds.  I
suggest to define the parameter now, as well as instructing
implementations to honor it if they receive such a parameter, and say
that the heuristics for sending it are for further study.
 
Minor
====
title of section 4.2 "Alternate Address Pair Exploration" should
probably be "Full Reachability Exploration" to match other references in
the text to this procedure

in section 3.5, s/multible/multiple

in section 3.4, the acronym "ULID" is used for the first time without
any expansion or reference.

ExploringOK state could have a name that is more indicative of the state
of the exploration, such as InboundOK

In section 6, the explanatory text for each event precedes the numbered,
underlined event definition, and it is somewhat confusing to read
because the numbered lines seem to delineate the sections but they in
fact do not.

- Tom