
review of draft-ietf-shim6-failure-detection-03.txt



Hi all,

Review of
http://www.ietf.org/internet-drafts/draft-ietf-shim6-failure-detection-03.txt

Here is my 2nd attempt at reviewing this document.  First one
mysteriously vanished. This has started getting long, so I will send it
in multiple parts.

Summary: I think that this document needs more work.

Small general request.  When making references, could you at least say
what the reference is?
For example:

   ... Techniques outside the scope of this
   document are used for this, for further information see [18].

I hate having to scroll to the reference section to see what you are
referencing.  At least name the protocol that you are referring to, so I
can see immediately if this is something I am already familiar with.

1) Abstract says:

   This document defines a mechanism for the detection of communication
   failures between two communicating hosts at IP layer, and an
   exploration protocol for switching to another pair of interfaces
   and/or addresses between the same hosts if a working pair can be
   found.  

Is this meant as a full-fledged, stand-alone protocol, or is it a
protocol in a Neighbor Discovery sense?  This confused me; I think I'd
prefer if it was the latter, and was bound to the SHIM6 base protocol.
I'd suggest fixing the language so that people don't assume it is a
stand-alone protocol.

   The draft also discusses the roles of a multihoming protocol
   versus network attachment functions at IP and link layers.

2) Intro says:

   This draft defines the mechanism and protocol to achieve both failure
   detection and locator pair exploration.  This protocol is called
   REAchability Protocol (REAP).  It designed to be carried within the
   SHIM6 protocol, but may also be used in other contexts.

I think we should drop the '... may be used in other contexts.' as this
seems outside of the scope of SHIM6.  I don't think that SHIM6 should
work on a general purpose failure detection & path exploration protocol.

   ...  We assume that there are other, higher
   level identifiers such as CGA public keys or HBA bindings that tie
   the different locators used by a node together [17].

Do we need to care about higher level identifiers for REAP?  My guess is
that this is not something that should really matter for REAP, so
suggesting that CGA or HBA is of relevance for REAP is misleading.

3) Section 3:

   In SCTP [10], the addresses of the endpoints are learned in the
   connection setup phase either through listing them explicitly or via
   giving a DNS name that points to them.  

I'd just suggest that you say 

   In SCTP [10], transport addresses (IP address and port pairs) are
   exchanged during the SCTP Association (i.e., connection) setup phase.

I don't think any implementations are using DNS at the moment for this.

This paragraph:

   SCTP does not define how local knowledge (such as information learned
   from the link layer) should be used.  SCTP also has no mechanism to
   deal with dynamic changes to the set of available addresses, although
   mechanisms for that are being developed [20].

Should be corrected to:

   SCTP does not define how local knowledge (such as information learned
   from the link layer) should be used.  SCTP currently has a mechanism
   to deal with dynamic changes to the set of available addresses under
   development [20].

because I know of several implementations already supporting this.  It
is just that the IETF hasn't signed off on it.

I also wonder if a blanket statement would help, along the lines of
"Many protocols, both standardized in the IETF and outside of the IETF,
make use of keep-alives to track the liveness of a connection or
session."  Also, you might want to consider RSVP, as RSVP does track
path conditions at the raw IP level.  I can supply text if needed.
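
To make the keep-alive idea concrete, here is a rough sketch in Python.
It is not taken from any of the protocols mentioned; the class name, the
method names and the timeout value are all my own assumptions, purely
for illustration.

```python
class KeepaliveTracker:
    """Track liveness of one connection or session via keep-alives.

    Hypothetical helper: not part of SCTP, RSVP, or SHIM6.
    """

    def __init__(self, timeout_s):
        # How long we tolerate silence; real protocols pick their own value.
        self.timeout_s = timeout_s
        # Time the last keep-alive was heard, or None if never.
        self.last_heard = None

    def on_keepalive(self, now):
        """Record that a keep-alive arrived at time `now` (seconds)."""
        self.last_heard = now

    def is_alive(self, now):
        """The peer counts as alive if we heard from it within the timeout."""
        if self.last_heard is None:
            return False
        return (now - self.last_heard) <= self.timeout_s
```

The caller supplies the clock value, which keeps the sketch independent
of any particular timer facility.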

4) Generally, I could quibble with some text in section 4 - if the
authors feel quibbling is valuable, I could send text.  Some parts could
benefit from a re-write.

Section 4.4:

   IP-layer solutions need to avoid sending packets concurrently over
   multiple paths; TCP behaves rather poorly in such circumstances.  For
   this reason it is necessary to choose a particular pair of addresses
   as the current address pair which is used until problems occur, at
   least for the same session.

I'd say that TCP (and SCTP & TCP-friendly) congestion control is based
upon a notion of a single path.  Congestion variables are calculated on
a per-path basis.  Of course, routing can introduce path changes, so TCP
congestion control has mechanisms to cope, but frequent changes will
cause problems.  There is on-going work in this area, TCP Quickstart for
example, and some TCP extensions, but I think that you'd rather point
out that congestion control generally performs poorly over multiple
paths - or even that calculating congestion control over multiple paths
is a research area.
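
A rough sketch of what "per-path" means here, in Python.  This is a
heavily simplified model, not real TCP: the class names, the initial
window value and the growth/decrease rules are assumptions made only to
show that each address pair carries its own congestion state.

```python
INITIAL_CWND = 2  # segments; illustrative value only


class PathState:
    """Congestion state for one (source, destination) address pair."""

    def __init__(self):
        self.cwnd = INITIAL_CWND

    def on_ack(self):
        # Simplified growth: one segment per ACK.
        self.cwnd += 1

    def on_loss(self):
        # Simplified multiplicative decrease, never below one segment.
        self.cwnd = max(1, self.cwnd // 2)


class PerPathCongestion:
    """Keep a separate PathState per address pair (hypothetical helper)."""

    def __init__(self):
        self.paths = {}

    def state(self, src, dst):
        # A pair seen for the first time starts from the initial window.
        return self.paths.setdefault((src, dst), PathState())
```

In this model, what was learned on one address pair says nothing about
another pair, and a loss on one path is not evidence about the other.
Sending one session's packets concurrently over both would mix those
signals.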


5) Section 5.1:

I found this description a bit odd:

   This process consists of three tasks.  First, it is necessary to
   track local information from lower and upper layers.  For instance,
   when link layer informs that we have no connection then we know there
   is a failure.  Nodes SHOULD employ techniques listed in Section 4.1
   and Section 4.2 to be aware of the local situation.

   Similarly, it is necessary to track remote address information from
   the peer.  For instance, the peer may inform that its currently used
   address is no longer in use.  Techniques outside the scope of this
   document are used for this, for further information see [18].

You are talking about Failure Detection consisting of 3 tasks:
1) Tracking local information
2) Tracking remote peer status
3) Verifying reachability 

However, you state that the 2nd task is outside of the scope of this
document, which seems confusing.

I'd suggest a re-write along the lines of:

    Failure detection consists of three parts: tracking local
    information, tracking remote peer status, and finally verifying
    reachability.  Tracking local information consists of using local
    information such as link layer failure in order to provide input
    into the failure detection process.  Nodes SHOULD employ techniques
    listed in Section 4.1 and Section 4.2 to track the local situation.
    It is necessary to track remote address information from the peer.
    For instance, if the peer's currently used address is no longer in
    use, a mechanism to relay that information may be useful.
    Techniques defined in the SHIM6 base protocol [18] can be used.
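
To show how I read the three parts fitting together, a minimal sketch
in Python.  The field and function names are my own and appear nowhere
in the draft, and treating any single negative input as a suspected
failure is an assumption for illustration, not what the draft
specifies.

```python
from dataclasses import dataclass


@dataclass
class FailureInputs:
    """One input per part of failure detection (hypothetical names)."""
    local_link_up: bool        # part 1: local information, e.g. link layer
    peer_address_valid: bool   # part 2: remote peer status
    reachability_ok: bool      # part 3: reachability verification


def failure_suspected(inputs):
    """Suspect a failure as soon as any one of the three inputs is negative."""
    return not (
        inputs.local_link_up
        and inputs.peer_address_valid
        and inputs.reachability_ok
    )
```

For example, failure_suspected(FailureInputs(True, False, True)) is
true: the peer reporting its address as no longer in use is enough,
even if the local link is fine.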
 
Continued in the next mail (sent later today or tomorrow AM).

John