
Re: TSV-DIR Review of draft-ietf-shim6-protocol-09.txt



On 22 nov 2007, at 17:17, Bernard Aboba wrote:

However, this relationship is not explored. The TCP keepalive interval
is generally kept quite large, partly out of a desire not to tear down
idle TCP connections due to a transient failure.  The SHIM6 keepalive
interval during idle is not defined in the Failure Detection document,
but my impression was that it could be much shorter and this would
seem to collide with the philosophy of TCP keepalives.

Shim6 REAP keepalives aren't sent when a shim6 context is idle.

The REAP protocol assumes that traffic must always be bidirectional, so when there has been outgoing traffic but no incoming traffic, there must be a failure. Keepalives exist to accommodate the cases where there is legitimately only incoming traffic but no return traffic. When traffic is flowing in both directions or when there is no traffic, there is no need to send keepalives.
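
As a rough illustration of the rule above, here is a minimal sketch in Python. It is not taken from the Failure Detection document: the function name, the two boolean inputs and the returned labels are all hypothetical, and the real protocol drives these decisions with timers rather than with simple flags.

```python
def reap_action(sent_recently: bool, received_recently: bool) -> str:
    """Decide what a shim6 endpoint should do for one context.

    Both arguments are hypothetical stand-ins for "payload traffic was
    seen in this direction within the relevant timeout".
    """
    if sent_recently and received_recently:
        # Bidirectional traffic: the path is evidently working.
        return "nothing"
    if received_recently:
        # Only incoming traffic: send a keepalive so the peer does not
        # mistake our silence for a failure.
        return "send_keepalive"
    if sent_recently:
        # Outgoing traffic but nothing coming back: assume a failure.
        return "suspect_failure"
    # Idle context: no keepalives are sent.
    return "nothing"


if __name__ == "__main__":
    for sent in (False, True):
        for received in (False, True):
            print(sent, received, reap_action(sent, received))
```

The point of the sketch is that the idle case and the fully bidirectional case both produce no keepalive traffic at all.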

The SCTP algorithms make extensive use
of transport layer information such as retransmission counts, which
the SHIM6 Failure Detection document seems to assume will be unavailable.

Right. Shim6 must work for all kinds of communication. However, it would be good to make use of transport protocol knowledge when available. You feel there are missed opportunities in this area?

In general, it would not be desirable for SHIM6 to initiate the re-homing
of a TCP connection due to a transient failure.  Link layer "down"
indications or resulting address deprecations are examples of this.

The trouble is, how do you know a problem is transient?

About address deprecation: I seem to remember a discussion where the conclusion was that deprecation by itself is no reason to stop using an address. Telling the other end that an address should no longer be used once it's deprecated would have that effect, so if the proto document mandates that, it could be problematic.

(One scenario is a router that no longer sends RAs but still continues to route. In that case it would be possible to keep using the addresses after they've become deprecated, until they become invalid.)

6.  Interactions of SHIM6 with congestion control.  Section 4.3 of the
Failure Detection document talks about exploration timeout values.
Exploration can be kicked off if no inbound traffic is
received within Send Timeout (default = 10 seconds).

The first observation is that the Send Timeout should probably depend
on the RTO estimate, as it does in SCTP.  Otherwise we could have a
network with a high RTO and SHIM6 exploration could commence after RTO is backed off only a few times. This would be undesirable from a congestion
control point of view.

We need the timeout to be somewhat long to accommodate the case where a host receives a packet, then does processing and finally sends an answer. However, it also needs to be fairly short so that we have time to repair a failure before the user, application or transport protocol gives up. I don't think alignment with the transport's retransmission timeout makes sense here.

The suggested value of the Initial Probe Timeout (500ms)
is less than RTOmin and 4 probes can be sent before initiating
exponential backoff.  This seems like it could violate "conservation
of packets".  Why doesn't exponential backoff begin immediately?

Then you'd either have to send the first few probes in quick succession without leaving a reasonable amount of time for responses to come back, or it would take very long for the first 5 or so probes to go out. 500 ms is still relatively aggressive as it's well below the maximum observed RTTs on the internet.
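
To put numbers on this trade-off, here is a small Python sketch that computes when probes would go out. Only the 500 ms initial timeout and the figure of 4 probes before backoff come from the text above. The doubling factor, the absence of jitter and of a maximum timeout, and the exact point at which doubling starts are assumptions made for illustration, and the function and parameter names are hypothetical.

```python
def probe_send_times(n_probes, initial_timeout=0.5, fixed_probes=4, factor=2.0):
    """Return the send time in seconds of each probe, starting at t = 0.

    Assumption: each of the first `fixed_probes` probes is followed by a
    wait of `initial_timeout`; after that, every wait is `factor` times
    the previous one. No jitter and no upper bound are modelled.
    """
    times = []
    t = 0.0
    interval = initial_timeout
    for i in range(n_probes):
        times.append(t)
        t += interval
        if i + 1 >= fixed_probes:
            # Exponential backoff applies from this probe onwards.
            interval *= factor
    return times


if __name__ == "__main__":
    # Four fixed-interval probes before backoff.
    print(probe_send_times(6, fixed_probes=4))
    # Backoff starting right after the first probe.
    print(probe_send_times(6, fixed_probes=1))
```

Under these assumptions the first schedule sends its fifth probe at 2.0 seconds, while with immediate backoff the fifth probe only goes out at 7.5 seconds, which uses up most of a 10 second budget before more than a handful of address pairs have been tried.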

Iljitsch