[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: review of draft-ietf-shim6-failure-detection-03.txt



Hi, Jari,
    As for section 6.4 (behaviour), maybe it is a better specification form
to seperate the text explaination and move them to the corresponding the
state machine section. Then reader can easily understand what the draft
says.
   And why do you think it is suitable to define the value for send timer
and alive timer as 10s and 3s?
  In my opinion, introduction of shim6 to ipv6 has add the complexity of ip
layer very much; and shim6-failure-detection is not simple. I don't know
what is the last performance.

 Best Regards, Sam Xia 

> -----Original Message-----
> From: owner-shim6@psg.com [mailto:owner-shim6@psg.com] On 
> Behalf Of Iljitsch van Beijnum
> Sent: Wednesday, June 21, 2006 6:13 PM
> To: Jari Arkko
> Cc: shim6-wg
> Subject: Re: review of draft-ietf-shim6-failure-detection-03.txt
> 
> On 20-jun-2006, at 15:45, Jari Arkko wrote:
> 
> >> I'm fairly critical of the current version but I don't 
> want to  waste 
> >> time with an in-depth review if there's going to be a new  version 
> >> within a week anyway.
> 
> > Don't hold back - fire away if you have known problems. You saw the 
> > discussion with John; I'm addressing those comments but will not 
> > rewrite other parts. So it could well be that your comments 
> are still 
> > valid even for -04.
> 
> Ok. First of all, have a look at my message from october 
> 24th, some comments are still valid.
> 
> This draft works per-context exclusively. So if there are 2, 
> 5 or 10 contexts between two hosts, this means 2, 5 or 10 
> times the amount of work is done.
> 
> I think it makes sense to see whether the mechanism can be 
> made to be more general, so that it can be used for other 
> applications as well, ranging from MIP (as indicated in the 
> applicability draft) to tunnel maintenance and maybe even 
> peer-to-peer applications. However, it's important that there 
> is fate sharing between the reachability protocol and the 
> user protocol (shim in our case). I think this can be solved 
> by having the quick reachability verification stuff (= FBD) 
> encapsulated in the user protocol, but let the full path 
> exploration be a protocol of its own or live under ICMPv6 or 
> some such. However, this introduces some issues with knowing 
> whether the same correspondent lives at different addresses.
> 
> In any event, I think just ignoring the possibility that 
> there are multiple contexts between two hosts isn't good 
> enough. The issue of avoiding "reachability probe storms" is 
> somewhat related to this, and also not addressed in the draft.
> 
> Another thing that's missing completely from this draft is a 
> discussion of how to use address pair preference information. 
> This makes it impossible to address traffic engineering needs.
> 
> I believe section 3 (related work) is at best unnecessary and 
> may even confuse the reader. The same is true of much of 
> section 4 (definitions). Implementers know which addresses 
> are usable and which aren't, at least for the most part, so 
> covering this in such detail is superfluous. Things like 
> "Both types of operational pairs could be discovered and 
> monitored through the following mechanisms:" have no place in 
> a protocol specification document, in my opinion. These need 
> to specify what IS, not what COULD be.
> 
> I suggest tightening the use of words like "operational", 
> "work", "reachable". They're mostly used interchangably in the draft.
> 
> 4.5.  Miscellaneous
> 
>     Addresses can become deprecated [3].  When other operational
>     addresses exist, nodes generally wish to move their communications
>     away from the deprecated addresses.
> 
>     Similarly, IPv6 source address selection [6] may guide 
> the selection
>     of a particular source address - destination address pair.
> 
> This doesn't say what shim6 implementers should do. In my opinion:  
> keep using deprecated addresses as the ULID/primary locator 
> as long as possible, but prefer non-deprecated addresses when 
> selecting alternative locators.
> 
> "A "path" is defined as" should probably be in the "definitions"  
> section.
> 
> "when link layer" -> when a link layer
> 
>     1.  The base timing unit for this mechanism is named Keepalive
>         Timeout.  Until a negotiation mechanism to negotiate different
>         values for this timer becomes available, the value (3 seconds)
>         specified in Section 6.5 SHOULD be used.
> 
> Sending a keepalive every 3 seconds? That's way too often, IMO.
> 
>     2.  Whenever outgoing data packets are generated
> 
> Data packets as opposed to what other types of packets?
> 
>     4.  The reception of a REAP keepalive packet leads to stopping the
>         timer associated with the return traffic from the peer.
> 
> So when we receive a keepalive from the other side, _we_ stop 
> sending keepalives? This may be the right thing to do, but 
> it's not obvious to me why. Some explanation would help.
> 
> What if an attacker generates spurious keepalives in order to 
> get us to stop sending keepalives on our side?
> 
>     6.  Send Timeout seconds (10 s; see Section 6.5) after the
>         transmission of a data packet with no return traffic on this
>         context, a full reachability exploration is started.  This
>         timeout period is larger than the Keepalive Timeout to
>         accommodate for lost keepalives and regular 
> variations in round
>         trip times.
> 
> The keepalives are sent at an interval of 3 seconds (or 
> shorter, I imagine that an implementation isn't going to keep 
> an exact timer for each context, any rounding must obviously 
> be in the down direction) and the timeout is 10 seconds. In 
> these 10 seconds you'd normally receive 3 keepalives, while 1 
> is enough to indicate that the other side is still alive. The 
> other 2 are only there in case of packet loss. I think that's 
> excessive. Starting the full reachability exploration because 
> of incidental packet loss isn't such a big deal that it 
> warrants sending three times as many packets as necessary.  
> (Note that this assumes the worst case of a unidirectional 
> transport protocol which isn't exactly the common case.)
> 
> I find it jarring that section 5.5 immediately jumps to 
> examples, before the protocol is fleshed out. Although the 
> examples are referred by using a number, they aren't actually 
> numbered.
> 
> Why would a keepalive need an id field?
> 
> In the first long example B times out once, but then doesn't 
> send more probes until it sees return traffic from A. Why 
> does B stop sending probes?
> 
> I believe that since the id of the last received probe is 
> included, the iseeyou flag is unnecessary.
> 
> Although copying back the last seen id seems to do the job, I 
> can't help but feel that it would be preferable to add timers 
> to reach round trip times and copy back more received ids and 
> also sent ids.  
> This allows the receiver of a probe to determine which of the 
> probes that made it to the other side did so faster, so it 
> can select the address pair with the shortest round trip time.
> 
> Including sent ids along with the addresses the probe with 
> that id was sent to helps the receiver determine that some 
> probes didn't make it (yet). If a probe didn't work in one 
> direction of an address pair, it's reasonable to assume that 
> it may also not work in the other direction and try other pairs first.
> 
> I believe the next example (the fifth, I think?) is flawed 
> because A doesn't send anything after it has sent a data 
> packet, even though it never receives a return packet. The 
> same is true for the example after that.
> 
> When does address pair exploration conclude, both in the 
> cases where there is alternative reachability, and in the 
> case when there appears to be no reachability at all? The 
> exponential backoff isn't described.
> 
> The keepalive is a fairly long packet. I think just a shim 
> header as would be used for data packets but with no ULP 
> following the shim header would be sufficient.
> 
> Requiring random numbers in packets that are sent rather 
> frequently is a bad idea, because it depletes the typically 
> limited amount of entropy that's available for strong random 
> number generation rather quickly and semi-random number 
> generation may be somewhat expensive (and not that good). And 
> I don't see what good an id does in a keepalive anyway... 
> Also, there may be reasons to have non-random numbers, such 
> as ease of lookup.
> 
> The description of the different messages repeats text that 
> is the same for different messages, which is very tedious and 
> makes it likely that people will miss the things that are 
> actually different.  
> (That's why programmer's keyboards don't have ctrl-c/ctrl-v keys:  
> reuse code, don't copy it.)
> 
> The use of options for mandatory fields is awkward.
> 
> "   The node maintains also the Send and Keepalive timers."
> 
> These timers were previously unnamed, it would be better to 
> use their names as soon as they're introduced.
> 
> "   Upon the reception of a payload packet in the Operational 
> state, the
>     node starts the Keepalive timer if it is not yet running, 
> and stops
>     the Send timer if it was running.  If the node is in the Exploring
>     state it transitions to the ExploringOK state, sends a 
> Probe message
>     with the I See You flag set to 1 (Yes), and starts the Send timer.
>     In the ExploringOK state the node stops the Send timer if it was
>     running, but does not do anything else."
> 
> I don't have a good feeling about this... It's too hard to 
> determine what should be happening. Maybe it would be better 
> rather than go down the list of packets that are 
> sent/received and describe the behavior in each state, to 
> take one state at a time and describe what happens with 
> packets in that state.
> 
>     Upon a timeout on the Keepalive timer the node sends a Keepalive
>     message.  This can only happen in the Operational state.
> 
> Why?
> 
> Why are there different Exploring and ExploringOK states? In 
> both cases, the host needs to continue trying different 
> addresses, it's mostly/only the other side that needs to 
> behave differently when there is successful reachability from 
> them to us but not (yet) from us to them.
> 
>     Garbage collection of SHIM6 contexts terminates contexts that are
>     either unused or have failed due to the inability of the 
> exploration
>     process to find a working pair.
> 
> How is the latter determined?
> 
> This is an important issue. For instance, once that a shim 
> context is rehomed, will it ever return to using the primary locators?
> 
>     In the PDF version of this specification, an informational drawing
>     illustrates the state machine.  Where the text and the drawing
>     differ, the text takes precedence.
> 
> I'm not reviewing the PDF state machine here as the text is normative.
> 
>     A tabular representation of the state machine is shown 
> below.  Like
>     the drawing, this representation is only informational.
> 
> Then I'm ignoring this too.
> 
> But I would be happier if they'd be removed, because either 
> they're superfluous as they're not normative, or they're 
> actually necessary to understand the protocol, which is even 
> worse because they're not part of the normative text.
>