Re: review of draft-ietf-shim6-failure-detection-03.txt



Continuing my response... unfortunately,
what follows is not in the -05 that
was submitted before the deadline. Inline:

> I believe the next example (the fifth, I think?) is flawed because A 
> doesn't send anything after it has sent a data packet, even though it 
> never receives a return packet. The same is true for the example 
> after that.

You are correct. The examples have been fixed in -06.

> When does address pair exploration conclude, both in the cases where 
> there is alternative reachability, and in the case when there appears 
> to be no reachability at all? The exponential backoff isn't described.

The exponential backoff description was added to -05.
The draft still states that exploration does not conclude; it's
allowed to keep on sending probes at a very low rate.

I think it's easier to deal with the termination using garbage
collection than by some other means.
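To illustrate what "does not conclude" means in practice, here is a minimal sketch of a probe schedule that backs off exponentially and then settles at a steady low rate instead of stopping. The initial interval, growth factor, and ceiling are made-up values for illustration only; they are not taken from the draft.

```python
import itertools


def probe_intervals(initial=0.5, factor=2.0, ceiling=60.0):
    """Yield the wait (in seconds) before each successive probe, forever.

    All three parameters are hypothetical; the draft defines its own values.
    """
    interval = initial
    while True:
        yield interval
        # Grow the wait each round, but cap it so that probing slows to
        # a low steady rate rather than ever terminating.
        interval = min(interval * factor, ceiling)


# Show the first ten waits of the (infinite) schedule.
first_ten = list(itertools.islice(probe_intervals(), 10))
print(first_ten)
```

Because the generator never ends, something outside the exploration process has to decide when to give up on the context, which is the role garbage collection plays here.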

> The keepalive is a fairly long packet. I think just a shim header as 
> would be used for data packets but with no ULP following the shim 
> header would be sufficient.

From what I can see, we add 20 bytes on top of the IPv6 header. This
could be reduced, of course, but I'm currently unconvinced that
we need to worry about this optimization. Keepalives are only sent
when you have one-way communication. In any case, any reduction
could not exceed 33%, given the size of the 40-byte IPv6 header.
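The 33% bound is just the arithmetic of removing all 20 extra bytes from a 60-byte packet. A quick check, assuming the fixed 40-byte IPv6 header and no extension headers (the function name is only for illustration):

```python
IPV6_HEADER_BYTES = 40      # fixed IPv6 header, no extension headers assumed
KEEPALIVE_EXTRA_BYTES = 20  # the figure quoted above


def max_saving_fraction(header_bytes, extra_bytes):
    """Upper bound on the saving: drop every byte beyond the IP header."""
    return extra_bytes / (header_bytes + extra_bytes)


saving = max_saving_fraction(IPV6_HEADER_BYTES, KEEPALIVE_EXTRA_BYTES)
print(f"{saving:.1%}")  # prints 33.3%
```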

> Requiring random numbers in packets that are sent rather frequently 
> is a bad idea, because it depletes the typically limited amount of 
> entropy that's available for strong random number generation rather 
> quickly and semi-random number generation may be somewhat expensive 
> (and not that good). And I don't see what good an id does in a 
> keepalive anyway... Also, there may be reasons to have non-random 
> numbers, such as ease of lookup.

Well, we could remove the payload/keepalive reporting from the protocol.
It would in some cases need a few more messages, but it would likely
be simpler, and take a few bytes out of frequently needed messages.
Do others have an opinion here?

> The description of the different messages repeats text that is the 
> same for different messages, which is very tedious and makes it 
> likely that people will miss the things that are actually different. 
> (That's why programmer's keyboards don't have ctrl-c/ctrl-v keys: 
> reuse code, don't copy it.)

Fixed. More reduction may come if we remove identifiers from
keepalives, and simplify the option structure.

> The use of options for mandatory fields is awkward.

I agree. This was originally part of the design for generality, but
it's painful to keep it around. I'd be happy to move things
back into the messages, unless it's something that appears
multiple times, like a reception report.

> "   The node maintains also the Send and Keepalive timers."
>
> These timers were previously unnamed, it would be better to use their 
> names as soon as they're introduced.

Fixed.

>
> "   Upon the reception of a payload packet in the Operational state, the
>    node starts the Keepalive timer if it is not yet running, and stops
>    the Send timer if it was running.  If the node is in the Exploring
>    state it transitions to the ExploringOK state, sends a Probe message
>    with the I See You flag set to 1 (Yes), and starts the Send timer.
>    In the ExploringOK state the node stops the Send timer if it was
>    running, but does not do anything else."
>
> I don't have a good feeling about this... It's too hard to determine 
> what should be happening. Maybe it would be better rather than go 
> down the list of packets that are sent/received and describe the 
> behavior in each state, to take one state at a time and describe what 
> happens with packets in that state.

I believe this detailed list of packets vs. each state is in the draft
(in the informational part). It's now intermixed with the explanation,
so it may be easier to read.

Or are you questioning the wisdom of taking an action
based on the reception of a payload packet? We could also
define the protocol solely based on the signaling packets.
But with data packets it may converge faster, after
a transient failure, for instance.
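For what it's worth, the quoted paragraph becomes easier to follow when written state by state. The sketch below transcribes only the payload-reception rules from the quoted text; the class layout, field names, and the way sent messages are recorded are my own assumptions, not anything from the draft.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    OPERATIONAL = "Operational"
    EXPLORING = "Exploring"
    EXPLORING_OK = "ExploringOK"


@dataclass
class Node:
    # Hypothetical representation: timers are modelled as simple flags.
    state: State = State.OPERATIONAL
    keepalive_timer_running: bool = False
    send_timer_running: bool = False
    sent: list = field(default_factory=list)

    def on_payload_received(self):
        if self.state is State.OPERATIONAL:
            # Start Keepalive timer if not yet running; stop Send timer.
            self.keepalive_timer_running = True
            self.send_timer_running = False
        elif self.state is State.EXPLORING:
            # Move to ExploringOK, send a Probe with I See You = 1 (Yes),
            # and start the Send timer.
            self.state = State.EXPLORING_OK
            self.sent.append(("Probe", {"i_see_you": True}))
            self.send_timer_running = True
        elif self.state is State.EXPLORING_OK:
            # Stop the Send timer if it was running; nothing else.
            self.send_timer_running = False


node = Node(state=State.EXPLORING)
node.on_payload_received()
print(node.state, node.sent)
```

Note that the Keepalive timer is only ever started in the Operational branch, which is the basis for the later remark that it cannot fire in other states.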

>    Upon a timeout on the Keepalive timer the node sends a Keepalive
>    message.  This can only happen in the Operational state.
>
> Why?

Because the timer is otherwise not running. Therefore,
it can't fire in other states.

> Why are there different Exploring and ExploringOK states? In both 
> cases, the host needs to continue trying different addresses, it's 
> mostly/only the other side that needs to behave differently when 
> there is successful reachability from them to us but not (yet) from 
> us to them.

Good question. The current description requires running the
Send timer in the ExploringOK state, but frankly I'm unsure
why this was put there; it doesn't seem to be required. And
the termination of the exploration really happens based
on the peer's event reports referring to your event reports.

The main difference right now is that retransmissions use
iseeyou=yes vs. iseeyou=no in the different states. This
could perhaps be modelled in other ways too.

>    Garbage collection of SHIM6 contexts terminates contexts that are
>    either unused or have failed due to the inability of the exploration
>    process to find a working pair.
>
> How is the latter determined?
>
> This is an important issue. For instance, once that a shim context is 
> rehomed, will it ever return to using the primary locators?

I think the text is wrong. We can't really determine the inability
in any reasonable way: the situation may change, so even if we
have tried all pairs, trying again may find that a path works
again.

I have changed this text to say that the garbage collection works
solely based on usage and age.
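One possible reading of "usage and age" is sketched below: a context becomes collectable once it has been idle too long or has simply existed too long, and the outcome of exploration is never consulted. The field names and both thresholds are assumptions made for illustration; the draft text is the place to look for the actual rule.

```python
from dataclasses import dataclass


@dataclass
class Context:
    created_at: float    # seconds, hypothetical clock
    last_used_at: float  # last time a payload used this context


def is_collectable(ctx, now, max_idle=600.0, max_age=86400.0):
    """Decide purely from usage and age; thresholds are made up."""
    idle = now - ctx.last_used_at
    age = now - ctx.created_at
    # Deliberately no check of whether exploration found a working pair.
    return idle > max_idle or age > max_age


print(is_collectable(Context(created_at=0.0, last_used_at=900.0), now=1000.0))
```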

>    In the PDF version of this specification, an informational drawing
>    illustrates the state machine.  Where the text and the drawing
>    differ, the text takes precedence.
>
> I'm not reviewing the PDF state machine here as the text is normative.
>
>    A tabular representation of the state machine is shown below.  Like
>    the drawing, this representation is only informational.
>
> Then I'm ignoring this too.
>
> But I would be happier if they'd be removed, because either they're 
> superfluous as they're not normative, or they're actually necessary 
> to understand the protocol, which is even worse because they're not 
> part of the normative text.

The tabular representation has been very useful for me when trying
to ensure that we have covered all combinations of states and
events. And when I program, I like to take a state machine from
a spec and work based on that, but YMMV.
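As an example of the kind of check the table makes possible, here is a small sketch that lists every state/event combination with no entry. The state names come from the draft; the event labels are my own shorthand, and the table is a deliberately partial transcription of the rules quoted earlier in this mail.

```python
from itertools import product

STATES = ["Operational", "Exploring", "ExploringOK"]
# Hypothetical event labels; only two events are covered in this sketch.
EVENTS = ["payload_received", "keepalive_timeout"]

TABLE = {
    ("Operational", "payload_received"):
        "start Keepalive timer, stop Send timer",
    ("Exploring", "payload_received"):
        "go to ExploringOK, send Probe (I See You = Yes), start Send timer",
    ("ExploringOK", "payload_received"):
        "stop Send timer",
    ("Operational", "keepalive_timeout"):
        "send Keepalive",
}


def missing_entries(states, events, table):
    """Return every (state, event) pair that the table does not cover."""
    return [pair for pair in product(states, events) if pair not in table]


gaps = missing_entries(STATES, EVENTS, TABLE)
print(gaps)
```

The two pairs it reports are the Keepalive timeout in the Exploring and ExploringOK states, i.e. exactly the cases where the answer is "cannot happen" and where a table forces you to say so explicitly.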

The picture, OTOH, is something that people may find easier
to start with.

--Jari