Hi Matthijs,
thanks for the comments on the draft... see replies inline...
On 11/06/2007, at 18:30, Matthijs Mekking wrote:
Hello,
I looked at draft-ietf-shim6-failure-detection-07, and want to make
two remarks.
1.
The draft states that "FBD works by generating REAP keepalives if the
node is receiving packets from its peer but not sending any of its own."
Suppose we have two hosts A and B, and A is sending payload packets
to B (unidirectional). B will respond with REAP keepalive messages.
But suppose a link failure occurs: A cannot receive any messages, but
is still able to send. A will continue sending payload packets
I don't think so....
A is sending data packets to B.
Hence B is sending keepalives to A.
Suppose there is a failure in the path that only affects the direction
from B to A.
This is the scenario that you are suggesting, right?
If this happens, A will no longer receive keepalive packets, so it
will detect a failure and will start the exploration procedure, after
which a new path will be found and used.
and might receive (delayed) keepalive messages that were triggered
from payload data before the link failure.
yes, but at some point the delayed keepalives will run out, A will no
longer receive keepalive packets, and it will detect the failure.
Now the payload sent after the link failure is lost without being
noticed. This occurs because keepalives cannot be related to the
transmitted payload.
Note that basically this relates to the TSend timer. You can play with
it so that the detection is really fast (but it implies more keepalive
traffic)
There are a couple of research papers that study the behaviour of the
REAP protocol and its reaction capabilities in the different scenarios
and the result is basically a tradeoff between the keepalive frequency
and the recovery time.
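To illustrate the tradeoff, here is a minimal sketch of the scenario in Python. It is a toy model, not the REAP state machine from the draft: it assumes zero path delay, that A sends payload continuously, that B emits one keepalive every `keepalive_interval_ms`, and that A declares a failure once it has received nothing for `send_timeout_ms`. The function name, the parameter names, and the numeric values are all made up for the illustration.

```python
def simulate(send_timeout_ms, keepalive_interval_ms, fail_at_ms, horizon_ms):
    """Toy model: return (time A detects the failure, keepalives B sent by then).

    Assumptions (not from the draft): zero path delay, A sends payload
    all the time, and only the B -> A direction fails at fail_at_ms.
    """
    last_rx = 0          # last time A received anything from B
    keepalives_sent = 0  # keepalives emitted by B (delivered or lost)
    for t in range(horizon_ms + 1):
        # B emits a keepalive once per interval because it only receives payload.
        if t > 0 and t % keepalive_interval_ms == 0:
            keepalives_sent += 1
            if t < fail_at_ms:   # delivered only while B -> A still works
                last_rx = t
        # A's send timer: payload went out, nothing came back for too long.
        if t - last_rx >= send_timeout_ms:
            return t, keepalives_sent
    return None, keepalives_sent


# Slow timers: few keepalives, late detection.
print(simulate(10_000, 3_000, fail_at_ms=10_000, horizon_ms=60_000))  # (19000, 6)
# Fast timers: early detection, many more keepalives.
print(simulate(2_000, 500, fail_at_ms=10_000, horizon_ms=60_000))     # (11500, 23)
```

In this model the payload sent between the failure and the detection time is lost in both runs; shortening the timers only narrows that window, at the cost of more keepalive traffic.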
This will not be a problem if you run a reliable protocol like TCP on
top of it, because in that case TCP will catch this problem. But if
UDP is used, this remains an issue I think.
sure, but REAP is an IP layer protocol and i guess it is better not to
link it to transport layer information
Example scenario: Unidirectional traffic from A to B using address
pair (A1, B1):
----------------------------------------------------------------------------------------------------------------------------------------
(1) A --> B: Payload1 (A1, B1)
(2) B --> A: Keepalive1 (B1, A1)
(3) B --> A: Keepalive1 (B1, A1)
not really... the state machine is per packet, i don't think it is
possible to have two keepalives triggered by a single data packet
(the opposite situation is possible: one keepalive for multiple data
packets)
Link failure (A1, B1)
my understanding from the previous text was that the failure affected
only the direction from B to A...?
(1) A --> B: Payload2 (A1, B1) (will fail)
see previous comment
(1) A --> B: Keepalive1 (B1, A1)
why would A send a keepalive, since:
first, A is sending data packets
second, A has not received a data packet
I mean, keepalives are generated when a node is not generating data
packets with a high enough frequency and it is receiving data packets
from the other end, which is exactly the opposite of A's situation in
your example
----------------------------------------------------------------------------------------------------------------------------------------
2.
How many locators will a host use? Because the Shim6 context will be
garbage collected after 5 minutes (according to
draft-ietf-shim6-proto-08) and with the proposed constant values,
only a maximum of fifteen different address pairs can be probed.
A shim6 node will garbage collect a context only when it is no longer being
used by any upper layer, so if there is an open socket this 5 min
limit will not apply. Bottom line is that when there is a
failure, it is likely imho that the shim6 layer will know
locally that an upper layer is still using the shim6 state, so
it won't start the 5 min garbage collection timer.
In particular, if the context is in exploration state, it first needs
to give up the exploration phase and only then launch the 5 min garbage
collection timer.
In reality, this number will be smaller, since the ULP will probably
time out after two minutes. So, if only two or three locators per
host are going to be used, these constants seem to work fine. With
more locators, the protocol will probably also work fine, but a host
will maintain locators that are probably not going to be probed. That
seems a bit inefficient.
as i mentioned above, i don't think the garbage collection timer will
limit the number of locators that can be explored.
besides, the reap spec does not determine the number of locator pairs
that can be probed simultaneously, so more aggressive behaviours than
probing a single pair at a time are acceptable in cases where many
locator pairs are available.
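As a rough back-of-the-envelope sketch of that last point, the Python snippet below counts how many locator pairs can be probed within a time budget when probes are retransmitted with exponential backoff. Everything in it is an assumption for illustration: the function name, the backoff rule (a few probes at the initial timeout, then doubling up to a cap), and the numeric values are not taken from the draft.

```python
def probes_within(budget_s, initial_timeout_s, max_timeout_s,
                  initial_probes, pairs_per_round=1):
    """Count locator pairs probed before budget_s runs out.

    Hypothetical backoff: the first `initial_probes` rounds wait
    `initial_timeout_s` each, then the wait doubles up to `max_timeout_s`.
    `pairs_per_round` models probing several pairs at the same time.
    """
    elapsed = 0.0
    timeout = initial_timeout_s
    rounds = 0
    while elapsed < budget_s:
        rounds += 1              # a probe round is sent at time `elapsed`
        elapsed += timeout       # wait for the answer before the next round
        if rounds >= initial_probes:
            timeout = min(timeout * 2, max_timeout_s)
    return rounds * pairs_per_round


# One pair at a time within a 300 s budget (illustrative values).
print(probes_within(300, 0.5, 60, 4))                     # 14
# Same timers, four pairs probed per round.
print(probes_within(300, 0.5, 60, 4, pairs_per_round=4))  # 56
```

With these made-up numbers, the count of pairs covered scales directly with how many pairs are probed per round, so whatever time bound applies caps sequential probing much more than it caps a more aggressive strategy.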
Regards, marcelo
What is your opinion about these two subjects?
Regards,
Matthijs Mekking