
Re: Some small remarks for REAP



Hi,

Unfortunately, I explained the unidirectional flow incorrectly. I meant that A may still receive messages, but its payload will not be delivered to B. Anyway, I think I misunderstood the multiple keepalives part: I thought that receiving one payload packet triggered multiple keepalive packets in reply.
If so, then I think the first remark is not an issue.

Regards,

Matthijs Mekking

marcelo bagnulo braun wrote:
Hi Matthijs,

thanks for the comments on the draft... see replies in line...

On 11/06/2007, at 18:30, Matthijs Mekking wrote:

Hello,

I looked at draft-ietf-shim6-failure-detection-07, and want to make two remarks.

1.
The draft states that "FBD works by generating REAP keepalives if the node is receiving packets from its peer but not sending any of its own." Suppose we have two hosts A and B, and A is sending payload packets to B (unidirectional). B will respond with REAP keepalive messages. But suppose a link failure occurs: A cannot receive any messages, but is still able to send. A will continue sending payload packets

I don't think so....

A is sending data packets to B
Hence B is sending keepalives to A

Suppose there is a failure in the path that only affects the direction from B to A

This is the scenario that you are suggesting, right?

If this happens, A will no longer receive keepalive packets, so it will detect a failure and start the exploration procedure, after which a new path will be found and used




and might receive (delayed) keepalive messages that were triggered from payload data before the link failure.


yes, but at some point the delayed keepalives will run out, A will no longer receive keepalive packets, and it will detect the failure..

Now the payload sent after the link failure is lost without being noticed. This occurs because keepalives cannot be related to the transmitted payload.


Note that basically this relates to the TSend timer. You can play with it so that the detection is really fast (but it implies more keepalive traffic)

There are a couple of research papers that study the behaviour of the REAP protocol and its reaction capabilities in the different scenarios and the result is basically a tradeoff between the keepalive frequency and the recovery time.
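
The tradeoff described above can be illustrated with a small sketch. It is a toy model, not the REAP state machine: it assumes zero propagation delay, a failure that affects only the B to A direction, and that A's send timer restarts at the first payload A sends at or after the last keepalive it heard. The function name, parameter names, and numeric values are all hypothetical and are not taken from the draft.

```python
import math


def detection_delay(send_timeout, keepalive_interval, payload_period, failure_time):
    """Seconds between a B->A failure and A noticing it, in a simplified model.

    Assumptions (not from the draft): A sends payload at 0, p, 2p, ...;
    B sends a keepalive at every multiple of keepalive_interval; keepalives
    sent at or after failure_time are lost.
    """
    # Index of the last keepalive that B sent strictly before the failure.
    k = max(math.ceil(failure_time / keepalive_interval) - 1, 0)
    last_heard = k * keepalive_interval
    # A's send timer restarts at the first payload sent at or after last_heard.
    timer_start = math.ceil(last_heard / payload_period) * payload_period
    detected_at = timer_start + send_timeout
    return detected_at - failure_time


# Illustrative values only: a slow setting and an aggressive one.
for send_timeout, keepalive_interval in ((15.0, 3.0), (5.0, 1.0)):
    delay = detection_delay(send_timeout, keepalive_interval, 1.0, 10.0)
    rate = 1.0 / keepalive_interval  # keepalives per second sent by B
    print(f"timeout={send_timeout}s keepalives/s={rate:.2f} detection delay={delay}s")
```

In this model the shorter timeout detects the failure sooner, but B has to send keepalives more often to keep A's timer from expiring on a healthy path.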

This will not be a problem if you run a reliable protocol like TCP on top of it, because in that case TCP will catch this problem. But if UDP is used, this remains an issue I think.


sure, but REAP is an IP layer protocol and i guess it is better not to link it to transport layer information

Example scenario: Unidirectional traffic from A to B using address pair (A1, B1):
----------------------------------------------------------------------------------------------------------------------------------------
(1) A --> B: Payload1 (A1, B1)
(2) B --> A: Keepalive1 (B1, A1)
(3) B --> A: Keepalive1 (B1, A1)


not really... the state machine is per packet, i don't think it will be possible to have two keepalives triggered by a single data packet (the opposite situation is possible: one keepalive for multiple data packets)

Link failure (A1, B1)


my understanding from the previous text was that the failure affected only the direction from B to A...?

(1) A --> B: Payload2 (A1, B1) (will fail)

see previous comment

(1) A --> B: Keepalive1 (B1, A1)

why would A send a keepalive, since:
first, A is sending data packets
second, A has not received a data packet

I mean, keepalives are generated when a node is not generating data packets at a high enough frequency and is receiving data packets from the other end, which is exactly the opposite of A's situation in your example
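
The rule described above can be sketched as a small timer model. This is a simplified illustration under stated assumptions, not the state machine from the draft: the class name, method names, and the 3-second interval are hypothetical.

```python
class FbdNode:
    """Toy model of one endpoint's keepalive decision; all names are hypothetical."""

    def __init__(self, keepalive_interval):
        self.keepalive_interval = keepalive_interval  # seconds, assumed value
        self.timer_started_at = None  # None means the keepalive timer is stopped

    def on_payload_received(self, now):
        # Start the timer only if it is not already running, so several
        # payload packets can end up sharing a single keepalive.
        if self.timer_started_at is None:
            self.timer_started_at = now

    def on_packet_sent(self):
        # Any outgoing packet (payload or keepalive) stops the timer.
        self.timer_started_at = None

    def keepalive_due(self, now):
        # A keepalive is due only if payload arrived and nothing was sent
        # for a full keepalive interval afterwards.
        if self.timer_started_at is None:
            return False
        return now - self.timer_started_at >= self.keepalive_interval


# B only receives payload: one keepalive covers three data packets.
b = FbdNode(keepalive_interval=3.0)
for t in (0.0, 1.0, 2.0):
    b.on_payload_received(t)
print(b.keepalive_due(3.0))    # True: B owes A a keepalive
b.on_packet_sent()
print(b.keepalive_due(10.0))   # False: no new payload since the keepalive

# A only sends payload and receives none, so it never owes a keepalive.
a = FbdNode(keepalive_interval=3.0)
a.on_packet_sent()
print(a.keepalive_due(100.0))  # False
```

In this sketch a single data packet can never produce two keepalives, because sending the keepalive stops the timer and only newly received payload restarts it.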

----------------------------------------------------------------------------------------------------------------------------------------


2.
How many locators will a host use? Because the Shim6 context will be garbage collected after 5 minutes (according to draft-ietf-shim6-proto-08) and with the proposed constant values, only a maximum of fifteen different address pairs can be probed.


A shim6 node will collect garbage when the context is no longer being used by any upper layer, so if there is an open socket then this 5 min limit will not apply. Bottom line is that in the case that there is a failure, it is likely imho that the shim6 layer will be able to know locally that there is an upper layer that is using the shim6 state, so the shim6 layer won't start the 5 min garbage collection timer

in particular, if the context is in exploration state, it first needs to give up the exploration phase and only then launch the 5 min garbage collection timer....

In reality, this number will be smaller, since the ULP will probably time out after two minutes. So, if only two or three locators per host are going to be used, these constants seem to work fine. With more locators, the protocol will probably also work fine, but a host will maintain locators that are probably not going to be probed. That seems a bit inefficient.


as i mentioned above, i don't think the garbage collection timer will limit the number of locators that can be explored.

besides, the REAP spec does not determine the number of locator pairs that can be probed simultaneously, so more aggressive behaviours than probing a single pair at a time are acceptable in cases where many locator pairs are available.

Regards, marcelo


What is your opinion about these two subjects?



Regards,

Matthijs Mekking