
Re: failure detection



Hi Iljitsch,

On 14/08/2005, at 13:01, Iljitsch van Beijnum wrote:

Hi,

Somewhat to my disappointment, there were no opinions about which of the two failure detection mechanisms is better, either during the sessions or afterward on the list.


I am still trying to see the whole picture...

perhaps you could address the comments that i made a while ago on the list (i know you had these in the todo list :-), especially the first question about which end of the communication needs to discover the outage. imho this is relevant to understanding which mechanism is more suitable, especially because conveying outage information, in a reliable fashion, from the end that discovers the outage to the end that is able to recover may be an issue (i am not sure but...)

(i will comment on your new hybrid mechanism once i am clearer about this point)


Begin forwarded message:

From: marcelo bagnulo braun <marcelo@it.uc3m.es>
Date: 16 July 2005 17:49:02 GMT+02:00
To: Iljitsch van Beijnum <iljitsch@muada.com>
Cc: shim6 <shim6@psg.com>
Subject: about reachability detection draft

Hi Iljitsch,

thanks for the draft, it is very clear in its presentation

i know this is 00 version, but some comments anyway of things i consider relevant.


In section 2 it is stated that:

- In the second model, a host can only detect problems in the receiving
  direction so it must depend on the correspondent to detect problems
  in the other direction

I think it is important to consider one step further then. I mean, what can a host possibly do when it detects an outage in the incoming path? If a host detects an outage in the outgoing path, it can change the address pair that it is using to send packets and see if that solves the problem, but what can it do if it detects an outage in the incoming path? I guess that the only option would be to notify the correspondent node about the failure so that the correspondent uses an alternative path (note that we are assuming that the case of unidirectional connectivity is possible).
So i guess that this mode needs an additional message reporting the failure, which needs to be taken into account when comparing the options.


I guess it would also make sense to state how those mechanisms behave when there are no outgoing packets. i mean, i guess that in any of the modes signaling is suppressed, right? however, i guess that the hosts assume that the address pair is reachable, right?

Additionally, i guess that there is other information that needs to be taken into account when detecting reachability, such as ICMP error messages, address deprecation, lower layer information (i know that you state that you assume that addresses are available, but what happens if an address of the currently used address pair is deprecated? or if the associated interface goes down?) i guess it would be important to deal with these cases also

Another issue that may be of interest is what happens with an address (pair) after it becomes unreachable. i mean, is it used in subsequent address pair explorations? is it put in quarantine?

Later on in section 3 it is stated that:

  In its essence, address pair exploration is very simple: just send
  probes using every possible address pair, wait for something to come
  back and possibly consider the round trip time.

I guess we agree that you are oversimplifying the issue here :-)
I mean, the complexity is not only due to the number of probe packets that are needed, but also because of unidirectional connectivity. I think it is very important to express this difficulty. I mean, even with two address pairs, the problem can be quite complex, because replies need to carry information not only about the particular incoming packet, but also about other previously received probes, in order to allow the transmitter to determine whether previous probes have successfully arrived. I think it is important to describe this problem and possible approaches to deal with it.
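
To make the point about replies concrete, here is a minimal sketch in which every probe carries the ids of all peer probes received so far, so the sender can learn that an earlier probe arrived even when the reply comes back over a different address pair. The names (Probe, Explorer, confirmed_outgoing) and the message layout are assumptions made for illustration; nothing here is taken from the draft.

```python
from dataclasses import dataclass, field

@dataclass
class Probe:
    probe_id: int                              # id chosen by the sender
    src: str                                   # sender-side address of the pair
    dst: str                                   # receiver-side address of the pair
    seen: list = field(default_factory=list)   # ids of peer probes received so far

class Explorer:
    """Hypothetical per-host exploration state (illustration only)."""

    def __init__(self):
        self.next_id = 0
        self.sent = {}                    # probe_id -> (src, dst) we sent it on
        self.received = []                # ids of peer probes that reached us
        self.confirmed_outgoing = set()   # pairs known to work towards the peer

    def make_probe(self, pair):
        # Every probe echoes all peer probe ids received up to now.
        probe = Probe(self.next_id, pair[0], pair[1], list(self.received))
        self.sent[probe.probe_id] = pair
        self.next_id += 1
        return probe

    def on_probe(self, probe):
        # The probe arrived, so remember its id for our own future probes.
        self.received.append(probe.probe_id)
        # Any of our ids echoed back confirms that pair in the outgoing direction.
        for pid in probe.seen:
            if pid in self.sent:
                self.confirmed_outgoing.add(self.sent[pid])
```

For example, if A probes on (A1, B1) and (A2, B2), only the second probe reaches B, and B's own probe comes back on (B1, A1), then A can conclude that (A2, B2) works in the outgoing direction even though nothing was received on that pair.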


Finally, in the security considerations section, i think that there is a closely related problem that perhaps needs to be presented here, which is flooding protection. I mean, the path exploration exchange can be used for identifying working address pairs but also for preventing the shim from being used for flooding attacks. In order to enable the path exploration exchange to be used for this, you need to include some additional information in the exchange, some information that identifies the shim context, so that the receiver of a packet of the address pair exploration process can determine whether this is one of its own established sessions being genuinely rehomed or a flooding attack.
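
A rough sketch of that check, assuming each established context is identified by a random tag and remembers the set of addresses the peer announced. The class name, the 64-bit tag width and the address-set test are all hypothetical choices made for illustration, not something specified in the draft.

```python
import secrets

class ShimContexts:
    """Hypothetical table of established shim contexts (illustration only)."""

    def __init__(self):
        self._by_tag = {}   # context tag -> set of addresses announced by the peer

    def establish(self, peer_addresses):
        # Pick an unpredictable tag so that an off-path attacker cannot guess it.
        tag = secrets.randbits(64)
        while tag in self._by_tag:
            tag = secrets.randbits(64)
        self._by_tag[tag] = set(peer_addresses)
        return tag

    def accept_probe(self, tag, src_addr):
        # Answer a probe only if it names one of our own contexts and comes
        # from an address that belongs to that context's peer.
        addrs = self._by_tag.get(tag)
        return addrs is not None and src_addr in addrs
```

A probe with an unknown tag, or from an address outside the context, is simply not answered, so the exploration exchange cannot be used to redirect traffic towards a third party.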

regards, marcelo





Since then, I've been thinking about ways to unify the two approaches. What I've been able to come up with is this:

1. when host A is uncertain about the reachability status of the current address pair, it sends host B a request
2. host B sends back an answer, so host A knows that something is wrong if this answer does not arrive
3. host A may optionally request that host B send X packets at Y intervals to allow A to determine that there is reachability in the B -> A direction
4. with this request, A may also request that B suppress the additional packets when there is other traffic in the B -> A direction


Capabilities 1 and 2 are mandatory, capabilities 3 and 4 are optional.
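
The four capabilities above could be sketched roughly as follows, with the optional ones expressed as optional fields in the request. The message and field names, the nonce, and the rule used to decide when "other traffic" suppresses a keepalive are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ProbeRequest:
    nonce: int                                   # matches a reply to its request
    keepalive_count: Optional[int] = None        # X packets (capability 3)
    keepalive_interval: Optional[float] = None   # Y seconds apart (capability 3)
    suppress_if_traffic: bool = False            # capability 4

@dataclass
class ProbeReply:
    nonce: int
    keepalives_accepted: bool   # False if B does not implement capability 3

def handle_request(req, supports_keepalives):
    """Host B: always answer (capabilities 1 and 2), optionally accept keepalives."""
    wants = req.keepalive_count is not None and req.keepalive_interval is not None
    return ProbeReply(req.nonce, bool(wants and supports_keepalives))

def keepalive_schedule(req, reply, traffic_times):
    """Times (seconds after the reply) at which B sends keepalive packets.

    traffic_times lists when B sent ordinary payload towards A. With
    suppression requested, a keepalive is skipped if payload was sent
    during the interval leading up to it.
    """
    if not reply.keepalives_accepted:
        return []
    times = []
    for i in range(1, req.keepalive_count + 1):
        due = i * req.keepalive_interval
        start = due - req.keepalive_interval
        busy = any(start < t <= due for t in traffic_times)
        if req.suppress_if_traffic and busy:
            continue
        times.append(due)
    return times
```

A host that only implements the mandatory part ignores the optional fields and answers with keepalives_accepted set to False, so A knows it has to keep probing on its own.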

A brain-dead implementation would simply send a probe and expect an answer every 10 seconds or so. (I'm not sure we should allow this as the entire failure detection mechanism as it adds unnecessary extra packets for all associations.)

An implementation that uses upper layer advice would only initiate a probe/reply sequence when the upper layer is unhappy.

An implementation that knows about bidirectional data flow would only initiate a probe/reply sequence when there is outgoing traffic but no incoming traffic. It would probably also ask the other side to keep sending packets when it has no payload data to transmit. The other side may or may not implement this. If it does, so much the better. If it doesn't, there will be more frequent unidirectional traffic so host A will do a probe/reply more often.
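
As an illustration of the bidirectional-flow variant, the trigger could look roughly like this: start a probe/reply sequence only when outgoing packets have gone unanswered for some time. The class name and the 10 second default are assumptions chosen for the sketch, not proposed values.

```python
class FlowMonitor:
    """Hypothetical trigger for a host that tracks traffic in both directions."""

    def __init__(self, timeout=10.0):
        self.timeout = timeout       # seconds of one-way traffic tolerated (assumed)
        self.waiting_since = None    # first send not yet followed by a receive

    def on_send(self, now):
        # Start the clock on the first outgoing packet after the last incoming one.
        if self.waiting_since is None:
            self.waiting_since = now

    def on_receive(self, now):
        # Any incoming packet shows the B -> A direction works; reset the clock.
        self.waiting_since = None

    def should_probe(self, now):
        # Probe only when there is outgoing traffic but nothing has come back.
        return (self.waiting_since is not None
                and now - self.waiting_since >= self.timeout)
```

With no outgoing traffic at all, should_probe never fires, which corresponds to the case where an idle association generates no extra packets.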

Although this increases the number of possible ways the failure detection can play out and thus makes debugging more difficult, it has the advantage that a basic implementation is simple, while implementers can still add capabilities and improve failure detection algorithms without having to go through an IETF standardization cycle.

Thoughts?