
Re: Failure Detection (was Re: soft state (was Re: shim6 and bit errors in data packet headers



...


If we have a positive ACK from the ULP, we should be fine and do nothing, even if Tx and Rx are detecting a failure.

If we only receive and don't send but the ULP is happy we still need to send keepalives because we can't know that the ULP at the other side is also providing positive feedback.



right, we still have to send keepalives

If we only send and don't receive but the ULP is happy (how does it know it can be happy??) I guess we can go either way without much harm.

So basically positive feedback wouldn't (have to) change our behavior.

Rx and Tx are to be used when there is no info from the ULP.
The reason for this is that the ULP really knows for sure whether communication is progressing adequately; Tx and Rx are just guessing. For instance, Rx could be fooled by an attacker injecting packets with a spoofed source address in order to keep the communication on a given path.

That's true.

On the other hand, I wouldn't necessarily put too much trust in what weird ULPs have to say. But as long as what they have to say can only help or hurt themselves I don't really care, of course.
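The rules discussed so far can be sketched in Python as follows. This is only an illustration: the names (UlpFeedback, needs_keepalive, failure_hint) are hypothetical and not taken from any shim6 draft, tx and rx are assumed to be packet counts over some recent interval, and treating negative ULP feedback as a hint is an assumption made to complete the picture.

```python
from enum import Enum
from typing import Optional


class UlpFeedback(Enum):
    """Hypothetical feedback values an upper-layer protocol might report."""
    POSITIVE = "positive"
    NEGATIVE = "negative"


def needs_keepalive(tx: int, rx: int) -> bool:
    # We received but sent nothing, so the peer has no traffic from us.
    # A keepalive is needed even if our own ULP is happy, because we
    # cannot know whether the peer's ULP is giving positive feedback.
    return rx > 0 and tx == 0


def failure_hint(feedback: Optional[UlpFeedback], tx: int, rx: int) -> bool:
    # Positive ULP feedback wins over whatever Tx and Rx suggest.
    if feedback is UlpFeedback.POSITIVE:
        return False
    # Assumption: negative ULP feedback counts as a failure hint.
    if feedback is UlpFeedback.NEGATIVE:
        return True
    # No info from the ULP: fall back to the Tx/Rx guess.
    return tx > 0 and rx == 0
```

With no ULP feedback, failure_hint(None, 3, 0) reports a hint, while failure_hint(UlpFeedback.POSITIVE, 3, 0) does not.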


that brings out an interesting issue: what if we have multiple ULPs using the same session and they provide different feedback?


For instance, a simple case would be that some apps are more sensitive than others, so they will complain sooner. A more complex case could be that one app complains and the other one provides positive feedback (suppose, for instance, that the failure is at the app level and not in the path). How do we deal with this?

IMHO, ULP feedback should result in an explicit reachability test on the current locator pair, i.e. ULP feedback does not directly imply rehoming, but rather a verification of the current locator pair through a reachability test exchange.

Actually, imho all the info about failure should be considered as hints that trigger a reachability test exchange to verify whether the path is working or not.

So, as i understand it we would have:

Failure detection hints including:
- ULP negative feedback
- Tx>0 and Rx=0
- Receive a reachability test exchange from the peer (do we still need this?)
- ICMP error
- SHIM error ¿?


As the result of any of these hints, a reachability test exchange is performed using the current locator pair

If the reachability test succeeds, then keep on using the current locator pair
If the reachability test fails, then start the alternative locator pair exploration process


When a working alternative locator pair is found, rehome the communication
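
A minimal Python sketch of the procedure above, under stated assumptions: probe is a hypothetical callable standing in for one reachability test exchange on a locator pair, alternatives is whatever ordered list of candidate pairs the exploration process would try, and returning None when nothing works is a choice made here, not something the procedure specifies.

```python
from typing import Callable, Iterable, Optional, Tuple

# (local locator, remote locator); hypothetical representation
LocatorPair = Tuple[str, str]


def handle_failure_hint(
    current: LocatorPair,
    alternatives: Iterable[LocatorPair],
    probe: Callable[[LocatorPair], bool],
) -> Optional[LocatorPair]:
    """Return the locator pair to use after a failure hint."""
    # Any hint only triggers a test of the current pair; it never
    # causes rehoming by itself.
    if probe(current):
        return current
    # The test failed: explore the alternative locator pairs.
    for pair in alternatives:
        if pair != current and probe(pair):
            # First working alternative found: rehome to it.
            return pair
    # Assumption: signal "no working pair" with None.
    return None
```

For example, with a probe that only succeeds on ("A2", "B1"), calling handle_failure_hint(("A1", "B1"), [("A1", "B2"), ("A2", "B1")], probe) returns ("A2", "B1").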

Did i miss something?

Regards, marcelo


Iljitsch