
RE: Watchdog logic



> Ok. There are situations where a connection may be up, but the
> application is unresponsive. It would be good to use the RFC 3539
> method to validate the connection.

The watchdog operates on application-layer timescales (e.g. 6 seconds)
rather than connection timescales. So it is possible for the watchdog to
trigger application-layer failover before the connection itself fails or
is reset.
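
To illustrate the timing, here is a minimal Python sketch loosely modelled
on the RFC 3539 watchdog states. It is a simplification under stated
assumptions, not the RFC's full algorithm: the REOPEN state and timer
jitter are omitted, the class and method names are hypothetical, and the
6-second interval is just the example value used above. The caller is
assumed to restart the timer whenever traffic arrives.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    OKAY = "okay"
    SUSPECT = "suspect"
    DOWN = "down"


@dataclass
class Watchdog:
    tw: float = 6.0           # assumed watchdog interval, in seconds
    state: State = State.OKAY
    pending: bool = False     # a watchdog request is awaiting a reply

    def on_receive(self):
        """Any application-layer message from the peer clears suspicion."""
        self.pending = False
        if self.state is State.SUSPECT:
            self.state = State.OKAY  # failback

    def on_timer(self):
        """Called when the timer expires; returns the action to take."""
        if self.state is State.OKAY:
            if not self.pending:
                self.pending = True
                return "send_watchdog"
            # Two intervals with no reply: fail over, but the transport
            # connection is still open at this point.
            self.state = State.SUSPECT
            return "failover"
        if self.state is State.SUSPECT:
            self.state = State.DOWN
            return "close_connection"
        return "attempt_open"
```

With these assumptions, failover happens after two silent intervals (about
12 seconds here), while the connection is only closed one interval later.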

> I'm not sure having a separate connection for Status-Server is a good
> idea.

I'm not sure either. Separating the Status-Server traffic from the RADSEC
traffic breaks "fate sharing", which could introduce a number of bugs.

> In addition, the algorithm in 3539 appears to be focussed on keeping
> the connections up... even if that means re-opening them. I'm not sure
> this is a good idea. It means that spikes in traffic cause a large
> number of connections to be opened... which then never close, or are
> continuously re-opened. Even if there's no traffic on them.

The idea is to always have a connection "ready" for traffic, so yes, the
algorithm does keep connections up even if there is no regular traffic
(i.e. the algorithm generates watchdog traffic).

> It may be worth adding suggestions:
>
> - TCP connections SHOULD be kept "full". i.e. used in a "most recently
> used" fashion for normal RADIUS traffic.
>
> - The RFC 3539 watchdog algorithm should be used to determine the status of a *connection*.

I'm not sure that the watchdog really determines connection status so much
as status at the application layer.

> - so long as one connection is alive, the server should be marked "alive".

Agreed.  But doesn't this somewhat conflict with the previous goal?

> - connections that haven't been used for T seconds (4 * RTT?) may be
> pro-actively closed.

How do you know what the RTT is? Or do you assume RTTMAX? Since routing
transients can take as long as 30 seconds to resolve, T would probably
need to be significant (e.g. minutes).

> - at least one connection should remain open to determine application
> responsiveness.

Sure.
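
Taken together, the suggestions discussed above could be sketched roughly
as follows in Python. This is only an illustration: the `Conn` and `Pool`
names are hypothetical, the 300-second idle timeout is an assumed value for
T (on the order of minutes, as argued above), the `responsive` flag is
assumed to be maintained by a per-connection watchdog, and per-connection
capacity limits are ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Conn:
    name: str
    responsive: bool = True   # assumed to be set by the watchdog
    last_used: float = 0.0    # time of last normal traffic, in seconds


@dataclass
class Pool:
    idle_timeout: float = 300.0            # assumed value for T
    conns: list = field(default_factory=list)

    def server_alive(self):
        """The server is alive so long as one connection is responsive."""
        return any(c.responsive for c in self.conns)

    def pick(self, now):
        """Choose the most recently used responsive connection."""
        live = [c for c in self.conns if c.responsive]
        if not live:
            return None
        conn = max(live, key=lambda c: c.last_used)
        conn.last_used = now
        return conn

    def reap(self, now):
        """Close connections idle for longer than T, always keeping one."""
        by_recency = sorted(self.conns, key=lambda c: c.last_used,
                            reverse=True)
        keep = by_recency[:1]              # never close the last connection
        closed = []
        for c in by_recency[1:]:
            if now - c.last_used > self.idle_timeout:
                closed.append(c)
            else:
                keep.append(c)
        self.conns = keep
        return closed
```

Because `pick` always prefers the most recently used connection, extra
connections opened during a traffic spike go idle afterwards and become
candidates for `reap`, while one connection stays open for watchdog traffic.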