Re: Watchdog logic
Bernard Aboba wrote:
> "With TCP, there is a connection between client and server. The
> watchdog timer algorithm in RFC 3539 is defined per *connection*. So an
> ID has to be reserved, because the client can't open a new connection to
> test if an existing connection is still alive."
>
> While it's true that the watchdog timer algorithm is defined per connection,
> it seems like Status-Server is about determining whether a server is up
> or not. The only way to determine whether a connection is down
> is to wait for it to close (either via a RESET or timeout). RFC 3539
> attempts to detect a failure at the application layer prior to connection failure.
Ok. There are situations where a connection may be up, but the
application is unresponsive. It would be good to use the RFC 3539
method to validate the connection.
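
For illustration only, the per-connection check could be sketched roughly as below. This is a deliberately simplified reading of the RFC 3539 watchdog, not the full state machine: it omits the INITIAL and REOPEN states and the timer jitter, and it treats any received packet as clearing the pending probe. The class and method names are hypothetical, and "send_probe" stands in for sending a Status-Server packet.

```python
import enum


class State(enum.Enum):
    OKAY = "okay"
    SUSPECT = "suspect"
    DOWN = "down"


class Watchdog:
    """Simplified per-connection watchdog, loosely modelled on RFC 3539.

    Assumption: the caller resets its own Tw timer whenever traffic is
    received, and calls on_timer_expiry() only after Tw of silence.
    """

    def __init__(self):
        self.state = State.OKAY
        self.pending = False  # True while a probe is unanswered

    def on_receive(self):
        # Simplification: any reply shows the application is responding.
        self.pending = False
        self.state = State.OKAY

    def on_timer_expiry(self):
        """Return the action the caller should take for this connection."""
        if self.state is State.OKAY and not self.pending:
            self.pending = True
            return "send_probe"
        if self.state is State.OKAY:
            # A probe went unanswered for a full interval.
            self.state = State.SUSPECT
            return "failover"
        if self.state is State.SUSPECT:
            self.state = State.DOWN
            return "close"
        return "none"
```

The point of the sketch is that the TCP connection can still be open while the state moves from OKAY to SUSPECT, which is the "connection up, application unresponsive" case.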
> I'm wondering what the implications would be of using two separate connections,
> one for Status-Server/Access-Accept and the other
> one for Access-Request/Access-Accept transactions.
>
> The failover logic would change, to be sure. For example,
> a connection failure on the Request/Accept connection would probably trigger
> failover, regardless of the state of the Status-Server/Accept connection.
> Also, the state of the two connections could get out of
> sync; for example, if the Request/Accept connection was quite busy,
> then the Status/Accept connection might send little or no traffic,
> which might cause middleboxes (e.g. a NAT) to lose connection state
> on the Status/Accept connection. In such a situation, you might just
> be able to bring up another Status/Accept connection rather than
> triggering failover.
I'm not sure having a separate connection for Status-Server is a good
idea.
In addition, the algorithm in RFC 3539 appears to be focused on keeping
the connections up, even if that means re-opening them. I'm not sure
this is a good idea. It means that a spike in traffic causes a large
number of connections to be opened, which then either never close or are
continuously re-opened, even if there's no traffic on them.
It may be worth adding some suggestions:
- TCP connections SHOULD be kept "full", i.e. used in a "most recently
used" fashion for normal RADIUS traffic.
- The RFC 3539 watchdog algorithm should be used to determine the status
of a *connection*.
- So long as one connection is alive, the server should be marked "alive".
- Connections that haven't been used for T seconds (4 * RTT?) may be
pro-actively closed.
- At least one connection should remain open to determine application
responsiveness.
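
As a rough sketch of how these suggestions might fit together, assuming a hypothetical client-side pool (none of the names below come from any RADIUS implementation): each connection carries an "alive" flag that a per-connection watchdog would set, requests go to the most recently used connection that still has room, and idle connections are closed while one is always kept. The capacity parameter is an assumption standing in for the limit on outstanding requests per connection, and idle_timeout plays the role of T.

```python
from dataclasses import dataclass


@dataclass
class Conn:
    name: str
    last_used: float      # time of the last request sent on this connection
    alive: bool = True    # would be maintained by a per-connection watchdog
    in_flight: int = 0    # outstanding requests


class Pool:
    def __init__(self, idle_timeout, capacity):
        self.idle_timeout = idle_timeout  # "T seconds" in the list above
        self.capacity = capacity          # assumed per-connection limit
        self.conns = []

    def add(self, conn):
        self.conns.append(conn)

    def pick(self, now):
        """Pick the most recently used live connection that has room."""
        usable = [c for c in self.conns
                  if c.alive and c.in_flight < self.capacity]
        if not usable:
            return None  # caller may open a new connection
        conn = max(usable, key=lambda c: c.last_used)
        conn.last_used = now
        conn.in_flight += 1
        return conn

    def server_alive(self):
        """The server is alive so long as one connection is alive."""
        return any(c.alive for c in self.conns)

    def reap_idle(self, now):
        """Close idle connections, always keeping at least one open."""
        ordered = sorted(self.conns,
                         key=lambda c: (c.alive, c.last_used), reverse=True)
        keep, closed = ordered[:1], []
        for c in ordered[1:]:
            if now - c.last_used > self.idle_timeout:
                closed.append(c)
            else:
                keep.append(c)
        self.conns = keep
        return closed
```

Picking the most recently used connection concentrates traffic on a few connections, so the ones opened during a spike go idle and can be reaped instead of being held open indefinitely.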
Alan DeKok.