Re: Watchdog logic

To: Bernard Aboba <bernard_aboba@hotmail.com>
Subject: Re: Watchdog logic
From: Alan DeKok <aland@deployingradius.com>
Date: Thu, 18 Dec 2008 11:38:21 +0100
Cc: "radiusext@ops.ietf.org" <radiusext@ops.ietf.org>
In-reply-to: <BLU137-W393F844D0902CA849B54CD93F20@phx.gbl>
References: <BLU137-W393F844D0902CA849B54CD93F20@phx.gbl>
User-agent: Thunderbird 2.0.0.18 (Macintosh/20081105)

Bernard Aboba wrote:
> "With TCP, there is a connection between client and server.  The
> watchdog timer algorithm in RFC 3539 is defined per *connection*.  So an
> ID has to be reserved, because the client can't open a new connection to
> test if an existing connection is still alive."
> 
> While it's true that the watchdog timer algorithm is defined per connection,
> it seems like Status-Server is about determining whether a server is up
> or not.  The only way to determine whether a connection is down
> is to wait for it to close (either via a RESET or timeout).  RFC 3539
> attempts to detect a failure at the application layer prior to connection failure. 

  Ok.  There are situations where a connection may be up, but the
application is unresponsive.  It would be good to use the RFC 3539
method to validate the connection.

> I'm wondering what the implications would be of using two separate connections, 
> one for Status-Server/Access-Accept and the other 
> one for Access-Request/Access-Accept transactions. 
> 
> The failover logic would change, to be sure.  For example, 
> a connection failure on the Request/Accept connection would probably trigger
> failover, regardless of the state of the Status-Server/Accept connection. 
> Also, the state of the two connections could get out of 
> sync; for example, if the Request/Accept connection was quite busy,
> then the Status/Accept connection might send little or no traffic,
> which might cause middleboxes (e.g. a NAT) to lose connection state
> on the Status/Accept connection.  In such a situation, you might just
> be able to bring up another Status/Accept connection rather than 
> triggering failover. 

  I'm not sure having a separate connection for Status-Server is a good
idea.

  In addition, the algorithm in RFC 3539 appears to be focused on
keeping the connections up, even if that means re-opening them.  I'm not
sure this is a good idea.  It means that a spike in traffic causes a
large number of connections to be opened, which then either never close
or are continuously re-opened, even if there's no traffic on them.


  It may be worth adding the following suggestions:

- TCP connections SHOULD be kept "full", i.e. used in a "most recently
used" fashion for normal RADIUS traffic.

- The RFC 3539 watchdog algorithm should be used to determine the status
of a *connection*.

- So long as one connection is alive, the server should be marked "alive".

- Connections that haven't been used for T seconds (4 * RTT?) may be
proactively closed.

- At least one connection should remain open to determine application
responsiveness.
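
  As a rough illustration of how the suggestions above might fit
together, here is a minimal Python sketch.  It is bookkeeping only: no
sockets, and no RFC 3539 state machine.  The names Conn, Pool, pick(),
reap() and idle_timeout are all hypothetical, and the "alive" flag just
stands in for whatever verdict a per-connection watchdog would produce.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare connections by identity, not by field values
class Conn:
    """One TCP connection to a RADIUS server (hypothetical bookkeeping only)."""
    name: str
    last_used: float = 0.0  # time the last normal RADIUS packet was sent
    alive: bool = True      # assumed to be set by a per-connection watchdog


@dataclass
class Pool:
    idle_timeout: float                       # "T" from the list above
    conns: list = field(default_factory=list)

    def server_alive(self) -> bool:
        # The server is "alive" so long as one connection is alive.
        return any(c.alive for c in self.conns)

    def pick(self, now: float):
        # Most-recently-used selection keeps busy connections "full"
        # and lets the remaining ones go idle.
        live = [c for c in self.conns if c.alive]
        if not live:
            return None
        conn = max(live, key=lambda c: c.last_used)
        conn.last_used = now
        return conn

    def reap(self, now: float) -> list:
        # Close live connections idle for longer than T, but always keep
        # the most recently used one open so that application
        # responsiveness can still be probed.  Dead connections are left
        # alone here; handling them is a failover decision.
        live = [c for c in self.conns if c.alive]
        keep = max(live, key=lambda c: c.last_used) if live else None
        closed = [c for c in live
                  if c is not keep and now - c.last_used > self.idle_timeout]
        self.conns = [c for c in self.conns if c not in closed]
        return closed
```

  The sketch does not model opening extra connections when the chosen
one is saturated, and it takes T as a plain parameter, since the right
value (4 * RTT or otherwise) is still an open question.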

  Alan DeKok.

