
RE: Watchdog logic

To: Alan DeKok <aland@deployingradius.com>
Subject: RE: Watchdog logic
From: Bernard Aboba <bernard_aboba@hotmail.com>
Date: Thu, 18 Dec 2008 07:33:25 -0800
Cc: "radiusext@ops.ietf.org" <radiusext@ops.ietf.org>
In-reply-to: <494A281D.4000301@deployingradius.com>
References: <BLU137-W393F844D0902CA849B54CD93F20@phx.gbl> <494A281D.4000301@deployingradius.com>

> Ok. There are situations where a connection may be up, but the
> application is unresponsive. It would be good to use the RFC 3539
> method to validate the connection.

The watchdog operates on application-layer timescales (e.g., 6 seconds) rather than connection timescales, so it can trigger application-layer failover before the connection itself fails or is reset.
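
The distinction can be illustrated with a minimal sketch loosely modeled on the RFC 3539 watchdog state machine. It is not the full algorithm: the INITIAL and REOPEN states are omitted, the class and method names are hypothetical, and the 6-second interval and 2-second jitter are assumed values. The point is that failover happens as soon as a watchdog request goes unanswered for one interval, while the connection is only closed an interval after that.

```python
import random
from enum import Enum


class State(Enum):
    OKAY = "okay"
    SUSPECT = "suspect"
    DOWN = "down"


class Watchdog:
    """Simplified per-connection watchdog (hypothetical names, assumed defaults)."""

    def __init__(self, tw_init=6.0, jitter=2.0):
        self.tw_init = tw_init   # base watchdog interval in seconds (assumed)
        self.jitter = jitter     # interval is randomized by +/- jitter (assumed)
        self.state = State.OKAY
        self.pending = False     # True while a watchdog request is unanswered
        self.events = []         # actions taken, recorded for inspection

    def next_interval(self):
        """Return the delay until the next timer expiry."""
        return self.tw_init + random.uniform(-self.jitter, self.jitter)

    def on_receive(self, is_watchdog_reply=False):
        """Any traffic from the peer shows the application is responding."""
        if is_watchdog_reply:
            self.pending = False
        if self.state is State.SUSPECT:
            self.events.append("failback")
        if self.state is not State.DOWN:
            self.state = State.OKAY

    def on_timer_expired(self):
        """Called when nothing was received during the current interval."""
        if self.state is State.OKAY and not self.pending:
            # Idle connection: probe the peer (e.g. with Status-Server).
            self.pending = True
            self.events.append("send_watchdog")
        elif self.state is State.OKAY:
            # Probe unanswered: fail over, but leave the connection open.
            self.state = State.SUSPECT
            self.events.append("failover")
        elif self.state is State.SUSPECT:
            # Still silent one interval later: now close the connection.
            self.state = State.DOWN
            self.events.append("close_connection")
```

In this sketch a silent peer produces the events `send_watchdog`, `failover`, `close_connection` on three successive timer expiries, so traffic is redirected one full interval before the connection is torn down.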

> I'm not sure having a separate connection for Status-Server is a good
> idea.

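I'm not sure either. Separating the Status-Server traffic from the RADSEC
traffic breaks "fate sharing": a probe that succeeds on its own connection
says nothing about the connection carrying the RADSEC traffic, which could
introduce a number of bugs.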

> In addition, the algorithm in 3539 appears to be focussed on keeping
> the connections up... even if that means re-opening them. I'm not sure
> this is a good idea. It means that spikes in traffic cause a large
> number of connections to be opened... which then never close, or are
> continuously re-opened. Even if there's no traffic on them.

The idea is to always have a connection "ready" for traffic, so yes, the
algorithm does keep connections up even if there is no regular traffic
(e.g. the algorithm generates watchdog traffic).

> It may be worth adding suggestions:
>
> - TCP connections SHOULD be kept "full". i.e. used in a "most recently
> used" fashion for normal RADIUS traffic.
>
> - The RFC 3539 watchdog algorithm should be used to determine the status of a *connection*.

I'm not sure the watchdog really determines connection status so much as status at the application layer.

> - so long as one connection is alive, the server should be marked "alive".

Agreed. But doesn't this somewhat conflict with the previous goal?

> - connections that haven't been used for T seconds (4 * RTT?) may be
> pro-actively closed.

How do you know what RTT is? Or do you assume RTTMAX? Since routing transients can take as long as 30 seconds to resolve, T probably would need to be significant (e.g. minutes).

> - at least one connection should remain open to determine application
> responsiveness.

Sure.
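
Taken together, the suggestions discussed above could be sketched roughly as follows. This is only an illustration under stated assumptions: `Connection`, `ConnectionPool`, and all method names are hypothetical, the `alive` flag is assumed to be maintained by a per-connection watchdog, and the idle timeout T is assumed to be 120 seconds (minutes rather than a few RTTs). Per-connection capacity limits are ignored, so "most recently used" simply picks the live connection used last.

```python
import time


class Connection:
    def __init__(self, name, now):
        self.name = name
        self.alive = True      # assumed to be updated by a per-connection watchdog
        self.last_used = now   # time of the last regular RADIUS request


class ConnectionPool:
    def __init__(self, idle_timeout=120.0):
        self.idle_timeout = idle_timeout   # "T" in seconds (assumed value)
        self.connections = []

    def add(self, name, now=None):
        now = time.monotonic() if now is None else now
        conn = Connection(name, now)
        self.connections.append(conn)
        return conn

    def server_alive(self):
        """The server is marked alive while any one connection is alive."""
        return any(c.alive for c in self.connections)

    def pick(self, now=None):
        """Choose the most recently used live connection for normal traffic."""
        now = time.monotonic() if now is None else now
        live = [c for c in self.connections if c.alive]
        if not live:
            return None
        conn = max(live, key=lambda c: c.last_used)
        conn.last_used = now
        return conn

    def reap_idle(self, now=None):
        """Close live connections idle for more than T, always keeping one open."""
        now = time.monotonic() if now is None else now
        live = sorted((c for c in self.connections if c.alive),
                      key=lambda c: c.last_used, reverse=True)
        # live[0] is exempt so application responsiveness can still be probed.
        closed = [c for c in live[1:] if now - c.last_used > self.idle_timeout]
        self.connections = [c for c in self.connections if c not in closed]
        return [c.name for c in closed]
```

Because traffic concentrates on the most recently used connection, the others go idle after a spike and are reaped once T has elapsed, while the exempt connection stays open for the watchdog.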
