
Re: REMINDER: RADEXT WG Last Call on Status Server document (part 2)

To: "radiusext@ops.ietf.org" <radiusext@ops.ietf.org>
Subject: Re: REMINDER: RADEXT WG Last Call on Status Server document (part 2)
From: Bernard Aboba <bernard_aboba@hotmail.com>
Date: Sun, 9 Aug 2009 22:09:24 -0700

Section 4.3

When a client fails over from one server to another because of a lack
of responsiveness, it SHOULD send periodic Status-Server packets to
the unresponsive server, using the timer (Tw) defined above.

[BA] Can you provide a section reference for the Tw timer?

Once three time periods have passed where Status-Server packets have
been sent and responded to, the server should be deemed responsive
and RADIUS requests may be sent to it again. This determination should
be made separately for each server that the client has a relationship
with. The same algorithm should be used for both authentication and
accounting ports. The client MUST treat each destination (ip, port)
combination as a unique server for the purposes of this
determination.

The above behavior is modelled after [RFC3539] Section 3.4.1. We
note that if a reliable transport is used for RADIUS, then the
algorithms specified in [RFC3539] MUST be used in preference to the
ones given here.
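
To make the quoted recovery rule concrete, here is a minimal sketch of one way to read it. The class and method names are hypothetical, and resetting the count after a missed reply is an assumption: the quoted text says "three time periods" but does not say they must be consecutive. State is keyed by (ip, port), as the text requires.

```python
from dataclasses import dataclass

REQUIRED_PERIODS = 3  # "three time periods" from the quoted text


@dataclass
class ServerState:
    responsive: bool = True
    answered_periods: int = 0


class RecoveryTracker:
    """Tracks responsiveness separately for each (ip, port) destination."""

    def __init__(self):
        self._servers = {}  # (ip, port) -> ServerState

    def _state(self, ip, port):
        return self._servers.setdefault((ip, port), ServerState())

    def mark_unresponsive(self, ip, port):
        st = self._state(ip, port)
        st.responsive = False
        st.answered_periods = 0

    def record_period(self, ip, port, got_reply):
        """Call once per Tw period in which a Status-Server packet was sent."""
        st = self._state(ip, port)
        if st.responsive:
            return True
        if got_reply:
            st.answered_periods += 1
        else:
            # Assumption: a missed reply restarts the count.
            st.answered_periods = 0
        if st.answered_periods >= REQUIRED_PERIODS:
            st.responsive = True
        return st.responsive

    def is_responsive(self, ip, port):
        return self._state(ip, port).responsive
```

Note that under this reading the authentication and accounting ports of the same host recover independently, since each (ip, port) pair has its own counter.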

[BA] I think a bit more explanation would be helpful here. Traditional "failover" algorithms have
depended on use of Access-Request packets. Unfortunately, these techniques can lead a client to fail over
in circumstances where there is nothing wrong with the primary proxy. So in reading the first paragraph,
I'm unclear what is being advocated, precisely. For example, is it being advocated that a Status-Server
packet be sent to the primary proxy after a number of Access-Requests do not receive a response? That
would help determine whether the primary proxy was the issue in the first place. If it is not the issue
(i.e. the primary proxy responds), then sending Status-Server packets to the primary proxy would seem
pointless.

If the primary proxy is responding to Status-Server, then the problem must be downstream, and it might
make sense for the client to continue to send Access-Requests for a while longer, under the assumption
that those downstream elements are also attempting to diagnose the problem (possibly with Status-Server),
so that in the meantime they might also fail over, restoring end-to-end RADIUS connectivity. Based on the
algorithms presented in RFC 5080, one might come up with some recommendations on how long the client
might stick with the primary under the circumstance where it is responding to Status-Server packets.

Assuming that a failure of the primary proxy is confirmed by Status-Server, then failing over to the
secondary proxy quickly would seem to make sense. At this point, it would make sense to continue
to send Status-Server packets to the primary to figure out when it comes up again.
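
The sequence I have in mind could be sketched roughly as follows. This is only an illustration of the comments above, not text from the draft; the function name, the Action values, and the threshold of three unanswered Access-Requests are all hypothetical.

```python
from enum import Enum


class Action(Enum):
    KEEP_SENDING = "keep sending Access-Requests to the primary"
    PROBE_PRIMARY = "send Status-Server to the primary"
    STAY_ON_PRIMARY = "primary is alive; problem is downstream, stay for now"
    FAIL_OVER = "fail over to the secondary, keep probing the primary"


def next_action(unanswered_requests, status_reply, threshold=3):
    """Decide what the client does next.

    unanswered_requests: Access-Requests to the primary with no response.
    status_reply: None if no Status-Server probe has completed yet,
                  True if the primary answered it, False if it did not.
    threshold: hypothetical number of unanswered requests before probing.
    """
    if unanswered_requests < threshold:
        return Action.KEEP_SENDING
    if status_reply is None:
        return Action.PROBE_PRIMARY
    if status_reply:
        # The primary proxy itself is fine, so the fault lies downstream.
        return Action.STAY_ON_PRIMARY
    return Action.FAIL_OVER
```

How long the client remains in the STAY_ON_PRIMARY state is the open question raised above; the sketch deliberately leaves that timer out.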

Section 4.4

The point that Status-Server packets are non-forwardable is quite central to the document, so it would
be a good idea to make the point earlier.

Section 4.5

4.5. Realm Routing

RADIUS servers are commonly used in an environment where Network
Access Identifiers (NAIs) are used as routing identifiers [RFC4282].
In this practice, the User-Name attribute is decorated with realm
routing information, commonly in the format of "user@realm". Since a
particular RADIUS server may act as a proxy for more than one realm,
the mechanism outlined above may be inadequate.

[BA] What mechanism? Since Status-Server packets are non-forwardable, there is no concept of a destination
realm, right?

Overall, I think this section might be retitled "Limitations of Status-Server" because that is really the
main focus.

On reading this section, the thought did occur to me that in the case where Status-Server indicated
a downstream failure, "per-realm" failover might make sense. For example, if the primary could
reach realm A but not B, there is no point in failing over all Requests to the secondary, just those
for realm B. Of course, it is also possible that a fault in a downstream node that prevented
reaching realm A (e.g. a failure in a server for realm A) might subsequently be corrected, so that
realm A might become reachable again.
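
As a rough illustration of "per-realm" failover, a client might select the proxy per request along these lines. The function and parameter names are hypothetical, and the sketch assumes the client already has a set of lowercase realm names it believes are unreachable via the primary (for example, inferred from unanswered Access-Requests, since Status-Server itself says nothing about realms).

```python
def pick_proxy(user_name, unreachable_via_primary,
               primary="primary", secondary="secondary"):
    """Return the proxy to use for a request with this User-Name.

    unreachable_via_primary: set of lowercase realm names (assumption).
    """
    # Split "user@realm" on the last "@"; sep is "" when no realm is present.
    _, sep, realm = user_name.rpartition("@")
    if sep and realm.lower() in unreachable_via_primary:
        return secondary
    return primary
```

Removing a realm from the set when it becomes reachable again would send its requests back to the primary, which covers the correction case mentioned above.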

Section 5

Recommend including a row for User-Name, which presumably would have all 0s in it (i.e. User-Name is not used
in Status-Server packets, right?).

What is the purpose of VSAs in Status-Server packets or Access-Responses?
