RE: Proxies and dead home servers



Alan DeKok <mailto:aland@nitros9.org> allegedly scribbled on Tuesday,
June 12, 2007 1:43 AM:

> Glen Zorn (gwz) wrote:
>>>   If the Home Server does not respond to a proxied Access-Request,
>>> what does the Proxy Server do with it?
>> 
>> Boringly enough, I posed this exact question thousands of years ago
>> to the (really smart) guy who developed the RADIUS server for what
>> was at the time the largest access network in the world (> 1 million
>> users). His answer is still valid, I think: nothing.
> 
>   I talked to an ISP operator a month ago.  He said that due to using
> a server with that exact behavior, he had sleepless nights
> frantically trying to allow *some* users on his network.  A home
> server had gone down, and the proxy under his control was behaving in
> the way you suggest, which led the NAS to conclude that the proxy was
> down, too. Since the proxy *also* handled requests for other home
> servers, this was a problem.     
> 
>   His solution was to temporarily update the configuration to not
> proxy *any* requests, and instead allow *all* users on the network
> without performing authentication. 
> 
>   I would like to believe such scenarios are bad.  I would like to
> avoid such scenarios by designing systems that are adaptive. 
> 
>> No.  You're confusing a "Proxy Server" with a _real_ Server.
> 
>   The NAS can tell the difference?  Really?  How?
> 
>   I think you're arguing both sides of the same coin.  You say that a
> proxy can't decide that another box is down, but also that it should
> implement failover.  

No, I'm not.  I'm saying that a proxy should _never_ fail over.

> You say a NAS can't tell the difference between
> a proxy server and home server, but that they should offer different
> behaviors to the NAS.  

No, what I'm saying is that a proxy needs to be transparent to the NAS;
in order to accomplish this, the behavior of its client persona needs to
differ from that of a normal client, i.e., it must _not_ fail over.
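
A minimal sketch of the transparent behavior described above, under
stated assumptions: the names TransparentProxy, nas_send and the
retries value are hypothetical, and packets are plain byte strings
rather than real RADIUS messages.  The proxy forwards each request it
receives exactly once and relays a reply only if the home server sends
one; it keeps no retransmission timer of its own, so every retry is
driven by the NAS.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Hypothetical stand-in for the upstream hop: returns a reply,
# or None when the home server stays silent.
Upstream = Callable[[bytes], Optional[bytes]]


@dataclass
class TransparentProxy:
    upstream: Upstream
    forwarded: int = 0  # number of packets passed upstream

    def handle(self, packet: bytes) -> Optional[bytes]:
        """Forward once; relay the reply if any, otherwise stay silent."""
        self.forwarded += 1
        return self.upstream(packet)  # None means nothing is sent to the NAS


def nas_send(proxy: TransparentProxy, packet: bytes,
             retries: int = 3) -> Tuple[Optional[bytes], int]:
    """Simulated NAS: it alone decides when to retransmit and when to give up."""
    for attempt in range(1, retries + 1):
        reply = proxy.handle(packet)
        if reply is not None:
            return reply, attempt
    return None, retries  # the NAS would now try an alternate path


def dead_home(packet: bytes) -> Optional[bytes]:
    return None  # a home server that never answers


proxy = TransparentProxy(dead_home)
reply, attempts = nas_send(proxy, b"Access-Request")
```

In this sketch the proxy never synthesizes an Access-Reject and never
retries on its own; the upstream hop sees exactly as many copies of
the request as the NAS chose to send.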
  
> 
>   If what you say about proxy behavior is true, then the NAS can
> easily tell the difference between a home server and a proxy.  A home
> server follows RFC 2865, and always sends a response to a request.  

Nonsense.  It can't send a response if it's down.

> A
> proxy server sometimes doesn't respond to a request.   

Reflecting the state of an upstream entity.

> 
>>  The above
>> does not apply to proxies; to see why, extend the confusion to the
>> client side of the proxy.  If the proxy is a real client, it will
>> discern that no answer is forthcoming by means of the traditional
>> method of multiple time-out and retry, right?  If the proxy client's
>> timeout period is the same as that of the NAS, it will multiply
>> identical but slightly staggered requests before giving up &
>> returning (in your scenario below) an Access-Reject slightly _after_
>> the NAS has given up & tried a different proxy (if available).
> 
>   Yes.  So?
> 
>> If the proxy client's time out
>> is too short, though, it risks returning spurious Access-Rejects due
>> to end servers that are just a little bit slow.
> 
>   Why is this a problem?  If the proxy has reasonable timeouts set,
> then: 
> 
>   a) fixed timeouts will be set large enough to handle any reasonable
>      network delay, or
>   b) the proxy will do RTT and discover the real timeouts, AND
>   c) the user will very likely give up after 30 seconds if the network
>      is extremely slow, AND
>   d) one or two users will be rejected (likely after they've given up
>      already), because the timers aren't synchronized.
> 
>   I fail to understand where the catastrophe is.
> 
>>  Basically, the time-out &
>> retry behavior in a proxied network MUST be driven by the NAS, not
>> the proxies, or total chaos results.
> 
>   That's what I'm trying to agree with.
> 
>>    This is not to say that a proxy must blindly forward requests to
>> dead servers. It should certainly note the responsiveness of upstream
>> entities: if one appears to be dead & there is an alternate path
>> available, it should be used, but decisions about the health of the
>> upstream entities must be based upon either passive observation or
>> possibly locally generated (& blatantly illegal ;-) probes, not
>> through the imitation of a real client.
> 
>   Do you have comments on RFC 3539, which suggests precisely such
> probes? 

Just one: Diameter.

> 
>>> The only safe response is an Access-Reject, I think.
>> 
>> For reasons outlined above, the only safe response is none.
> 
>   If I have to choose between a small number of users getting
> erroneously rejected after a long period of time, OR a large number
> of users getting rejected because the NAS erroneously decides that a
> server is down, the choice to me is clear.  

The problem I'm trying to point out is that the wrong decision is being
made because the wrong things are being considered.  If NAI-based
routing is in use & routes, instead of servers, are marked as up or
down, the problem you're talking about evaporates.
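
A minimal sketch of marking routes rather than servers, under stated
assumptions: the RouteTable class, its method names and the dead_after
threshold are hypothetical, the route key is simply the realm part of
the NAI, and health is inferred passively from forwarded requests that
go unanswered.  Which element keeps this table is left open here.

```python
from typing import Dict


def realm_of(nai: str) -> str:
    """Return the realm of an NAI such as 'user@example.com' ('' if none)."""
    return nai.rpartition("@")[2] if "@" in nai else ""


class RouteTable:
    """Tracks up/down state per realm route, not per next-hop server."""

    def __init__(self, dead_after: int = 3) -> None:
        self.dead_after = dead_after          # hypothetical threshold
        self.unanswered: Dict[str, int] = {}  # realm -> consecutive silences

    def forwarded(self, nai: str) -> None:
        """Record a request sent along the route with no reply seen yet."""
        realm = realm_of(nai)
        self.unanswered[realm] = self.unanswered.get(realm, 0) + 1

    def answered(self, nai: str) -> None:
        """Any reply on the route resets its silence counter."""
        self.unanswered[realm_of(nai)] = 0

    def is_up(self, nai: str) -> bool:
        return self.unanswered.get(realm_of(nai), 0) < self.dead_after


routes = RouteTable(dead_after=3)
for _ in range(3):
    routes.forwarded("alice@a.example")  # home server for a.example is silent
```

With state kept this way, three unanswered requests for a.example mark
only that route as down; requests for b.example that traverse the same
proxy are still treated as having a live path.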
 
> 
>   Alan DeKok.

archive: <http://psg.com/lists/radiusext/>