delay as a metric, aggregation
I've been thinking about the virtues and challenges of using delay as
a metric for shim load balancing.
My first thought was that delay is immaterial: other considerations,
such as bandwidth, load, and packet loss, are more important. But
that's not entirely true: in today's high-speed networks, delay is
almost entirely the result of distance, and using a longer path
through the network when a shorter one is available wastes
resources, everything else being equal.
Also, some protocols degrade as delay increases, and even standard
file transfer is hampered by delay: although most OSes implement the
RFC 1323 window scaling option, the default window sizes are usually
too small to benefit from it, and most file transfer applications
fail to override these defaults, so throughput stays capped at the
window size divided by the round-trip time.
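To make that concrete, here's a back-of-the-envelope sketch in
Python. The 64 KB figure is the classic unscaled TCP window; actual
OS defaults vary, so treat the numbers as illustrative assumptions:

    # Without an effective RFC 1323 window, TCP throughput is capped
    # at window / RTT. Window and RTT values below are assumptions.

    def max_throughput_mbps(window_bytes, rtt_ms):
        """Upper bound on TCP throughput for a given window and RTT."""
        return (window_bytes * 8) / (rtt_ms / 1000.0) / 1e6

    print(max_throughput_mbps(64 * 1024, 100))  # ~5.2 Mbps, 100 ms path
    print(max_throughput_mbps(64 * 1024, 10))   # ~52 Mbps, 10 ms path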
So it makes sense to optimize for low delay if this can be done
reasonably. Now obviously one way to accomplish this is to measure
the delay for different address pairs. However, this has the enormous
downside that it requires a lot of measurement traffic, and the shim
will be activated much more often.
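For a sense of scale (the numbers here are made up for illustration):
with m local and n remote locators, a full mesh needs m * n probes
per peer, repeated whenever the measurements go stale:

    # Hypothetical cost of probing every address pair directly.
    local_locators = 4       # assumed number of local addresses
    remote_locators = 4      # assumed number of remote addresses
    peers = 10000            # e.g. a busy server's correspondents
    probes = local_locators * remote_locators * peers
    print(probes)            # 160,000 probes per measurement round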
While (SOHO) end-users may still suffer clogged access links these
days, this is a rare condition for most hosting farms. And at
100 Mbps and up, any queuing delays are insignificant compared to
common speed-of-light delays. So a lot of delay information can be
aggregated at the AS level: rather than measure the delay towards
service X, I can measure the delay towards beacon servers located at
X's ISPs Y and Z.
The loss of information caused by the aggregation is more than offset
by the reduction in the number of measurements required, especially
if X has a way to point to the Y and Z beacon servers before the
session starts, for instance through the DNS.
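As a sketch of how this could look, assuming (hypothetically) that X
publishes its beacons under a DNS name like _beacon.example.com and
that the beacons run a simple UDP echo service; neither the name
convention nor the port is standardized:

    # Hypothetical beacon discovery and one-shot delay measurement.
    import socket
    import time

    def beacon_hosts(domain):
        """Look up beacon servers for a service; the _beacon label
        is an invented convention, not a real standard."""
        return [a[4][0] for a in
                socket.getaddrinfo("_beacon." + domain, None,
                                   socket.AF_INET6)]

    def rtt_to_beacon(host, port=7, timeout=2.0):
        """One round trip to a beacon via a UDP echo-style probe."""
        s = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
        s.settimeout(timeout)
        start = time.monotonic()
        s.sendto(b"probe", (host, port))
        s.recvfrom(64)                    # assumes the beacon echoes
        return (time.monotonic() - start) * 1000.0  # milliseconds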
The initially measured delay to/from the beacon server can also serve
as a baseline against which to compare the actual delay of the
communication session: as long as the session delay is no more than a
certain number of milliseconds above the beacon delay, performance is
close to optimal and there is no need to explore alternatives.
However, when the session delay deteriorates, it may be useful to
explore alternative paths.
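In code form, the rule might look like this; the 20 ms slack is an
assumed tolerance, not a measured figure:

    # Compare session delay against the beacon baseline. SLACK_MS is
    # an assumption; a real implementation would tune it.
    SLACK_MS = 20.0

    def should_explore(session_rtt_ms, beacon_rtt_ms):
        """True when the session has drifted far enough above the
        beacon baseline to justify probing alternative paths."""
        return session_rtt_ms > beacon_rtt_ms + SLACK_MS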
Since the delay toward a beacon is mostly determined by the distance,
there is no need to measure it before each session: the measurements
can be cached for some time.
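Which suggests a simple time-based cache; the one-hour lifetime below
is an assumption, and rtt_to_beacon is the probe sketched earlier:

    # Cache beacon RTTs, since distance (and thus beacon delay)
    # rarely changes on short timescales. The TTL is an assumed value.
    import time

    BEACON_TTL_S = 3600.0
    _rtt_cache = {}   # beacon address -> (rtt_ms, measured_at)

    def cached_rtt(beacon):
        entry = _rtt_cache.get(beacon)
        if entry and time.monotonic() - entry[1] < BEACON_TTL_S:
            return entry[0]
        rtt = rtt_to_beacon(beacon)       # probe from earlier sketch
        _rtt_cache[beacon] = (rtt, time.monotonic())
        return rtt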
Thoughts?