
Re: delay as a metric, aggregation



Delay is a good metric as long as it is relatively stable. There is ample experience showing that delays change over time, and that when the difference between two delays is comparable to the time implied by a mean queue depth, the result can be route oscillation. There are plenty of research papers on the subject going back to 1980 or so, when delay was in fact the primary metric used in ARPANET routing. What one wants to know is that one is getting among the best service options available, not that it is for certain the very best option available at this instant.

To clarify the point, if I have the option of a route from Canberra to Singapore directly (order of 100 ms ping rtt) and a similar route via the US west coast (order of 600 ms ping rtt), for a list of reasons one really wants the first one. If the choices are among three routes sporting ping rtts of 100..110 ms, 105..115 ms, and 150..160 ms, one would want to ensure that one is using one of the two O(100 ms) routes and not the third. But trying to nail down which of the two O(100 ms) routes to use is probably not worth the effort.
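
As a rough illustration of the "good enough" selection described above, here is a minimal Python sketch. The names `Route`, `acceptable_routes`, and `choose_route`, and the 20 ms tolerance, are assumptions made for this example and come from no specification. The idea is to treat every route within a tolerance of the best measured rtt as equivalent, and to stay on the current route while it remains in that set, so that small delay fluctuations do not cause flapping.

```python
from dataclasses import dataclass


@dataclass
class Route:
    name: str
    rtt_ms: float  # measured (e.g. smoothed) ping round-trip time


def acceptable_routes(routes, tolerance_ms=20.0):
    """Return every route whose rtt is within tolerance_ms of the best one.

    tolerance_ms is an assumed tuning knob; 20 ms is illustrative only.
    """
    best = min(r.rtt_ms for r in routes)
    return [r for r in routes if r.rtt_ms - best <= tolerance_ms]


def choose_route(routes, current=None, tolerance_ms=20.0):
    """Keep the current route if it is still good enough.

    Only when the current route has fallen out of the acceptable set
    (or there is no current route) do we pick the lowest-rtt route.
    """
    good = acceptable_routes(routes, tolerance_ms)
    if current is not None and any(r.name == current for r in good):
        return current
    return min(good, key=lambda r: r.rtt_ms).name


# The three-route case from the text, using mid-range rtt values.
routes = [Route("A", 105.0), Route("B", 110.0), Route("C", 155.0)]
on_b = choose_route(routes, current="B")  # B is within tolerance, so stay
on_c = choose_route(routes, current="C")  # C is not, so move to the best
```

With these inputs the selection stays on B when B is current, and moves to A when C is current; no effort is spent deciding between A and B.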

On Oct 6, 2005, at 7:15 AM, Jari Arkko wrote:

Delay is a pretty good metric. I have a feeling that packet
loss is also an important metric when we are talking about
wireless interfaces and host multihoming, but I have no
data to back that up.

I liked your idea of comparing the nominal delay to
actual delay.

The synthetic coordinates idea from Cedric was a very interesting
one as well. I'd like to play with that if it became available. But I fear
that DNS storage etc. is a deployment barrier.

Anyway, I see two bigger issues here. The first one is the division
of work between IP and transports. If we are sure that we
can develop a simple scheme that works well, it would make sense
to add that to the IP layer. But if we find out later that
we need to take into account variations in congestion level,
packet loss, etc., we may find ourselves adding a lot of
functionality to try to mimic what transports are already
doing. An approach that does not have this drawback would
be to provide feedback from transports and ULPs to
Shim6 so that when they say "unacceptable", Shim6 would
attempt to find another path. This would still leave the
problem of the exploration process finding the right
alternative with a high probability of success. I'm not sure
we know how to do that. For instance, if the transports say
that the current path works but has too little bandwidth,
we test another one, switch to that, and then find out that
it also has too little. Perhaps delay can act as a metric here,
but I'd prefer to see some experimental results to understand
how well it works.
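
To make the feedback idea concrete, here is a minimal Python sketch of one possible exploration step. Everything in it is hypothetical: `next_path`, the `candidates` table of measured delays, and the `probe_ok` reachability check are names invented for this example, not part of any Shim6 specification. It assumes, without experimental support, that trying untested paths in order of increasing delay is a reasonable heuristic.

```python
def next_path(candidates, current, tried, probe_ok):
    """Pick the next path to test after an upper layer says "unacceptable".

    candidates: dict mapping path name -> measured delay in ms (assumed input)
    current:    name of the path that was just reported unacceptable
    tried:      set of path names already rejected in this exploration
    probe_ok:   callable(name) -> bool, a hypothetical reachability probe

    Returns the lowest-delay untried path that answers the probe,
    or None if every candidate has been exhausted.
    """
    rejected = set(tried) | {current}
    for name in sorted(candidates, key=candidates.get):
        if name in rejected:
            continue
        if probe_ok(name):
            return name
    return None


# Illustrative values only.
candidates = {"p1": 100.0, "p2": 120.0, "p3": 300.0}
first = next_path(candidates, "p1", set(), lambda n: True)
second = next_path(candidates, "p2", {"p1"}, lambda n: True)
```

Note that this only orders the attempts; it does not solve the problem raised above, since a low-delay path can still turn out to have too little bandwidth once traffic is moved onto it.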

The second issue is our ambition level with load balancing
functionality in Shim6. If the ambition level is so high that
we try to get the same session over two paths, then this
impacts transports. But even if the ambition level is
running different sessions in different paths, we
would still have to deal with congestion in some manner.
Ideally, we'd be adding sessions to a path so that
the existing sessions do not have to slow down.

If the ambition level is choosing the best path,
then we are in an area where delay could work
as a metric, I think. Finally, if the ambition level
is just choosing a path that gets some packets
through, we have no problems with the measurements.
However, many configurations have a
high-bandwidth preferred path and a low-bandwidth
backup (e.g. wireless), so returning to the better
path is a requirement, I think.

--Jari