Iljitsh,
Avoiding non-working paths goes to the core of what multihoming is all
about, so this is something we absolutely need. But I'm not sure it's
all we need. What I'm worried about is the situation where there are
multiple working paths, but there is a large difference in quality
between them. For instance, one path has a capacity of 100 Mbps and
another a capacity of 1 Mbps. In this case, selecting the latter path
as part of regular operation would hardly be acceptable.
We _may_ be able to bring this problem back to manageable proportions
by taking advantage of the information both ends have. So if either end
knows that the second path is much slower, they'll avoid it as long as
the first path is operational. This leaves only the cases where there
is congestion in the core, which is pretty rare these days. (But this
hasn't always been the case...)
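A minimal sketch of the "avoid the slow path while the fast one works" idea, assuming each end already knows a capacity figure for its paths and has some way of marking a path as working. The Path class, its fields, and select_path are hypothetical names for illustration only, not part of any proposal:

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Path:
    name: str
    capacity_mbps: float  # assumed to be known locally (configured or measured)
    working: bool         # assumed result of some reachability check


def select_path(paths: Sequence[Path]) -> Optional[Path]:
    """Return the highest-capacity working path, or None if none works."""
    working = [p for p in paths if p.working]
    if not working:
        return None
    return max(working, key=lambda p: p.capacity_mbps)


if __name__ == "__main__":
    paths = [Path("primary", 100.0, True), Path("backup", 1.0, True)]
    print(select_path(paths).name)  # the 1 Mbps path is used only if "primary" fails
```

Note that this only uses locally known capacity; it says nothing about congestion further along the path, which is the remaining case mentioned above.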
So, it seems like the problem might be turning into a general path selection
problem, which can be used for many more things than just multihoming. I'm
not completely opposed to this, but want to warn that there are a lot of
details that will not be so simple. Perhaps we need to scope this problem
pretty well before going down this path.
In terms of the 'best' path, I can think of lots of factors: latency, bandwidth,
most active, least active, lowest cost and so on. Selecting a path based
on various combinations of these factors for two endpoints seems fairly
challenging, so I'm not optimistic that this is a solvable problem in the
short term.
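To illustrate why combining factors is tricky, here is a rough sketch that scores paths with a simple weighted sum. The PathMetrics fields, the weights, and the linear scoring formula are all assumptions made up for this example; nothing here is a proposed metric:

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class PathMetrics:
    name: str
    latency_ms: float
    bandwidth_mbps: float
    cost: float  # arbitrary cost units, hypothetical


def best_path(paths: Sequence[PathMetrics],
              w_latency: float, w_bandwidth: float, w_cost: float) -> PathMetrics:
    """Pick the path with the highest weighted score.

    Bandwidth counts in favor of a path; latency and cost count against it.
    """
    def score(p: PathMetrics) -> float:
        return (w_bandwidth * p.bandwidth_mbps
                - w_latency * p.latency_ms
                - w_cost * p.cost)
    return max(paths, key=score)


if __name__ == "__main__":
    paths = [
        PathMetrics("a", latency_ms=10.0, bandwidth_mbps=1.0, cost=1.0),
        PathMetrics("b", latency_ms=200.0, bandwidth_mbps=100.0, cost=5.0),
    ]
    # Same two paths, different "best" depending on what the endpoint values.
    print(best_path(paths, w_latency=1.0, w_bandwidth=0.0, w_cost=0.0).name)
    print(best_path(paths, w_latency=0.0, w_bandwidth=1.0, w_cost=0.0).name)
```

The two calls pick different paths from the same data, and the two endpoints may well not agree on the weights, which is part of why this looks hard to solve in general.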
Note that the whole reachability detection problem isn't as simple as
it may seem at first, especially when we want to make use of
unidirectional paths. Until not very long ago, I was favoring a
"cartesian ping bomb" approach where there are reachability probes for
all { src, dst } combinations. However, this can cause massive
congestion when a link fails for a busy site, and it's also complex and
most likely slow.
Agreed. Even if you did this, you would still need some sort of mechanism
to evaluate the paths after determining reachability. I know some people
have discussed a next-generation traceroute that would collect statistics
about the hops along a particular path. Put that onto your "cartesian ping bomb"
and you might have a solution, but at what cost?
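For a feel of the cost, the { src, dst } probing discussed above can be sketched as follows. The addresses are documentation prefixes and probe_pairs is a hypothetical helper; this only enumerates the probes and does not send anything:

```python
from itertools import product
from typing import List, Sequence, Tuple


def probe_pairs(local_addrs: Sequence[str],
                remote_addrs: Sequence[str]) -> List[Tuple[str, str]]:
    """Enumerate every (src, dst) combination one end would have to probe."""
    return list(product(local_addrs, remote_addrs))


if __name__ == "__main__":
    local = ["2001:db8:a::1", "2001:db8:b::1"]
    remote = ["2001:db8:c::1", "2001:db8:d::1", "2001:db8:e::1"]
    pairs = probe_pairs(local, remote)
    # len(local) * len(remote) probes per round, per correspondent, per direction.
    print(len(pairs))
```

The count grows as the product of the address counts, and a busy site repeats this for every active correspondent at the moment a link fails. Adding per-hop statistics collection to each probe would multiply that further.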