On Fri, 19 Aug 2005, marcelo bagnulo braun wrote:
>> Why must host1 detect this? Host2 could also ;).
>
> not in a unidirectional connectivity scenario
>
> consider the case where the failure implies that:
>
>   PrefA:Host1 -> Host2 is not working
>   PrefB:Host1 -> Host2 is working
>   Host2 -> PrefA:Host1 is working
>   Host2 -> PrefB:Host1 is not working
>
> How would you cope with this case?
How important is this case?
Further, in your scenario, this was due to a local failure near Host1,
a failure which can easily be detected locally without any need for
n^2 probing.
What's needed is:
- Host1 to detect the local failure and update the exit path to use
  (and hence the source to use)
  - this is achievable in multiple ways
  - none of which need be in shim6
  - none of which require shim6 to be aware of SAS or egress
    issues
- Host2 shim6 to detect host1's valid locators have changed
  - Maybe because it receives a packet from Host1 with a new
    source
  - Maybe because Host2's reachability probes detect PrefB
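To make the two steps above concrete, here is a rough sketch in Python. It is only a toy model: the function names, the (prefix, is_up) list and the locator sets are all made up for illustration and are not taken from any shim6 draft. The "is the exit up" flag stands in for whatever local mechanism (link state, router adverts, an IGP) reports exit health.

```python
# Toy model only: every name here is hypothetical, not shim6 API.

def pick_source(exits):
    """Host1 side: return the prefix of the first exit that is locally up.

    `exits` is an ordered list of (prefix, is_up) pairs. How `is_up` is
    determined is deliberately left outside this sketch.
    """
    for prefix, is_up in exits:
        if is_up:
            return prefix
    return None


def learn_locator(known_locators, current, packet_source):
    """Host2 side: adopt the peer locator a packet just arrived from.

    Only locators already known for the peer are accepted; anything
    else leaves the current choice unchanged.
    """
    if packet_source in known_locators and packet_source != current:
        return packet_source
    return current


# Local failure on the exit that belongs to PrefA.
exits = [("PrefA", False), ("PrefB", True)]
src = pick_source(exits)
peer = learn_locator({"PrefA", "PrefB"}, "PrefA", src)
print(src, peer)
```

Note that nothing in `pick_source` needs to know about shim6 at all, which is the point of the first bullet.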
How common is this failure mode?
You want to specify that shim6 be able to work around /any/ kind of
routing failure, anywhere on any part of the internet affecting any
path between Host1 and Host2.
My gut feelings though are:
- Failures typically are near the edges
- Failures are typically bi-directional for a given path
- Uni-directional failures tend to be due to /congestion/, not
  actual failures - again, typically at the edges. Congestion related
  "failures" tend to be very transient/sporadic.
- Failures in the 'middle' are uncommon, and tend to affect /huge/
  numbers of paths (ie there's a decent chance it will take out /all/
  your paths)
- The problem of uni-directional failure on two /unrelated/ paths at
  the same time is *tiny*
Hence (as a gut feeling):
- n^2 probing in shim6 is simply introducing huge expense in order to
  solve a very uncommon problem
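As a back-of-envelope illustration of that expense, the sketch below counts probes for a full mesh of (source, destination) locator pairs against probing destinations only from whatever source local routing has picked. The locator names and the count of three per host are placeholders I made up; only the growth rate matters.

```python
from itertools import product

# Placeholder locator sets; names and sizes are assumptions.
host1 = ["PrefA:Host1", "PrefB:Host1", "PrefC:Host1"]
host2 = ["PrefX:Host2", "PrefY:Host2", "PrefZ:Host2"]


def full_mesh(local, remote):
    """Every (source, destination) pair: the n^2 case."""
    return list(product(local, remote))


def destinations_only(source, remote):
    """One probe per remote locator, with the source left to local routing."""
    return [(source, dst) for dst in remote]


print(len(full_mesh(host1, host2)))             # 9 probes per direction
print(len(destinations_only(host1[0], host2)))  # 3 probes per direction
```

With n locators on each side the first grows as n*n and the second as n, per direction and per peer.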
You think the tradeoff in order to achieve perfection is worth it.
I don't. I think the above is a general quality-of-internet-routing
problem. I think it's something that should and will be tackled within
the routing area, where people have been and continue to work on
optimising routing protocols (from OSPF to BGP) to cope gracefully
with failures and restarts in order to eliminate some common scenarios
where routing-loops can occur in today's routing protocols.
I don't see a compelling reason to consider problems in internet
routing to be something shim6 needs to introduce great complexity for
in order to work around, when a simple approach (let underlying OS
routing pick the local prefix) will likely allow 99% of failures to be
detectable and worked around.
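For what it's worth, an application can already ask the OS which source it would pick for a given destination, without sending anything. A minimal sketch in Python, assuming a host with IPv6 enabled; `os_chosen_source` is a name I made up, and 2001:db8::1 is just a documentation address standing in for Host2:

```python
import socket


def os_chosen_source(dest, port=9):
    """Return the IPv6 source address the OS would use towards `dest`.

    Connecting a UDP socket only performs the route and source address
    lookup; no packet is sent. Returns None if there is no usable route
    or IPv6 is unavailable.
    """
    try:
        with socket.socket(socket.AF_INET6, socket.SOCK_DGRAM) as s:
            s.connect((dest, port))
            return s.getsockname()[0]
    except OSError:
        return None


print(os_chosen_source("2001:db8::1"))
```

If the local routing table changes because an exit has failed, a fresh lookup reflects the new choice, which is all the "let the OS pick the local prefix" approach relies on.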
There are many other possible mechanisms.
A host could have the following default route:

  default via ISP1-gateway
          via ISP2-gateway