Re: failure detection
On 17/08/2005, at 21:38, Paul Jakma wrote:
IP is a best-effort protocol.
Reliability, etc. is a concern of the upper-layers (eg TCP).
Please explain why shim (which will, one hopes, look fairly much like
existing IP layers) needs to reinvent functionality traditionally not
provided by IP?
The SHIM WG is about providing multihoming support for IPv6. In particular,
a solution for IPv6 multihoming must be able to preserve communications
through outages in the communicating path. Such functionality is
provided in IPv4 through BGP features, but it requires the injection of
site routes into interdomain routing. In IPv6, PA (provider-aggregatable)
addressing is used, so we need additional mechanisms (the shim) to provide
equivalent functionality, in particular to preserve established
communications through outages.
"Ah, but shim can make use of the fact that multiple locators could be
published for an endpoint!"
Is the likely answer, explain:
1. Why this is a compelling argument given that it's been possible to
publish multiple addresses in DNS for a long long time, yet there has
been 0 demand for either applications to implement n^2 path-probing of
each local address to every remote address, or for OSes to implement
some kind of 'path-probe' shim to provide such functionality for all
applications?
I am afraid you are missing our goal here. This is not a matter of
opportunity but of how we can preserve established communications
through outages. See above.
2. How this path-probing will interact with routing policy?
The local administrator may have different cost local links. In order
to express policy he may do something like set the default route to go
via the low-cost link. (the default gets changed by some mechanism
unknown to us, routing policy, script, whatever)
Along comes shim6, sending packets with every possible source it can
find on the machine, as a consequence sending packets out of expensive
links (eg dial-on-demand links, or the $LOTS/Mbyte link..)
With the shim, paths are closely related to the addresses used. In
particular, the exit paths of the multihomed site are determined by the
source addresses used. So in order to provide this type of feature,
source address selection has to be influenced, for instance using the
RFC 3484 policy table.
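As a rough illustration of how a policy table can steer source address selection, here is a minimal Python sketch. The default table rows are the ones given in RFC 3484; the two extra rows, the 2001:db8 documentation prefixes and the helper names (lookup, prefer_sources) are assumptions made up for this example. Only the "prefer matching label" rule is modelled, not the full RFC 3484 algorithm.

```python
import ipaddress

# Default policy table from RFC 3484: (prefix, precedence, label).
DEFAULT_POLICY_TABLE = [
    ("::1/128", 50, 0),
    ("::/0", 40, 1),
    ("2002::/16", 30, 2),
    ("::/96", 20, 3),
    ("::ffff:0:0/96", 10, 4),
]


def lookup(table, address):
    """Return (precedence, label) of the longest prefix matching address."""
    addr = ipaddress.IPv6Address(address)
    best = None
    for prefix, precedence, label in table:
        net = ipaddress.IPv6Network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, precedence, label)
    if best is None:
        raise LookupError("no policy table entry matches %s" % address)
    return best[1], best[2]


def prefer_sources(table, sources, destination):
    """Order candidate sources so that those whose label matches the
    destination's label come first (the 'prefer matching label' rule only)."""
    _, dst_label = lookup(table, destination)
    return sorted(sources, key=lambda s: lookup(table, s)[1] != dst_label)


# Hypothetical local policy: traffic to 2001:db8:ffff::/48 should leave
# through the ISP that delegated 2001:db8:a::/48, so both get label 10.
site_table = DEFAULT_POLICY_TABLE + [
    ("2001:db8:a::/48", 40, 10),
    ("2001:db8:ffff::/48", 40, 10),
]
ordered = prefer_sources(
    site_table,
    ["2001:db8:b::1", "2001:db8:a::1"],
    "2001:db8:ffff::5",
)
```

With this table the source from 2001:db8:a::/48 is ordered first, which in a PA-addressed site also selects the exit ISP.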
Ie: The best way to honour local policy is to use INADDR_ANY and let
the OS decide the source address by consulting local routing policy -
alternatively, an administratively specified address. Why exactly is
shim6 so different from everything else on the internet and special
that this would not work for it?
This is exactly how the shim would support policy; see above.
3. The traditional way on the internet to guard against path failures
is to get a routing feed (and no, that does *not* imply you advertise
anything), why is shim6 so special that it can't defer to existing
practice?
Scalability. Traditional IPv4 routing-based multihoming lacks it.
And reread 2 again :).
4. How will you decide which path is best?
Some apps may prefer high-bandwidth/slightly lossy links over
low-bandwidth/no-loss links. Other apps completely the opposite. Once
you start picking paths, how do you know what kind of path the
application would prefer?
local policy can be expressed to some degree with the policy table
defined in RFC 3484. If more fine grained expression is needed (e.g.
per app) additional parameters need to be included in the policy table
If you simply guess, how will your guess be any better than a very
simple mechanism, eg using INADDR_ANY as source and just picking the
first locator that replies?
(And again, the traditional way for administrators to set policy on
what source address is the best to use is via routing policy. See 2
yet again).
5. Given n^2 path-probing does not scale, and could be /very/
expensive in some situations (and generally introducing complexity),
do you have statistics on the general reliability of path failures in
the internet to justify this expense and complexity?
we seem to be assuming that multihoming support is something useful and
that it will be needed in IPv6. This multihoming support seems to
require communications to be preserved through outages.
Are there any statistics as to how many path failures are due to
/local/ link failures? (which does not require n^2 path-probing to
detect).
6. If path-probing really is desired, explain why this is shim6
specific?
Path exploration is a fundamental part of the shim protocol. Maybe it is
not shim specific, and ideas from other similar protocols can be used,
but it is imho a key part of the shim protocol and needs to be part of
it.
Why could this not be done as part of a separate programme or
protocol?
Some possibilities:
- software that does local path-probing to determine reachability of
locally attached gateways (eg IPMP in Solaris for one)
this seems to be local, while shim is defined e2e
- BFD and some other protocols in development
B means bidirectional, and we are not assuming bidirectional paths here
- software that monitors some well-known paths and adjusts local
routing to suit based on administratively defined metrics
- software that monitors the systems route-cache and does
path-probing for destinations that currently see flows
Not sure what you mean by those, but in any case I am sure we can
benefit from these designs, as well as from the others you mentioned, to
design the shim path exploration. If you are familiar with those, I am
sure that your knowledge would be very useful in helping with the design
of the path exploration protocol of the shim.
I could go on and on :).
Note that it's the probing using every single local address which
bothers the most. Simple heartbeats and monitoring which set of
locators are reachable and just picking one and sticking to it till
you needed to switch, I could agree with.
AFAICT this is the approach being considered here, or at least one of
them. I mean, imho, we would only need to perform path exploration
after an outage.
I mean, I see probing as a last resort to confirm that an outage
has occurred (and then as a tool to explore alternative paths before
diverting the actual data packets).
Why exactly must this be considered as a part of shim6? This does not
seem to be a shim6 specific thing at all, for a start off.
I fail to understand what you are missing. Failure detection and Path
exploration are key components of the shim, and they are needed to
preserve established communications through outages.
As per above: picking the right /remote/ locator *is* a shim6 job - i
agree on that. It's the n^2 probing I want to ensure is /not/
considered for inclusion in shim6 RFCs, other than as something
mentioned as a possible implementation detail.
Ok, i think i see now.
Your problem is with probing with different source locators, right?
Well, this is needed because the source address determines the exit
path from the multihomed site. I mean, because we are assuming PA
addressing, changing the source address results in using a different
ISP in the multihomed site. That is why different source addresses need
to be explored.
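To make the size of the search space concrete, here is a small Python sketch that enumerates the (source, destination) locator pairs left to explore after the pair in use has failed. The function name, the addresses and the plain product ordering are assumptions for illustration only; nothing in this thread fixes an exploration order.

```python
from itertools import product


def candidate_pairs(local_locators, remote_locators, failed_pair=None):
    """List every (source, destination) locator pair except the failed one.

    With PA addressing the source locator selects the exit ISP, so every
    pair is potentially a different path.
    """
    return [
        pair
        for pair in product(local_locators, remote_locators)
        if pair != failed_pair
    ]


# Hypothetical site with two PA prefixes talking to a peer with two locators.
local = ["2001:db8:a::1", "2001:db8:b::1"]
remote = ["2001:db8:c::9", "2001:db8:d::9"]
pairs = candidate_pairs(local, remote, failed_pair=(local[0], remote[0]))
```

Two local and two remote locators give four pairs, three of which remain once the failed one is excluded; the count grows as the product of the two locator sets, which is the n^2 cost being debated.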
It's the complexity of what's being proposed which I find wrong.
So, while on many occasions it may not be needed and can be
skipped, I see probing as a fundamental part of the shim.
If you mean probing every combination of local and remote addresses
for reachability, I really don't see how you could come to that
conclusion.
There are many, many years of existing deployment of IP-using systems and
applications that simply don't consider such complex probing worth it.
right, because they are not assuming the usage of multiple PA addresses
in a single host
When you include multiple PA addresses in hosts within a multihomed
site, then you find out that you need to try with different source
addresses.
may agree with this, but imho it needs to be taken into account when
discussing the present topics
Sure.
Well, I'd love to see discussion of the signalling formats for shim6
btw, rather than less immediately important talk of "how could we
modify OS network stacks?" and "we could detect path failures and work
around them in ways nothing ever before has considered worth doing" :)
- and more importantly, I'd hate to see base shim6 specifications
cluttered up with this kind of stuff (which likely won't be
implemented, or won't be implemented soon in case of network stack
signalling additions).
I know several of you (marcelo, iljitsch, at least) have been thinking
about how to solve v6 multihoming for a /long/ time. I *know* you know
how to do it.
The problem is, now that the end /is/ actually in sight (an actual
IETF WG chartered to work on a /specific/ solution!), you've moved on
to considering problems /past/ shim6.
So here's your endpoint locator algorithm:
  for every potential locator address for a ULID:
      send a control message to probe the locator, including
      sufficient information for the other side to set up the shim
      on their side
      wait for the minimum of PROBE_TIMEOUT seconds or until
      you get a reply
      If you got a reply, it should have enough information to set up
      the shim; set it up and finish.
Sort of... you are still lacking DoS protection and locator security, but
this is roughly what is being considered (with a couple of additional
messages).
Otherwise signal failure to the ULP (eg the system's equivalent of
POSIX ENETUNREACH)
That's it, very simple and implementable.
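The loop described above could be sketched in Python roughly as follows. This is only an illustration: send_probe is a hypothetical callable standing in for the control-message exchange, the PROBE_TIMEOUT value is a placeholder, and none of the DoS protection or locator security mentioned in the reply is modelled.

```python
import errno

PROBE_TIMEOUT = 2.0  # seconds; placeholder value, not taken from any draft


def establish_shim(ulid, locators, send_probe, timeout=PROBE_TIMEOUT):
    """Probe each locator of a ULID in turn and stop at the first reply.

    send_probe(locator, ulid, timeout) is a hypothetical callable that sends
    the control message and returns the peer's reply, or None on timeout.
    """
    for locator in locators:
        reply = send_probe(locator, ulid, timeout)
        if reply is not None:
            # The reply is assumed to carry enough state to set up the shim.
            return locator, reply
    # No locator answered: signal failure to the upper-layer protocol.
    raise OSError(errno.ENETUNREACH, "no locator for %s replied" % ulid)


# Stand-in probe function: only the second locator answers.
def fake_probe(locator, ulid, timeout):
    return {"peer": ulid} if locator == "2001:db8:b::1" else None


chosen, reply = establish_shim(
    "2001:db8:a::1", ["2001:db8:a::1", "2001:db8:b::1"], fake_probe
)
```

Raising OSError with ENETUNREACH mirrors the suggestion of reporting failure to the ULP using the system's usual error code.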
Additionally, if you define shim6 to include a regular heartbeat, you
can monitor reachability. Include the locator's idea of its addresses
too, and two cookie fields (one for each side).
Well, yes, but we are also considering quite a few optimizations for this,
like ULP feedback and traffic monitoring.
But yes, this is along the lines of the failure detection mechanism being
considered.
You can then detect:
- which locator addresses work; if one doesn't, mark it as unreachable.
  - if it's the current locator, just pick the next one which
    is not known to be unreachable (and so on).
Path exploration is more complex than that, because you need to change
the source address to change the exit ISP. Remember that we are assuming
PA addresses, and they are only routed through one of the ISPs of the
multihomed site.
- changes of locator on the /remote/ side
- further, you can detect changes:
  - in advance (eg the locator can remove an address
    in advance of it ceasing to accept packets on that
    address, eg because of maintenance)
  - faster, eg the remote side may be monitoring its
    local status, if it detects a change it can just
    send a heartbeat immediately with the updated
    locator addresses to use
etc..
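The bookkeeping in the list above might look something like the following Python sketch. The class and method names are invented for illustration, the cookie fields are left out, and, as noted in the reply, real path exploration would also have to vary the source address, which this sketch does not attempt.

```python
class LocatorState:
    """Track which of a peer's locators are usable and which one is in use."""

    def __init__(self, locators):
        self.locators = list(locators)
        self.unreachable = set()
        self.current = self._next_reachable()

    def _next_reachable(self):
        # Pick the first locator not known to be unreachable, if any.
        for locator in self.locators:
            if locator not in self.unreachable:
                return locator
        return None

    def mark_unreachable(self, locator):
        """Record a failed locator; switch away from it if it was in use."""
        self.unreachable.add(locator)
        if locator == self.current:
            self.current = self._next_reachable()

    def on_heartbeat(self, advertised):
        """Apply the locator set the peer advertised in a heartbeat."""
        self.locators = list(advertised)
        self.unreachable &= set(advertised)
        if self.current not in self.locators or self.current in self.unreachable:
            self.current = self._next_reachable()


state = LocatorState(["2001:db8:a::9", "2001:db8:b::9", "2001:db8:c::9"])
state.mark_unreachable("2001:db8:a::9")               # falls over to b::9
state.on_heartbeat(["2001:db8:c::9", "2001:db8:d::9"])  # peer withdrew b::9
```

After the heartbeat the peer no longer lists 2001:db8:b::9, so the state moves to the first locator it still advertises, which is the "in advance" case in the list.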
<damn, i feel a draft coming on - are there better tools than opening
a text editor?>
That's the kind of talk i want to see, about the actual nuts and bolts
of what is needed for shim6 to work - less "pie in the sky" stuff. :)
I think that most of this stuff is included in the current drafts; it is
just that additional complexity is considered, for instance security,
unidirectional path support, and considerations about the constraints
imposed by the usage of multiple PA addresses in the multihomed site
and by ingress filtering.
regards, marcelo
regards,
--
Paul Jakma paul@clubi.ie paul@jakma.org Key ID: 64A2FF6A
Fortune:
Our POP server was kidnapped by a weasel.