Re: failure detection
Hi Paul,
My understanding is that you seem to be uncomfortable with the fact
that n^2 probes may be needed when we try with different source and
destination locators, right?
Ok, i am uncomfortable too with this, but i fail to see any other
option for dealing with this.
Let me explain why:
Suppose you have a multihomed site with ISPA and ISPB and that they
have assigned PrefA and PrefB respectively.
Suppose you have Host1 in the multihomed site and that it is
communicating with Host2 outside the multihomed site. For that
communication, Host1 is using address PrefA:host1 both as locator and
as ULID. I assume that host2 has a single address, host2.
Now, suppose that an outage in the link between the multihomed site and
ISPA occurs.
Host1 detects it and needs to do something about it. How can it try
an alternative path? Well, it needs to retry using an alternative
source address, so that packets can be routed through ISPB (in the
outgoing direction this is due to ingress filtering compatibility and
in the incoming direction because of the usage of PA addressing).
This implies that when a host within a multihomed site needs to try
alternative paths, it needs to use different source addresses, and of
course different destination addresses in a more general scenario.
This implies that in order to explore all the possible paths, we need
to make n^2 probes.
Now, it is important to realize that n^2 is just an upper bound, and
that n^2 probes will only be performed when all paths have failed
except one and this is the last one you have tried with (which may
occur very often according to Murphy's law :-)
One of the main concerns of the people designing these mechanisms is how
to achieve clever mechanisms to reduce as much as possible the number
of probes. The idea is not to send the n^2 probes at once, but to
perform some form of exploration phases in which different combinations
are tried.
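
To make the "exploration phases" idea concrete, here is a minimal sketch of one possible ordering, not a mechanism that has been agreed on this list. It assumes three phases (vary only the source, then only the destination, then all remaining combinations); the names exploration_order, explore and probe are hypothetical. The pair that has just failed is skipped, so at most n*m - 1 further probes are sent for n local and m remote locators.

```python
from itertools import product

def exploration_order(local, remote, current):
    """Yield untried (src, dst) locator pairs in phases.

    Phase 1 keeps the destination and varies the source,
    phase 2 keeps the source and varies the destination,
    phase 3 covers every remaining combination.
    The phase order is an assumption made for illustration.
    """
    cur_src, cur_dst = current
    seen = {current}  # the pair in use has already failed
    phases = [
        [(s, cur_dst) for s in local],
        [(cur_src, d) for d in remote],
        list(product(local, remote)),
    ]
    for phase in phases:
        for pair in phase:
            if pair not in seen:
                seen.add(pair)
                yield pair

def explore(local, remote, current, probe):
    """Probe pairs one at a time; return (working_pair, probes_sent)."""
    sent = 0
    for pair in exploration_order(local, remote, current):
        sent += 1
        if probe(pair):  # probe() is a hypothetical reachability test
            return pair, sent
    return None, sent
```

With the example above (locators PrefA:host1 and PrefB:host1, a single remote address, outage on the ISPA link) the first probe already succeeds. The full set of combinations is only walked when the one working pair happens to be the last one tried.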
Ok, i will move on to some comments you have below...
El 18/08/2005, a las 12:14, Paul Jakma escribió:
1. Why this is a compelling argument given that it's been possible
to publish multiple addresses in DNS for a long long time, yet there
has been 0 demand for either applications to implement n^2
path-probing of each local address to every remote address, or for
OSes to implement some kind of 'path-probe' shim to provide such
functionality for all applications?
I am afraid you are missing our goal here. This is not a matter of
opportunity but the way we can preserve established communication
through outages. See above.
But you can preserve comms without n^2 probing.
sure, n^2 is just the worst case
With the shim, paths are closely related to addresses used, in
particular exit paths of the multihomed site are related to source
addresses used.
Agreed.
So in order to provide this type of features, source address
selection has to be influenced, for instance using RFC 3484 policy
table
So in order to determine the source, you're saying the table which
affects source selection has to be influenced by shim6.
no, i am saying that source address selection is influenced by RFC3484
policy table and that this table is the right place to express policy.
In addition, that SHIM can honor this table as much as possible, so
that policy can be expressed when using the shim
Surely that's like saying "shim6 will use whatever source pleases it"
(ie simply ignoring SAS for the final shim6 output packet).
no, shim can try to use first the addresses as expressed in the policy
table
Ie: The best way to honour local policy is to use INADDR_ANY and let
the OS decide the source address by consulting local routing policy
- alternatively, an administratively specified address. Why exactly
is shim6 so different from everything else on the internet and
special that this would not work for it?
this is exactly how the shim would support policy, see above
No, you said shim6 likely will have to influence SAS. That's quite
different from not worrying about SAS at all and letting the OS decide
according to its local policy.
No, shim will try to honor the policy table (or any other tool to
express policy we need to define). Obviously, if the path preferred by
the policy is not available, then shim will have to use others, of
course.
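
As a rough illustration of "honor the policy first, fall back when the preferred path is down", here is a small sketch. It is an assumption-laden stand-in: the table below is a simplified list of (prefix, preference) entries, not the RFC 3484 policy table format or its selection rules, and policy_rank and ordered_sources are hypothetical names.

```python
import ipaddress

def policy_rank(addr, table):
    """Return the preference of the longest matching prefix (0 if none).

    `table` is a simplified list of (prefix, preference) tuples,
    where a higher preference means a more preferred source.
    """
    ip = ipaddress.ip_address(addr)
    best = None  # (prefix length, preference) of the best match so far
    for prefix, pref in table:
        net = ipaddress.ip_network(prefix)
        if ip in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, pref)
    return best[1] if best else 0

def ordered_sources(locators, table, unreachable=frozenset()):
    """Order candidate source locators by policy, skipping known-dead ones."""
    usable = [a for a in locators if a not in unreachable]
    return sorted(usable, key=lambda a: policy_rank(a, table), reverse=True)
```

The policy-preferred locator is always tried first; another one is used only once the preferred one has been marked unreachable by failure detection.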
3. The traditional way on the internet to guard against path
failures is to get a routing feed (and no, that does *not* imply you
advertise anything), why is shim6 so special that it can't defer to
existing practice?
scalability. traditional IPv4 routing based multihoming lacks it
Note again "does *not* imply you advertise anything". Everyone is
(sort of) agreed "multihome by advertising your prefix" doesn't scale
and should not be considered for IPv6 - hence shim6.
I'm saying you can still get a "read-only" routing feed (BGP or
whatever), purely for informational purposes, to help decide which of
your ISPs has the best path.
That's entirely scaleable and well within reason for deployment at
'enterprise' shim6 sites.
agree, but this is not enough to preserve established communications
Are you considering the case that full BGP feed is injected to the
hosts, so that the hosts can find out which path is available towards
its final destination?
4. How will you decide which path is best?
local policy can be expressed to some degree with the policy table
defined in RFC 3484. If more fine grained expression is needed (e.g.
per app) additional parameters need to be included in the policy
table
Sorry, my question was more about the metrics you will use to
determine whether path (local A, remote B) is better than (local C,
remote D). Eg, RTT, packet loss, etc.
this is an ongoing discussion on this list right now
we seem to be assuming that multihoming support is something useful
and that it will be needed in IPv6.
I'd agree with that assumption. :)
This multihoming support seems to require communications to be
preserved through outages.
Agreed, but I still don't understand why this requires shim6 to
specify overriding existing SAS policy and do n^2 probing.
i addressed this point above
6. If path-probing really is desired, explain why this is shim6
specific?
path exploration is a fundamental part of the shim protocol. maybe
it is not shim specific and ideas from other similar protocols can be
used, but it is imho a key part of the shim protocol and needs to be
part of it.
If it is not specific to shim6, why should it be solved in shim6? (In
the 'for every possible path of the combinations of local and remote
addresses')
when i said that it is not specific i meant that other protocols may
require similar procedures, but it is a key component of the shim
protocol, so that is why it needs to be performed.
...
Note that it's the probing using every single local address which
bothers the most. Simple heartbeats and monitoring which set of
locators are reachable and just picking one and sticking to it till
you needed to switch, I could agree with.
AFAICT this is the approach being considered here or at least one of
them.
One of them yes, the n^2 probing is one of the options being
considered - I hope to quash its further development :) - at least in
terms of something that is mandated officially as part of shim6 specs
(can still be mentioned in some kind of implementors note).
I mean, imho, we would only need to perform path exploration after an
outage
You can do it more cleverly than that.
can you expand on that?
I fail to understand what you are missing. Failure detection and Path
exploration are key components of the shim, and they are needed to
preserve established communications through outages.
Sure, I agree. But why not do it in the simplest way possible?
and that would be...
on that. It's the n^2 probing I want to ensure is /not/ considered
for inclusion in shim6 RFCs, other than as something mentioned as a
possible implementation detail.
Ok, i think i see now. Your problem is with probing with different
source locators, right?
Yup.
Well, this is needed because the source address determines the exit
path from the multihomed site. I mean, because we are assuming PA
addressing, changing the source address results in using a different
ISP in the multihomed site. That is why different source addresses need
to be explored.
Explored and then dismissed as a bad idea, I strongly hope.
The source to use for the shim should be determined by the
/destination/ address in the shim6 packet you receive (eg) first from
the other side.
Ie, given two shim6 stacks, A and B, that want to talk to each other
via the global internet (eg to exchange info needed to create shim
mapping between them, A initiates), each with 3 addresses say (A1, A2,
so on). The communication required is:
A retrieves the required locator information, and gets a list "B1, B2,
B3". It then sends a packet to each remote address, n packets at a
time, in series, with whatever shim6 control message is required:
A(IN6_ADDRANY) -> B1
A(IN6_ADDRANY) -> B2
A(IN6_ADDRANY) -> B3
some time later a reply is received:
A(x) <- B3
The source address A should prefer to use (if it must prefer one over
IN6_ADDRANY) is 'x', as determined by which destination address was in
the packet of B's that got through.
See, that's simple and works - no probing required.
first of all, this is probing since you sent 3 packets. Second, which
source addresses has A used for sending these packets?
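
The exchange above can be sketched as follows, purely to make the open question visible. Everything here is hypothetical: os_pick_source stands in for sending from the unspecified address and letting the OS choose, path_works stands in for the network, and the peer is assumed to reply to the probe's source address.

```python
def probe_all_remotes(remote_locators, os_pick_source, path_works):
    """Send one probe per remote locator with an OS-chosen source.

    Returns (x, dst) for the first probe whose reply comes back, where x
    is the local address the reply was addressed to, or None if no
    reply arrives at all.
    """
    for dst in remote_locators:
        src = os_pick_source(dst)   # the OS decides; the shim does not
        if path_works(src, dst):    # probe and reply both get through
            return src, dst         # the reply's destination is src
    return None
```

This sends n packets rather than n^2, but the outcome depends entirely on what os_pick_source returns: if it keeps returning an address whose exit link is down, no reply ever arrives.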
If "n^2 probing is simply not an option" is in your mind, then you'll
start realising there are other, simpler, better ways of achieving the
same end-goal - which don't require messing with local SAS policy
either, but rather use it as intended (without modification).
There's many many years of existing deployment of IP using systems
and applications that simply don't consider such complex probing
worth it.
right, because they are not assuming the usage of multiple PA
addresses in a single host
That's a fair point, and likely part of the answer.
Another possibility is simply that path failures in the 'middle' of
internet are rare and hence users and apps have not had any compelling
need to explore this possibility.
As i pointed out in the example above, different source/destination
address combinations are required to deal with failures at the edges
(i.e. ISP - multihomed site links)
Regards, marcelo