
RE: Preserving established communications (was RE: about draft-nordmark-multi6-noid-00)



Erik, Iljitsch,


About the transport survivability:

> Perhaps the transport survivability statement in the goals
> document is a bit off the mark.
>

Well, i don't know. It is the closest to a consensus that we could get on
those issues.
We could reconsider it, i guess. But imho we should agree on what we are
looking for.

But what i really think is that we should be aware of the capabilities and
limitations of the solutions. We should not believe that we are making a big
effort to provide established communications survivability when what we
actually are providing is a limited solution that will only preserve some
communications but not the general case.

IMO, we should try to satisfy this requirement, or at least provide some
tools that allow this functionality to be achieved, even if additional
mechanisms are required to do it.

About the broken routing system:

> Building a solution which assumes that routing basically works
> i.e. only handling the fact that a multihomed site will have multiple
> locators and need to be able to "rehome" communication between
> those locators seems to make sense to me.
> Assuming that routing is broken

Actually i don't think that the routing is broken. IMHO it is a bit more
complex than that.
I think that the constraints imposed on a large routing system are
incompatible with the requirement of preserving established communications.
I mean, a global routing system must be stable. This means that short-lived
changes in the topology cannot be propagated to the whole network. The
system has to filter out high-frequency changes in order to obtain
stability. OTOH, applications are sensitive to these short outages;
essentially, what is not significant enough for the routing system to
propagate is significant for the application.
This doesn't mean that the routing system is broken. It just means that the
transients the routing system goes through to reach a stable view are
filtered, and that during those transients the routing system's view of the
topology will not be accurate.
This means that the routing system is useful for providing the view of the
network in steady state, and that high-frequency changes will not be
reflected.
IMHO this is ok.
The problem is that you cannot ask the routing system to react fast enough
when changes occur (actually you don't want that).
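
To illustrate the filtering argument, here is a minimal Python sketch of a
hold-down filter. It is only a toy model and not how BGP or any real routing
protocol damps changes; the function name, the event format and the
30-second hold time are assumptions made up for the example. A state change
is advertised only if it persists for at least the hold time, so a 2-second
outage that the application clearly sees never shows up in the routing view.

```python
def advertised_states(events, hold_time):
    """Toy hold-down filter (hypothetical, not a real routing protocol).

    events: list of (timestamp_seconds, state) in time order.
    Returns the list of (timestamp, state) changes that would be advertised,
    i.e. only changes that lasted at least hold_time seconds.
    """
    advertised = []
    for i, (t, state) in enumerate(events):
        # A state lasts until the next event; the last one lasts forever.
        next_t = events[i + 1][0] if i + 1 < len(events) else float("inf")
        stable = next_t - t >= hold_time
        # Advertise only stable states that differ from the last advertised one.
        if stable and (not advertised or advertised[-1][1] != state):
            advertised.append((t + hold_time, state))
    return advertised


# A 2-second outage: filtered, the routing view never changes.
short_outage = advertised_states([(0, "up"), (100, "down"), (102, "up")], 30)

# A 100-second outage: advertised, but only 30 seconds after it started.
long_outage = advertised_states([(0, "up"), (100, "down"), (200, "up")], 30)
```

In this model the short outage is invisible to the routing system, and even
the long one is reported late, which is the response time limit discussed
here.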

So, in a few words, the routing system has some inherent constraints that
limit its response time.
The resulting response time may be ok for some apps but it may not be ok for
other apps.
So, one approach is to accept that the routing system response is limited,
and let the apps that require additional functionality build their own
stuff.

> and making multi6 a general purpose overlay

You can see it that way. You can also consider it as the host selecting the
path, since it is the only one that can do it properly.

> which works over broken routing doesn't seem like a useful approach -
> if routing doesn't work how can you reach the root DNS servers?

Well, DNS already has its own fault tolerance support, doesn't it?
I mean if a root server is not available because BGP is reconverging for 15
minutes, i don't wait 15 minutes, i just pick the next one on the list.
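
A minimal Python sketch of that "pick the next one on the list" behaviour.
This is not how any particular resolver is implemented; the function name
and the reachability check are hypothetical, and reachability is simulated
with a set instead of real queries and timeouts.

```python
def first_reachable(servers, is_reachable):
    """Return the first server for which is_reachable(server) is true.

    Returns None if no server on the list is reachable.
    """
    for server in servers:
        if is_reachable(server):
            return server
    return None


# Simulated situation: the first root server is unreachable while BGP
# reconverges, so the next one on the list is used instead.
roots = ["a.root-servers.net", "b.root-servers.net", "c.root-servers.net"]
unreachable = {"a.root-servers.net"}
chosen = first_reachable(roots, lambda s: s not in unreachable)
```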


About keepalives:

> But perhaps we started off the wrong foot here.
>

FWIW i don't like keepalives any more than you do.
It is just that i don't think that the routing system can provide the
capabilities required by the preserving established communications
requirement, and i can't think of any better solution.

But let's explore alternatives, we can always come back to this if we don't
find anything better.

Moving forward:

The routing system has a limited response time that may not be suitable for
some apps. agree?

We then have two approaches to improve this (which may be complementary):

- Obtain hints from the routing system that something is wrong

As you proposed:

> But if something happens in ISP A that will only affect traffic using
> the locator containing the A prefix. Thus the site doesn't need
> BGP to converge about routing for A's prefix (which is the convergence
> time issue); the site would just like to receive some indication from A
> (or detect that the link to A is down) that something is not working
> well for A.  That could be sufficient to have the site border routers
> start using the path through ISP B.
>

I like that idea, especially the churn rate idea, i think we should study it
more in depth.

The benefit of this approach is that the response time obtained using
information from the routing system is improved.

I guess that some apps will need better response time anyway, so the
complementary approach is using some ULP hints (as Iljitsch has been
proposing for several years now).

This means that if the ULP detects that something is wrong, it would notify
the shim layer, and the shim layer would act accordingly by changing
locators.
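
As a rough illustration of the ULP hint idea, here is a minimal Python
sketch. Nothing in it comes from a concrete proposal: the class, the method
names, the locator strings and the simple "try the next pair" policy are all
assumptions for the example.

```python
from itertools import product


class Shim:
    """Toy shim layer (hypothetical): tracks the locator pair in use."""

    def __init__(self, local_locators, remote_locators):
        # All (source, destination) locator pairs, in a fixed order.
        self.pairs = list(product(local_locators, remote_locators))
        self.index = 0

    @property
    def current(self):
        """The (source, destination) locator pair currently in use."""
        return self.pairs[self.index]

    def ulp_hint_failure(self):
        """Called by the ULP when it suspects a problem (e.g. retransmits).

        Assumed policy: simply move on to the next locator pair, wrapping
        around at the end of the list.
        """
        self.index = (self.index + 1) % len(self.pairs)
        return self.current


# Hypothetical locators: two local ("L1", "L2") and two remote ("R1", "R2").
shim = Shim(["L1", "L2"], ["R1", "R2"])
before = shim.current
after = shim.ulp_hint_failure()
```

A real shim would need some way to decide which pair to try next and when to
go back; the round-robin here is only the simplest thing that works.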

Some potential problems that i can think of:

My first concern is about the interaction of these two mechanisms.
I mean, in the routing system based mechanism the knowledge of which locator
is better resides in the routing system. So, since the routing system knows
which locator is better, routers are enabled to rewrite locators, since they
can choose the best one.
In the other mechanism, the one who knows which is the best locator is the
host itself, so it forces the selection of the ISP to be used by selecting
the appropriate locators (both source and destination).

Now my concern is how these two mechanisms interact. I mean, the host
selects one locator because it has an upper layer hint that something is
going wrong, and then the routing system, which hasn't detected anything
yet, just rewrites the locator.

Let's suppose that we have two hosts A and B and both of them reside in
multi-homed sites, so A has two addresses P1:A and P2:A and B also has two
addresses P3:B and P4:B.

Now they have a TCP connection established using P1:A and P3:B.
Suddenly, A detects that it has to retransmit and sends this hint to its own
shim layer.
Now what does the shim layer in A do?
It can change the source locator and/or the destination locator.
Changing the source locator is not very useful since it will be rewritten by
the border router, and perhaps changing only the destination locator doesn't
solve the problem.
Perhaps if the "rewrite ok" bit is not set, this would mean that the packet
has to be routed through the ISP that is compatible with the source address
contained in the packet, so that the host can force the ISP selection.
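
To make the speculation a bit more concrete, here is a minimal Python sketch
of the border router behaviour i have in mind. It is only my reading of how
the "rewrite ok" bit could be used, not something taken from the draft; the
Packet type, the function name, the prefix-to-ISP table and the ISP names
are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src: str          # source locator in prefix:host form, e.g. "P1:A"
    dst: str          # destination locator, e.g. "P3:B"
    rewrite_ok: bool  # assumed meaning: routers may rewrite the source


def border_router(packet, preferred_prefix, isp_for_prefix):
    """Return (exit ISP, packet as forwarded) for a toy site border router."""
    prefix, host = packet.src.split(":", 1)
    if packet.rewrite_ok:
        # The routing system decides: rewrite the source locator to the
        # prefix the router currently prefers.
        packet = replace(packet, src=f"{preferred_prefix}:{host}")
        prefix = preferred_prefix
    # Otherwise the host's choice stands, and the packet must leave through
    # the ISP that matches its source prefix.
    return isp_for_prefix[prefix], packet


isps = {"P1": "ISP1", "P2": "ISP2"}

# Host A has moved to P2:A after a ULP hint; the router still prefers P1.
forced = border_router(Packet("P2:A", "P3:B", rewrite_ok=False), "P1", isps)
rewritten = border_router(Packet("P2:A", "P3:B", rewrite_ok=True), "P1", isps)
```

With the bit cleared the packet leaves through ISP2 as the host wanted; with
the bit set the router silently puts it back on P1:A and ISP1, which is
exactly the interaction problem described above.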

I know this is just speculation, and that we should see the details with a
more concrete proposal.

My second concern is about the implementation of ULP hints.
Right now ULPs don't do this, so this would imply modifications, not only in
all the hosts to support the new shim layer, but also in all the ULPs that
want to obtain the improved service.

Regards, marcelo