
RE: Preserving established communications (was RE: about draft-nordmark-multi6-noid-00)



> > Perhaps the transport survivability statement in the goals
> > document is a bit off the mark.
> >
> 
> Well, i don't know. It is the closest to a consensus that we could get in
> those issues.

But if I don't misremember the discussion that led to the transport
survivability goal, there was a question whether it was considered sufficient
to select a working set of addresses/locators when new communication
was established (e.g., as part of the application + TCP trying all
the addresses returned from the DNS), and the consensus was
that this was insufficient. Thus the goal is to provide rehoming
of already established communication.
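
For illustration, here is a minimal Python sketch of the "try all the
addresses returned from the DNS" behaviour at connection setup. The function
name connect_first_working and its resolve/connect parameters are made up for
this example (they default to the standard socket calls). Note that it only
helps when a new connection is opened; it does nothing for communication that
is already established, which is why it was judged insufficient.

```python
import socket

def connect_first_working(host, port, timeout=3.0,
                          resolve=socket.getaddrinfo,
                          connect=socket.create_connection):
    """Try each resolved address in order; return the first connection.

    Hypothetical helper: resolve/connect are injectable so the logic can
    be exercised without touching the network.
    """
    last_error = None
    for _family, _type, _proto, _canon, sockaddr in resolve(
            host, port, type=socket.SOCK_STREAM):
        try:
            # sockaddr[:2] is (address, port) for both IPv4 and IPv6.
            return connect(sockaddr[:2], timeout=timeout)
        except OSError as exc:
            last_error = exc  # remember the failure, move to next address
    raise last_error or OSError("no addresses returned for %s" % host)
```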

This is expressed in the document as "transport survivability" but I don't
think we discussed whether this also implied failover within any particular
time. As we've discussed, different transports have radically different
time constraints.

> We could reconsider it, i guess. But imho we should agree on what we are
> looking for.

Agreed.

> But what i really think is that we should be aware of the capabilities and
> limitations of the solutions. We should not believe that we are making a big
> effort to provide established communications survivability when what we
> actually are providing is a limited solution that will only preserve some
> communications but not the general case.

But there is no useful general case; I can concoct an application + transport
over UDP that requires failover in one microsecond. I don't think we can use
the existence of such a thing as a gauge to say that multihoming should
try to accomplish failover in one microsecond.

And how useful is it to assume that only the time before TCP gives up
(a few minutes) matters, when users in front of web browsers give up
a lot sooner (10 seconds?)?

In terms of the time to failover I think all we know is that the shorter
it can be the better, but we don't know how much we are willing to "pay" to
reach any particular time limit (for different types of failures which
occur with different probabilities).

> IMO, we should try to satisfy this requirement, or at least provide some
> tools to allow to achieve this functionality, even if additional mechanisms
> are required to do it.

But I don't even know what it means to satisfy "transport survivability" - for
my microsecond transport? For the timeouts of common TCP implementations?
(The TCP standard doesn't say when TCP should give up; the assumption by some
protocol designers was "never" - keep trying.)

> I think that the constraints imposed to a large routing system are
> incompatible with the requirements of preserving established communications.
> I mean, a global routing system must be stable. this means that changes on
> the topology of short duration cannot be propagated to all the network. The
> system has to filter the changes with high frequency in order to obtain
> stability. OTOH, applications are susceptible to this short cuts,
> essentially this means that what is not significant for the routing system
> to propagate it is significant for the application.

I don't think it is that black and white. The routing system could propagate 
bad news faster than good news; what needs to be limited is the amount
of flapping.
If bad news travels fast then multihoming can provide choices between
different paths taking the bad news into account.
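
To make the asymmetry concrete, here is a rough Python sketch. It is a toy
model, not how any deployed routing protocol works, and the class name
AsymmetricDamper and the hold_down parameter are invented for the example.
Withdrawals ("bad news") go out immediately; announcements ("good news") are
held until the prefix has stayed up for hold_down seconds, so a flapping
prefix never gets re-advertised.

```python
class AsymmetricDamper:
    """Toy model: propagate withdrawals at once, delay announcements."""

    def __init__(self, hold_down=30.0):
        self.hold_down = hold_down   # seconds a prefix must stay up (assumed)
        self.pending_up = {}         # prefix -> time the announcement arrived
        self.advertised = set()      # prefixes currently advertised onward

    def on_update(self, prefix, reachable, now):
        """Handle an incoming update; return updates to send right now."""
        if not reachable:
            # Bad news: cancel any pending announcement and withdraw at once.
            self.pending_up.pop(prefix, None)
            if prefix in self.advertised:
                self.advertised.discard(prefix)
                return [(prefix, False)]
            return []
        # Good news: start the hold-down timer unless already advertised.
        if prefix not in self.advertised:
            self.pending_up.setdefault(prefix, now)
        return []

    def tick(self, now):
        """Release announcements whose hold-down period has elapsed."""
        ready = [p for p, t in self.pending_up.items()
                 if now - t >= self.hold_down]
        for p in ready:
            del self.pending_up[p]
            self.advertised.add(p)
        return [(p, True) for p in ready]
```

In this model a prefix that goes up, down, and up again within the hold-down
period produces no outgoing updates until it has finally stayed up long
enough, while a later failure is reported without delay.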

> This doesn't mean that the routing system is broken, it just means that the
> transistorizes of the routing system to reach a stable view are filtered and
> that during that transistorizes the view of the topology by the routing
> system will not be accurate during that period.

What do you mean by "transistorizes"? Dictionaries say:
Date: circa 1952
: to equip (a device) with transistors

but that doesn't make sense in this context.

> FWIW i don't like keepalives more than you do.
> It is just that i don't think that the routing system can provide the
> capabilities imposed by the preserving established communications
> requirement, and i can't think any better solution

And I think you're just too quick to draw that conclusion which is why
I try to push back a bit.

> Moving forward:
> 
> The routing system has a limited response time that may not be suitable for
> some apps. agree?

Let me suggest that the routing system of today has evolved in an environment
where the endpoints are not agile (in switching between locators) hence there
hasn't been any incentive to extract and/or propagate information
that agile endpoints/sites can use.

> - Obtain hints from the routing system that something is wrong

The need might actually be weaker than that; being able to make some
quantitative comparison of routing (flaps, stability, etc) between
two possible ISPs/paths might be sufficient. That doesn't mean that one
of them has to make a statement that "something is wrong".

Another possibility to investigate is to make bad news travel faster than
good news through the routing system.

> My first concern is about the interaction of this two mechanisms.
> I mean, in the routing system based mechanism the knowledge of which locator
> is better resides in the routing system, So, since the routing system knows
> which locator is better, routers are enabled to rewrite locators, since they
> can choose the best one.
> In the other mechanism, the one who knows which is the bests locator is the
> host itself, so it forces the selection of the ISP to be used by selecting
> the appropriate locators (both source and destination)

Conceptually it would make sense to view it as the host/ULP observing the need
to try a different locator, while the routing system (through feedback
mechanisms like locator rewriting and perhaps others) influences the order in
which locators are tried.

> Now my concern is how these two mechanism interact, i mean the host selects
> one locator because it has an upper layer hint that something is going wrong
> and then the routing system that still haven't detected anything yet, just
> rewrites the locator.

The host would try a different destination locator and the routing system
would rewrite the source locator, hence they would be complementary I think.

> My second concern is about ULP hints implementation.
> Right now ULP protocols don't do this. so this would imply modification, not
> only in all the hosts to support the new shim layer, but also in all the ULP
> that want to obtain the improved service.

FWIW RFC 2461 recommends that ULPs provide "all is good" hints to avoid
performing neighbor unreachability detection probes. Perhaps the same hints
can be used to have the multi6 layer stay with the same locator?
Or perhaps we need both those positive hints and negative hints for multi6?
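
As a strawman for what "both positive and negative hints" could look like,
here is a small Python sketch. Everything in it is hypothetical (the
LocatorSelector class, its method names, and the failure_threshold value);
nothing like it is specified in RFC 2461 or in any multi6 draft. A positive
hint keeps the current locator, and enough consecutive negative hints move
the shim to the next one.

```python
class LocatorSelector:
    """Hypothetical shim-layer state driven by upper-layer hints."""

    def __init__(self, locators, failure_threshold=2):
        self.locators = list(locators)
        self.current = 0                    # index of the locator in use
        self.failures = 0                   # consecutive negative hints
        self.failure_threshold = failure_threshold  # assumed tuning knob

    def positive_hint(self):
        """ULP reports forward progress: stay with the current locator."""
        self.failures = 0
        return self.locators[self.current]

    def negative_hint(self):
        """ULP reports trouble: switch after enough consecutive reports."""
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.current = (self.current + 1) % len(self.locators)
            self.failures = 0
        return self.locators[self.current]
```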

  Erik