
Re: Requirements [was Re: Transport level multihoming]



Sorry to reply twice to the same message, but I have more to say :)

On Wed, Apr 04, 2001 at 12:27:40PM -0400, Daniel Senie wrote:
> > >                  We need (a).  Do we need (b)?  Or is acceptable
> > >                  to require host software updates to obtain
> > >                  reliable connections in this situation?
> > >
> > My answer is no, it would be a very serious mistake to require (b)
> > in this case. It would result in an undeployable solution.
> 
> Agree with Brian. Applications will have to deal with restarting
> connections if they die. This is fairly common already, since the
> IPv4/BGP multihoming world works this way.

Consider two end-points of some connection-oriented transport-layer
session, separated by (say) eleven ASes. There is substantial multi-
homing in most ASes in the path.

The chance of a re-homing event happening within the life of a
session grows with the number of ASes in the path: if the per-AS
probabilities are small and independent, it is roughly the sum of
the probabilities of a re-homing event occurring in each AS (more
precisely, 1 - (1-p1)(1-p2)...(1-pn)).
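
(A quick sanity check on that arithmetic, as a sketch: treating the
per-AS probabilities as independent, the chance of at least one event
over the path is 1 minus the product of the per-AS survival
probabilities. The 1%-per-AS figure below is purely hypothetical.)

    # Sketch: probability of at least one re-homing event somewhere
    # in the path during a session, assuming independent per-AS
    # events. The per-AS probability used here is made up.
    def p_rehoming(path_length, p_per_as):
        return 1 - (1 - p_per_as) ** path_length

    print(p_rehoming(11, 0.01))  # ~0.105 for an 11-AS path
    print(p_rehoming(3, 0.01))   # ~0.030 for a 3-AS path

So an 11-AS path carries roughly three and a half times the risk of
a 3-AS path, whatever the (unknown) per-AS probability actually is.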

Hence, in the scenario outlined (for example, a customer of a tier-5
network operator in Niue exchanging data with a customer of a tier-5
network operator in Russia), there is substantially more risk of a
re-homing event somewhere in the path than there is for sessions
whose endpoints are separated by a much shorter AS distance.

Eleven is a long AS distance; the average AS path length across the
internet is much shorter. Do we want an architecture that only works
for users near the core? Or is the internet for everyone?

There are lots of long-held TCP sessions in the network today. I can
think of many applications that currently rely on long-held TCP
sessions; none of them is particularly exotic, and most of them
don't run between applications that restart particularly elegantly.

By and large, none of them currently breaks when a re-homing event
occurs somewhere between the session endpoints (the exceptions are
cases where packets continue to be dropped for a period that exceeds
the TCP timeout, or where severe instability causes an intermediate
router to emit a transient host-unreachable message).
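
(To make those exceptions concrete, here's a minimal sketch, with a
hypothetical peer address, of how the two failure modes surface to
an application through an ordinary sockets API; whether an ICMP
host-unreachable actually aborts an established connection is
stack-dependent.)

    import errno
    import socket

    # Sketch: the two ways a long-held session actually dies during
    # a re-homing event. Host and port are hypothetical.
    sock = socket.create_connection(("peer.example.com", 9999))
    sock.settimeout(300)  # crude stand-in for the TCP timeout
    try:
        while True:
            data = sock.recv(4096)
            if not data:
                break  # orderly close by the peer
    except socket.timeout:
        # packets dropped for a period exceeding the timeout
        print("session lost: timed out")
    except OSError as e:
        if e.errno == errno.EHOSTUNREACH:
            # transient host-unreachable from an intermediate router,
            # on stacks that surface it as a hard error
            print("session lost: host unreachable")
        else:
            raise
    finally:
        sock.close()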

Re-homing events due to circuit breakage are probably randomly
distributed and infrequent, and their impact even on a long AS path
might be minimal in real terms. However, re-homing events due to
routing instability and congestion are, in my experience, much more
frequent and, worse, tend to cluster within quite short time
periods.

If these are to be the new semantics of transport across the internet,
many applications will require changes to support them. If application
change is inevitable, how does that affect the solutions we consider
to the basic problem?
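
(For a flavour of what that change looks like, a minimal sketch:
run_with_reconnect and its session_fn argument are hypothetical,
standing in for real protocol logic.)

    import time

    # Sketch of the reconnect logic an application would need if any
    # re-homing event can kill its session. session_fn is a
    # hypothetical stand-in for real protocol code; it must be able
    # to resume where it left off, a capability many protocols
    # (interactive ssh, for one) simply don't have.
    def run_with_reconnect(session_fn, max_tries=5):
        for attempt in range(max_tries):
            try:
                return session_fn()
            except OSError:
                # re-homing events cluster, so an immediate retry is
                # likely to fail too: back off before reconnecting
                time.sleep(min(2 ** attempt, 60))
        raise RuntimeError("gave up after repeated session failures")

Note that this only helps applications whose sessions are restartable
at all; for the rest, reconnect logic doesn't recover the lost state.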

To put it another way: our ssh and ftp sessions tend to stay up as
long as we need them today. It is dangerous to assume that this is
because re-homing events are rare; it's entirely possible that our
sessions are re-routed many times during their lifetimes, and that
they survive most of the time as a direct result of the
transport-session stability provided by current v4 CIDR
multi-homing.

We should think hard before deciding that this is not a requirement.


Joe