
Re: Preserving established communications (was RE: about draft-nordmark-multi6-noid-00)



> > The routing system seems to be good enough for this purpose when sites are
> > not multihomed.
> 
> Well, i am not sure about this.
> Current studies of bgp convergence time are talking about up to 15 minutes
> of BGP convergence when a route is withdrawn. Established communications
> clearly don't survive this.

OK, but fixing the BGP convergence time isn't in scope for this WG.

My logic is to try to find where multihoming makes things different.
The fact that BGP convergence time in the current Internet is worse than
desired is something that should be addressed independently of multihoming,
right?

If not, shouldn't multihoming fix other pressing problems like world hunger?

> Now, this is a requirement to the multi-homing solution. Moreover, it is a
> very difficult requirement. I mean, the goal of all this loc/id separation
> is the preservation of established communications, since most of the
> remaining requirements don't need this and could be addressed with
> alternative (more conventional) mechanisms.

I would state the goal differently: taking advantage of the multiple 
attachments to the Internet to make the availability of the communication
better than with a single Internet attachment.

Whether the result would be good enough to
 - make the ftp connection stay alive
 - make the user wait at the web browser or abandon and try another site 
  (order 10 seconds according to some studies I think)
 - make VoIP survive without a click in the phone
is a different aspect of the goal which we haven't talked much about.

> >  - the ISP's peerings with other ISPs fail and the ISP has no alternate
> >    path; maybe you assume that the ISP knows how to handle this.
> 
> How would the isp know how to handle this situation? i mean, the ISP has no
> alternate path, so the only path is through the alternative isp of the
> multi-homed site.
> This is the relevant case, imho, since it is the case not covered by rfc3178.
> So i would say that anyone implementing a solution other than rfc3178 would
> want to be tolerant to this type of failure.

Actually, in my mind but not in my writing, there are two subcases.
1) the ISP loses one peering and there is no alternate path to reach certain
   destinations (due to the policy for the other peerings the ISP has).
   If there are alternatives then the ISP can still reach all destinations,
   hence there is no observable failure.
2) the ISP loses peering(s) so that it can no longer reach any destination.

The second case can be handled without passing all routes to the site.
How common is the first case?
It depends on how the ISPs handle their own redundancy.

I don't do operations, so I'd be interested in folks with operational
experience commenting on the common and likely failures that a site should
worry about.
What I've heard of are failures due to links being cut between sites and their
ISP, backbone links being back-hoed (but I don't understand what actual impact
they had on each ISP's network), and ISPs going bankrupt.

> Well, an option would be to have a cache of hosts that the node is
> communicating with, so that it can ping them periodically until the cache
> expires. The cache lifetime is extended every time a packet from/to that
> destination flows.

Yes, but what is the lifetime of the cache entry after the last packet?
How can you pick a value without knowing the actual behavior of the (UDP)
application?
Finally, any simulation data on how many periodic pings would be
added to the Internet when everybody implements such a scheme?
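
To make the two questions above concrete, here is a minimal sketch of the
proposed cache, assuming an idle lifetime that is simply a tunable parameter.
The names PeerCache, packet_seen, peers_to_probe and probes_per_second are
hypothetical and do not come from any draft, and the rate function is plain
arithmetic on made-up inputs, not simulation data.

```python
import time


class PeerCache:
    """Hypothetical cache of peers this node recently exchanged packets with."""

    def __init__(self, idle_lifetime, clock=time.monotonic):
        # idle_lifetime: seconds an entry survives after the last packet.
        # This is exactly the value that is hard to pick without knowing
        # the behavior of the (UDP) application.
        self.idle_lifetime = idle_lifetime
        self._clock = clock
        self._expires = {}  # peer address -> expiry time

    def packet_seen(self, peer):
        # Any packet from/to the peer extends the entry's lifetime.
        self._expires[peer] = self._clock() + self.idle_lifetime

    def peers_to_probe(self):
        # Drop expired entries; the remaining peers would still be pinged.
        now = self._clock()
        self._expires = {p: t for p, t in self._expires.items() if t > now}
        return sorted(self._expires)


def probes_per_second(hosts, peers_per_host, probe_interval):
    # Aggregate probe rate if every host pings every cached peer once per
    # probe_interval seconds. All three inputs are assumptions.
    return hosts * peers_per_host / probe_interval
```

A short idle_lifetime stops probing a peer that the application may still
care about, while a long one keeps pinging peers nobody is talking to, and
the aggregate rate grows linearly with both the number of hosts and the
number of cached peers per host.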

> Well, some additional considerations about this.
> First, many existing ULPs already exchange this type of message, TCP acks for
> instance, so hellos can be piggybacked onto these messages.

The TCP ssh connection I've had open for the last 4 days has not exchanged
any packets as far as I know. (Well, perhaps there are the keepalive packets
every 2 hours or so, but they wouldn't be sufficient to ensure that
once I type something there is indeed a rapid response.)

And we are not considering a TCP-only solution, are we?


My take is that we should make the multihoming solution improve the
availability of sites with multiple Internet attachments without requiring or
assuming e2e periodic pings to quickly detect failures.
Some applications/upper layer protocols might want to use such mechanisms
in addition to the rehoming support in the multihoming solution for quicker
failure detection (for instance, SCTP already has such a mechanism: heartbeats).
Thus I'm advocating not assuming that every ULP connection/session/association
has heartbeats when solving rehoming, since we know of neither the performance
implications of this on a large scale, nor the benefits that applications
in general will derive from it.

  Erik