
survivability, rewriting



About the session survivability: I think Kurtis is right that we must not let ourselves get caught up in unrealistic expectations. I believe a useful lower limit would be five seconds. This gives just over two round trips when both parties are using GSM/GPRS and/or satellite, a few more for more reasonable link technologies. And two missed acks is the absolute minimum we must require before even considering a rehoming event, as a single missed ack can very easily happen for many reasons that don't warrant rehoming. So if an application can't handle a 5 second gap in the communication, it shouldn't count on general multihoming mechanisms to provide failover.
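
To make the arithmetic concrete, here is a minimal sketch of that rule. The 5 second floor and the two-missed-acks minimum come from the paragraph above; the function names and the example round trip times (2.4 s for a slow link, 0.25 s for a faster one) are illustrative assumptions, not measurements.

```python
MIN_GAP_SECONDS = 5.0  # proposed lower limit on the survivable gap
MIN_MISSED_ACKS = 2    # a single missed ack is never enough


def round_trips_in_gap(rtt_seconds, gap_seconds=MIN_GAP_SECONDS):
    """Number of complete round trips that fit into the gap."""
    return int(gap_seconds // rtt_seconds)


def may_consider_rehoming(missed_acks, silence_seconds):
    """True only when both minimum conditions for rehoming are met."""
    return (missed_acks >= MIN_MISSED_ACKS
            and silence_seconds >= MIN_GAP_SECONDS)


if __name__ == "__main__":
    # Assumed RTTs, chosen only to illustrate the proportions.
    print(round_trips_in_gap(2.4))        # slow link
    print(round_trips_in_gap(0.25))       # faster link
    print(may_consider_rehoming(1, 6.0))  # one missed ack: keep waiting
```

Requiring both conditions is one reading of the proposal; an implementation could just as well treat the 5 seconds purely as a floor on what applications may expect.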

I don't think this is too unreasonable. Many people have mentioned VoIP, often in the same sentence with extremely unrealistic failover expectations. I believe 5 seconds is workable for VoIP: this is certainly close to the time a user will continue to shout "hello, are you still there?" when using a cell phone with less than perfect reception.

I maintain that having the transport layer provide hints to the multihoming layer about when a rehoming would be desired is the right approach. Yes, this will give us some trouble in the beginning, as existing upper layer protocols don't provide these hints yet. So we implement additional heuristics so the multihoming layer can rehome on its own. But this is never going to be as efficient as having the transport layer do it, as transport protocols have very good knowledge about what's happening end-to-end. TCP, for instance, goes to great lengths to determine when to retransmit and whether only a single packet was lost or more. Streaming protocols, on the other hand, have a pretty good idea of when new data should be arriving, so here the receiver is in a good position to send nacks or hints.
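
As a rough illustration of the streaming case, the sketch below shows a receiver that knows its expected packet interval and raises a hint once data is overdue. The class name, the `send_hint` callback and the tolerance factor of 2 are all hypothetical; no existing protocol or API is implied.

```python
class StreamReceiver:
    """Receiver that hints the multihoming layer when data is overdue."""

    def __init__(self, expected_interval, send_hint, start=0.0, tolerance=2):
        self.expected_interval = expected_interval  # seconds between packets
        self.send_hint = send_hint                  # hypothetical callback
        self.tolerance = tolerance                  # assumed slack factor
        self.last_arrival = start
        self.hint_sent = False

    def on_data(self, now):
        """Record an arrival and re-arm the hint."""
        self.last_arrival = now
        self.hint_sent = False

    def poll(self, now):
        """Return True if data is overdue; hint once per silent period."""
        silence = now - self.last_arrival
        overdue = silence > self.tolerance * self.expected_interval
        if overdue and not self.hint_sent:
            self.send_hint(silence)
            self.hint_sent = True
        return overdue


if __name__ == "__main__":
    hints = []
    receiver = StreamReceiver(1.0, hints.append)
    receiver.on_data(1.0)
    receiver.poll(2.5)  # within tolerance, no hint
    receiver.poll(4.0)  # overdue, one hint
    print(hints)
```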

One thing we haven't discussed so far: if upper layers provide us with a hint, what exactly does the multihoming layer do after receiving such a hint? It would make sense to perform some kind of check to see what kind of reachability exists, but this means we need some kind of ping-like functionality. Is it reasonable to depend on such a mechanism in this age of ICMP paranoia?
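
One possible answer, sketched under heavy assumptions: on a hint, first re-check the address pair in use, and only move to another pair if that check fails. The `probe` argument stands in for whatever ping-like mechanism turns out to be acceptable; it is a hypothetical callable, and nothing here says it has to be ICMP.

```python
def rehome_on_hint(current, candidates, probe):
    """Pick an address pair after a hint from the upper layer.

    current    -- (source, destination) pair now in use
    candidates -- all known (source, destination) pairs, in preference order
    probe      -- hypothetical callable: probe(pair) -> True if reachable
    """
    if probe(current):
        return current        # hint was a false alarm, stay put
    for pair in candidates:
        if pair != current and probe(pair):
            return pair       # first alternative that answers
    return None               # nothing reachable right now


if __name__ == "__main__":
    pairs = [("X(A)", "Y(C)"), ("X(A)", "Y(D)"), ("X(B)", "Y(C)")]
    # Pretend everything sourced from X(A) is dead.
    alive = lambda pair: pair[0] != "X(A)"
    print(rehome_on_hint(pairs[0], pairs, alive))
```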

About the rewriting: why again are we making life difficult for ourselves? The obvious place to put an indication that the address may be rewritten is... in the address. Is there any reason why we can't have one or more special prefixes that indicate that a router should fill in the source address?
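
A minimal sketch of what a router could do with such a prefix, assuming the rewrite replaces only the prefix bits and keeps the rest of the address. The prefix 2001:db8:ffff::/48 is taken from the documentation range purely as a placeholder; no such special prefix has been assigned, and the function names are made up for illustration.

```python
import ipaddress

# Hypothetical "please fill in my source address" prefix (placeholder only).
REWRITABLE_PREFIXES = [ipaddress.ip_network("2001:db8:ffff::/48")]


def is_rewritable(src):
    """True if the source address falls in a rewritable prefix."""
    addr = ipaddress.ip_address(src)
    return any(addr in net for net in REWRITABLE_PREFIXES)


def fill_in_source(src, isp_prefix):
    """Swap in the ISP prefix, keeping the bits below the prefix length.

    Addresses outside the rewritable prefixes are returned unchanged.
    """
    addr = ipaddress.ip_address(src)
    if not is_rewritable(src):
        return str(addr)
    net = ipaddress.ip_network(isp_prefix)
    low_bits = int(addr) & int(net.hostmask)
    return str(ipaddress.ip_address(int(net.network_address) | low_bits))


if __name__ == "__main__":
    print(fill_in_source("2001:db8:ffff:1::42", "2001:db8:a::/48"))
    print(fill_in_source("2001:db8:b::1", "2001:db8:a::/48"))
```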

However, this doesn't solve what we should do when rewriting isn't permitted. Obviously we could come up with a multihoming mechanism where rewriting is always allowed, trading off complexity in this area against complexity in recognizing a correspondent and accepting some types of spoofed packets. But "legacy" IPv6 also doesn't permit rewriting, so we must be prepared to handle this. The obvious solution is source address based routing, but it doesn't seem like everyone is convinced.

I don't really see any workable alternatives, though. Even if we can do ICMP or NAROS magic to make sure that new sessions use the right source address so that they initially pass ingress filtering, it's always possible that halfway through, the session is rerouted over another ISP and the source address is filtered. This means we can't reroute traffic based on changes in BGP the way we're used to, the ultimate consequence of which is that we must hardcode all routing decisions. In practice this probably means using one ISP as a primary and the other one only as a backup.

Another problem is that if we depend on BGP to determine which ISP provides the shortest path to a destination, this effectively blocks us from using the other ISP to reach the same destination. For instance, if X has ISPs A and B, and Y has ISPs C and D, then it's entirely possible that X will use ISP A to reach both Y(C) and Y(D), so that when something bad happens with ISP A both of Y's addresses become unreachable. With source address based routing this isn't an issue as X can reach each of Y(C) and Y(D) over both A and B by simply using a different source address.
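
The X/Y example can be sketched as a lookup keyed on the source prefix. All prefixes below are placeholders from the documentation range, and the table and function names are assumptions for illustration only.

```python
import ipaddress
from itertools import product

# Hypothetical: X holds one prefix from each of its ISPs, A and B.
SOURCE_ROUTES = {
    ipaddress.ip_network("2001:db8:a::/48"): "A",
    ipaddress.ip_network("2001:db8:b::/48"): "B",
}


def egress_isp(src):
    """Choose the outgoing ISP from the source address alone."""
    addr = ipaddress.ip_address(src)
    for prefix, isp in SOURCE_ROUTES.items():
        if addr in prefix:
            return isp
    return None


def usable_pairs(sources, destinations, failed_isps=()):
    """All (source, destination) pairs whose egress ISP is still up."""
    pairs = []
    for src, dst in product(sources, destinations):
        isp = egress_isp(src)
        if isp is not None and isp not in failed_isps:
            pairs.append((src, dst))
    return pairs


if __name__ == "__main__":
    x_sources = ["2001:db8:a::1", "2001:db8:b::1"]  # X(A), X(B)
    y_dests = ["2001:db8:c::1", "2001:db8:d::1"]    # Y(C), Y(D)
    print(len(usable_pairs(x_sources, y_dests)))
    print(usable_pairs(x_sources, y_dests, failed_isps={"A"}))
```

With ISP A down, both of Y's addresses are still reachable from the X(B) source, which is the property destination-based BGP selection can't give us. The sketch only models X's choice of egress; a failure on Y's side would need the same treatment at the other end.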

I do agree that source address based routing (which is in effect a limited form of source routing) doesn't mesh well with our hop-by-hop forwarding paradigm, but again: I don't see any alternatives.