
Re: about draft-nordmark-multi6-noid-00



On 26 Oct 2003, at 12:12, marcelo bagnulo wrote:

Suppose that mh1 initiates a communication with mh2, so mh1 selects PA:mh1
as initial source address i.e. initial locator and AID1, and also selects
PC:mh2 as destination address and AID2.
The initial packet must be sent with the "rewrite ok" bit not set.
Suppose now that the internal routing of the site mh1 is such that packets
sent to PC:: are forwarded through ISPB. This means that the packet
generated by mh1 described above will be discarded by ingress filtering in
ISPB because its source address is incompatible and cannot be rewritten.

There are solutions possible that allow rewriting all packets. However, IPv6 as we know it today does not. So we need to solve this problem anyway. I think source address dependent routing makes the most sense here.


the internal routing system
knows that it has to forward the packet through ISPA, since there is no
route available through ISPB. However, the actual packet has PB:mh1 as
source address, making the packet incompatible with ingress filtering. The
packet then will be dropped.

When an ISP fails, it is of course extremely important to have traffic rerouted over the other one. Today, we do this using BGP. We can still do that, but it has the disadvantage that packets for any given destination flow over only one ISP. If that ISP is incapable of delivering the packets, there is trouble (although this should be rare if we do BGP rather than just a default). Less important, but still annoying: we have to depend on BGP's limited best path selection capabilities.


If we do source address dependent routing on the other hand, a host can utilize both ISP links for the same destination at the same time. Since the destination address in the packet is then no longer relevant, we don't need to run BGP either.
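
A minimal sketch of what source address dependent routing could look like at the exit decision, assuming two hypothetical provider prefixes (taken from the documentation range) for PA and PB. The names `EGRESS_BY_SOURCE_PREFIX` and `select_egress` are made up for illustration; the point is only that the destination plays no part in the choice.

```python
import ipaddress

# Hypothetical prefixes standing in for PA (from ISPA) and PB (from ISPB).
EGRESS_BY_SOURCE_PREFIX = {
    ipaddress.ip_network("2001:db8:a::/48"): "ISPA",
    ipaddress.ip_network("2001:db8:b::/48"): "ISPB",
}

def select_egress(source, destination):
    """Pick the exit ISP from the source address alone.

    The destination is accepted but deliberately ignored, so the
    packet always leaves through the ISP whose ingress filter
    accepts its source address.
    """
    src = ipaddress.ip_address(source)
    for prefix, isp in EGRESS_BY_SOURCE_PREFIX.items():
        if src in prefix:
            return isp
    return None  # source is in neither provider prefix

# Same destination, different source addresses, different exits.
print(select_egress("2001:db8:a::1", "2001:db8:c::2"))
print(select_egress("2001:db8:b::1", "2001:db8:c::2"))
```

With this, the host chooses the ISP simply by choosing its source address.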

Of course now the multihoming system in the host must detect unreachability and initiate a rehoming event when there is a problem in the path over one ISP.

The internal routing system reconverges and packets are no
longer routed through ISPA but they are routed through ISPB. Since the
"rewrite ok" would be set at this moment of the communication, then packets
that are originated by mh1 with PA:mh1 as source address are rewritten by
the exit router so that PB:mh1 is included as source address. Now mh2 learns that it has to use PB:mh1 as destination address because this is the address contained in the packets that it is receiving. Is that ok?

The source address used by the other side is a good hint when there is no real preference among the other side's destination addresses. However, each side must be able to jump destination addresses, as failures can occur somewhere along the way where neither side knows that something is wrong, so there is no change of source address.


Also, there is the case of asymmetric reachability, where there are two links that only work in one direction so packets from A to B must use different addresses than packets from B to A. This can happen with radio links. A weaker case, where there is still symmetric connectivity but the asymmetric connectivity has higher bandwidth, is common with one-way satellite links.
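
To make the "hint, but be able to jump" idea concrete, here is a rough sketch under my own assumptions. The class `PeerLocators` and its methods are hypothetical and not part of the draft. The peer's source address is followed as a hint, but once a locator has failed for us it is marked suspect and the hint no longer pulls us back to it, which covers the asymmetric case.

```python
class PeerLocators:
    """Tracks which of the peer's locators we send to (illustrative only)."""

    def __init__(self, locators):
        self.locators = list(locators)
        self.current = self.locators[0]
        self.suspect = set()  # locators that failed in our sending direction

    def on_packet_from(self, source):
        # The peer's source address is only a hint: follow it unless
        # we have already seen that locator fail for our own traffic.
        if source in self.locators and source not in self.suspect:
            self.current = source

    def on_path_failure(self):
        # Jump to another locator without waiting for the peer to
        # change its source address.
        self.suspect.add(self.current)
        candidates = [l for l in self.locators if l not in self.suspect]
        if not candidates:
            # Everything has failed once: forget history and retry.
            self.suspect.clear()
            candidates = [l for l in self.locators if l != self.current]
            candidates = candidates or self.locators
        self.current = candidates[0]
        return self.current

peer = PeerLocators(["PA:mh2", "PC:mh2"])
peer.on_packet_from("PC:mh2")   # hint followed
peer.on_path_failure()          # PC:mh2 stops working for us, jump away
peer.on_packet_from("PC:mh2")   # peer still sends from PC, we stay put
print(peer.current)
```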

Now my question is how long does this mechanism take to react? i.e. the
response time. In order to preserve established communications, the
interruption shouldn't be too long.

We probably want to monitor transport layer events to see whether rehoming is needed. This could be pretty fast in most cases.
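
One way this monitoring could work, as a sketch only: count consecutive retransmission timeouts reported by the transport layer and ask for rehoming once a threshold is reached. The class `RehomingMonitor` and the threshold of 3 are assumptions for illustration, not values from any spec.

```python
class RehomingMonitor:
    """Decides when transport-layer trouble should trigger rehoming."""

    def __init__(self, threshold=3):
        self.threshold = threshold  # hypothetical tuning knob
        self.timeouts = 0           # consecutive retransmission timeouts

    def on_ack_progress(self):
        # New data was acknowledged, so the current path works.
        self.timeouts = 0

    def on_retransmit_timeout(self):
        # Returns True when the caller should switch locators.
        self.timeouts += 1
        if self.timeouts >= self.threshold:
            self.timeouts = 0  # start counting afresh on the new path
            return True
        return False

monitor = RehomingMonitor()
events = [monitor.on_retransmit_timeout() for _ in range(3)]
print(events)
```

Since retransmission timers usually start out in the range of seconds, a trigger like this can react well before the routing system has reconverged.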


The problem here is that intradomain routing is pretty slow detecting outages.

Yes, the worst case is pretty pathetic. Fortunately, the worst case isn't the most common one.


Suppose that ISPA is running BGP
with its upstream provider, then if i am correct, the value recommended in
the BGP spec to detect an outage between peers is 90 seconds.

Correct, but Cisco in their infinite wisdom doubled this to 180 seconds. But you can set this much lower. I used to recommend 15 seconds but some BGP implementations are too braindead to work well with that. Not many problems with 30 seconds, though.


Note that in many cases, the sessions will go down much faster because BGP monitors the interface the session runs over.

This means that the interruption in the communication will be longer than 90 seconds. IMHO this is not good enough to preserve established communications. Am i missing something?

Isn't the TCP timeout 240 seconds? Sessions should survive, not sure if the user is this patient, though.


My point is that the usage of the routing system to determine which locator
to use and to preserve established communication is not a good option
because:

I agree with your points here.


It is then my conclusion that path outage detection and locator selection
has to be performed by the end-hosts themselves and not by the routing
system.

Indeed.


Iljitsch