
Re: TE & SHIM6 (was Re: comments on draft-ietf-shim6-proto-03)



[This message will go into both source address rewriting and traffic engineering.]
On 27-feb-2006, at 20:48, Erik Nordmark wrote:

There are two problems with allowing routers to rewrite source addresses:
1. The routers must know which packets are "legacy" and can't have their source address changed vs which packets are controlled by shim6 or another mechanism that can handle rewritten source addresses.
2. In current shim6, only previously negotiated source addresses may be used, which means the shim6-enabled hosts in a site and the rewriting routers must coordinate their efforts so correspondent hosts don't see unexpected source addresses.
FWIW draft-nordmark-shim6-esd-00.txt is on the way to the I-D directory, and it has some ideas for how to address this.
Right. Lots of good points in there, but unfortunately, I disagree  
with the mechanisms proposed. I really hope I'll never have to run  
DHCPv6 to configure my hosts; it's a big, fat, inelegant protocol.
The first issue is readily solvable by simply having shim6 hosts put a magic value in the upper 64 bits of the source address that indicates "rewriting permitted".
Or next hdr = IPPROTO_SHIM6.
I don't find this very suitable. If we're going to send many shimmed  
packets, it's more important than ever that we omit the shim header  
whenever possible. Apart from that, using the source address to  
signal that the source address may be changed is much cleaner. It  
also has the advantage that we can now borrow some bits to make the  
process easier. What we can do is have shim6 capable hosts emit a  
"source rewriting information request". That would be a packet  
addressed to a shim6 correspondent that has the magic prefix in the  
source address that triggers source address rewriting, and an  
additional bit combination that tells the router to send back a list  
of prefixes it will use to rewrite. The host can then make sure that  
the correspondent knows to expect packets with these source addresses.
If this is an ordered list, the host can then use bits in the data  
packets with the rewrite prefix in the source address to tell the  
router which addresses it may insert. (Not sure what would happen  
though if the router wants to rewrite into Y but the host only allows  
X and Z.)
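To make the rewriting idea concrete, here is a minimal sketch of what a rewriting router could do. Everything specific in it is an assumption for illustration only: the 48-bit `MAGIC` value, the 16-bit "allowed prefixes" mask that follows it in the upper 64 bits, and the function names are hypothetical and appear in no draft. For the open question above (router wants Y, host only allows X and Z), the sketch simply falls back to the first prefix the host does allow, which is one possible answer and not a settled one.

```python
import ipaddress
from typing import List, Optional

# Hypothetical layout of the upper 64 bits of a "rewriting permitted" source:
# a 48-bit magic value followed by a 16-bit mask in which bit i set means
# "the router may insert prefix i from its ordered list".
MAGIC = 0x20010DB8FFFF  # assumption: placeholder, not an assigned value
IID_MASK = (1 << 64) - 1


def make_rewritable(iid: int, allowed_mask: int) -> ipaddress.IPv6Address:
    """Build the source address a shim6 host would put in outgoing packets."""
    upper = (MAGIC << 16) | (allowed_mask & 0xFFFF)
    return ipaddress.IPv6Address((upper << 64) | (iid & IID_MASK))


def rewrite_source(
    src: ipaddress.IPv6Address,
    router_prefixes: List[ipaddress.IPv6Network],
    preferred: int,
) -> Optional[ipaddress.IPv6Address]:
    """Return the rewritten source address.

    Legacy packets (no magic value) are returned unchanged. None means the
    host allows none of the router's prefixes.
    """
    value = int(src)
    upper = value >> 64
    if upper >> 16 != MAGIC:
        return src  # legacy packet: the source address must not change
    mask = upper & 0xFFFF
    iid = value & IID_MASK
    # Try the router's preferred prefix first, then the rest in list order.
    order = [preferred] + [i for i in range(len(router_prefixes)) if i != preferred]
    for i in order:
        if (mask >> i) & 1:
            prefix = int(router_prefixes[i].network_address)
            return ipaddress.IPv6Address(prefix | iid)
    return None
```

With the ordered list X, Y, Z at positions 0, 1, 2, a host that sets the mask to `0b101` allows X and Z; a router preferring Y would then rewrite into X under this fallback rule.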
I've been thinking about something similar for traffic engineering  
ever since my message yesterday where I mentioned A6 records. The  
problem is that it's far from inconceivable that at some point, a  
disconnect forms between the info in the DNS and the actual state of  
the network. The way I see it, we have four ways to convey TE related  
info:
1. out of band end-to-end: this would be stuff in the DNS
2. out of band hop-by-hop: BGP is like this
3. in-band end-to-end: measured timing and packet loss information
4. in-band hop-by-hop: feedback from routers

The problem is that 2. needs aggregation to scale. 3. and 4. need to have contact with the correspondent already, so they're useless in some cases, like the case where we want one or more backup addresses that are only tried if the primary addresses don't work. The only way to convey this is with 1. We can either reuse SRV records for individual services for this, which has the advantage that it's already available today, but the disadvantage that this mechanism isn't really used and needs to be supported on an application-by-application basis. Alternatively, we can do some magic in the resolver library to make this happen.
But this doesn't really make it possible to react to traffic  
engineering events in anything close to real time, if at all (DNS may  
not be accessible by people who need to do TE.) The thing is, BGP  
isn't all that great for this either: with current multihoming, you  
can't engineer traffic such that link 1 gets the first 10 megabits,  
then everything between 10 and 15 goes to link 2 and if there's more  
than 15 Mbit it's balanced over the two links in a 2:1 ratio.  
(Believe me, this doesn't stop people from asking.)
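As a worked example of the policy people ask for, the sketch below computes how a given total load would be divided. The function name is hypothetical. It reads "balanced in a 2:1 ratio" as applying to the traffic above 15 Mbit; since the split at exactly 15 Mbit is already 10:5, this gives the same result as a 2:1 split of the total.

```python
def split_traffic(total_mbit: float) -> tuple:
    """Return (link1, link2) load in Mbit/s for the tiered policy above."""
    link1 = min(total_mbit, 10.0)                   # first 10 Mbit go to link 1
    link2 = min(max(total_mbit - 10.0, 0.0), 5.0)   # 10..15 Mbit go to link 2
    excess = max(total_mbit - 15.0, 0.0)            # the rest is balanced 2:1
    link1 += excess * 2 / 3
    link2 += excess / 3
    return link1, link2
```

For instance, 12 Mbit of traffic puts 10 on link 1 and 2 on link 2, while 30 Mbit puts 20 on link 1 and 10 on link 2.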
But an in-band hop-by-hop TE mechanism would allow exactly this. The  
way it would work is that routers are configured to provide feedback  
for packets with a shim header, if necessary. This feedback would be  
in the form of entries that go into the address selection policy  
table. The site egress router would probably want to inform hosts  
about which source addresses go well with certain destination  
prefixes. All routers between the source and destination (including  
the site exit router) would signal back "this prefix (which would be  
the prefix that the destination for the packets falls into)  
preference value XXX". To avoid trouble, the preference value  
shouldn't be allowed to completely override locally configured info.
Ignoring non-shim6 traffic for a moment, this would allow any router  
in the path to push back traffic when the conditions warrant it. A  
router could be configured to start lowering the preference value  
when traffic hits a certain threshold and shim6 traffic would  
automatically be rerouted if possible. Obviously there's still the  
potential for conflicting preferences.
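A rough sketch of the host side of this could look as follows. It is not taken from any specification: the `PolicyTable` class, the `MAX_ADJUST` bound and the clamping rule are assumptions chosen to illustrate "feedback shouldn't completely override locally configured info". Router feedback of the form "this prefix, preference value XXX" is only allowed to move the effective preference a limited distance away from the local value.

```python
import ipaddress

MAX_ADJUST = 10  # assumption: how far feedback may move a local preference


class PolicyTable:
    def __init__(self, local):
        # local: dict of prefix string -> locally configured preference
        self.local = {ipaddress.ip_network(p): v for p, v in local.items()}
        self.feedback = {}

    def apply_feedback(self, prefix, preference):
        """Record router feedback for a destination prefix."""
        self.feedback[ipaddress.ip_network(prefix)] = preference

    @staticmethod
    def _lookup(table, addr):
        # Longest-prefix match; None when no entry covers the address.
        matches = [net for net in table if addr in net]
        if not matches:
            return None
        return table[max(matches, key=lambda net: net.prefixlen)]

    def effective(self, addr):
        """Local preference, nudged by feedback but never by more than MAX_ADJUST."""
        addr = ipaddress.ip_address(addr)
        local = self._lookup(self.local, addr)
        if local is None:
            local = 0
        fb = self._lookup(self.feedback, addr)
        if fb is None:
            return local
        return max(local - MAX_ADJUST, min(local + MAX_ADJUST, fb))

    def choose(self, candidates):
        """Pick the destination address with the highest effective preference."""
        return max(candidates, key=self.effective)
```

A router that crosses its traffic threshold would send a lower preference for the affected prefix; hosts with an alternative destination of similar local preference then switch, while hosts whose local policy strongly favours the original prefix stay put.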
A less fortunate side effect could be that a lot of regular traffic  
would be shimmed when the initially chosen destination address isn't  
optimal, which is only discovered when shim state is created after  
the session reaches a certain number of packets exchanged. So this  
would work even better if shim packets are exchanged before the  
session starts, like you describe in your draft. Interestingly,  
suppressing the shim header makes shimming less problematic but with  
the shim header suppressed the traffic engineering doesn't work  
halfway through a long-lived exchange. This can be fixed by  
periodically sending a packet with a shim header, though. Shimming  
for TE reasons could also be problematic when one side garbage  
collects shim state too aggressively.
If we go down this road it may be useful to have one or more bits in  
the shim context tag to communicate with routers, so we'd probably  
want to make the context tag a bit smaller than 47 bits.
Last but not least: it's probably useful to use SRV records (if we're  
going to use those anyway) to tell hosts that:
- they shouldn't initiate shim6 (because the other end wants to  
control when this happens or shim6 isn't supported)
- they should defer shim6 negotiation as per local policy
- they should do shim6 negotiation before starting any sessions

The last option would probably be desirable for sites that want to optimize for TE or want to balance incoming sessions over different hosts.