
Re: TE & SHIM6 (was Re: comments on draft-ietf-shim6-proto-03



[This message will go into both source address rewriting and traffic engineering.]

On 27-feb-2006, at 20:48, Erik Nordmark wrote:

There are two problems with allowing routers to rewrite source addresses: 1. The routers must know which packets are "legacy" and can't have their source address changed vs which packets are controlled by shim6 or another mechanism that can handle rewritten source addresses. 2. In current shim6, only previously negotiated source addresses may be used, which means the shim6-enabled hosts in a site and the rewriting routers must coordinate their efforts so correspondent hosts don't see unexpected source addresses.

FWIW draft-nordmark-shim6-esd-00.txt is on the way to the I-D directory, and it has some ideas for how to address this.

Right. Lots of good points in there, but unfortunately, I disagree with the mechanisms proposed. I really hope I'll never have to run DHCPv6 to configure my hosts; it's a big, fat, inelegant protocol.

The first issue is readily solvable by simply having shim6 hosts put a magic value in the upper 64 bits of the source address that indicates "rewriting permitted".

Or next hdr = IPPROTO_SHIM6.

I don't find this very suitable. If we're going to send many shimmed packets, it's more important than ever that we omit the shim header whenever possible. Apart from that, using the source address to signal that the source address may be changed is much cleaner. It also has the advantage that we can now borrow some bits to make the process easier. What we can do is have shim6 capable hosts emit a "source rewriting information request". That would be a packet addressed to a shim6 correspondent that has the magic prefix in the source address that triggers source address rewriting, and an additional bit combination that tells the router to send back a list of prefixes it will use to rewrite. The host can then make sure that the correspondent knows to expect packets with these source addresses.
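To make the router side of this concrete, here is a rough sketch of the check-and-rewrite step. The magic value is made up for illustration (nothing in shim6 assigns one; I picked something inside the 2001:db8::/32 documentation range), and the function names are hypothetical.

```python
import ipaddress

# Hypothetical magic value for the upper 64 bits of the source address.
# No such value has been assigned; this one sits in 2001:db8::/32.
REWRITE_OK_UPPER64 = 0x20010DB8FFFF0000
LOWER64_MASK = (1 << 64) - 1


def rewriting_permitted(src):
    """True if the upper 64 bits of src carry the magic value."""
    return (int(src) >> 64) == REWRITE_OK_UPPER64


def rewrite_source(src, site_prefix):
    """Replace the upper 64 bits of src with site_prefix if the host
    signalled that rewriting is permitted; otherwise return src unchanged
    (a "legacy" packet)."""
    if site_prefix.prefixlen != 64:
        raise ValueError("this sketch only handles /64 rewrite prefixes")
    if not rewriting_permitted(src):
        return src
    upper = int(site_prefix.network_address) >> 64
    return ipaddress.IPv6Address((upper << 64) | (int(src) & LOWER64_MASK))


shim_src = ipaddress.IPv6Address("2001:db8:ffff::1234")
legacy_src = ipaddress.IPv6Address("2001:db8:1::1")
prefix = ipaddress.IPv6Network("2001:db8:a::/64")
print(rewrite_source(shim_src, prefix))    # interface identifier is kept
print(rewrite_source(legacy_src, prefix))  # untouched
```

The point of the sketch is only that the router needs no per-flow state to tell the two kinds of packets apart: the decision is a single comparison on the source address.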

If this is an ordered list, the host can then use bits in the data packets with the rewrite prefix in the source address to tell the router which addresses it may insert. (Not sure what would happen though if the router wants to rewrite into Y but the host only allows X and Z.)
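As a sketch of how the bits could work: treat them as a bitmask over the ordered prefix list, where bit i set means "you may rewrite into entry i". The fallback behaviour below (the router walks its own preference order and takes the first entry the host allows, or gives up) is just one possible answer to the X/Y/Z question, not something I'm claiming is settled. All names are hypothetical.

```python
def pick_rewrite_prefix(prefixes, allowed_mask, router_order):
    """Choose a rewrite prefix.

    prefixes     -- ordered list the router announced to the host
    allowed_mask -- bit i set means prefixes[i] may be inserted
    router_order -- indices into prefixes, in the router's own preference order

    Returns the first prefix the router likes that the host also allows,
    or None if there is no overlap (the packet would then have to be
    forwarded unrewritten or dropped; which one is an open question).
    """
    for i in router_order:
        if allowed_mask & (1 << i):
            return prefixes[i]
    return None


prefixes = ["X", "Y", "Z"]
# Host allows X and Z (bits 0 and 2); router prefers Y, then Z, then X.
print(pick_rewrite_prefix(prefixes, 0b101, [1, 2, 0]))
# Router insists on Y only: no acceptable choice.
print(pick_rewrite_prefix(prefixes, 0b101, [1]))
```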

I've been thinking about something similar for traffic engineering ever since my message yesterday where I mentioned A6 records. The problem is that it's far from inconceivable that at some point, a disconnect forms between the info in the DNS and the actual state of the network. The way I see it, we have four ways to convey TE related info:

1. out of band end-to-end: this would be stuff in the DNS
2. out of band hop-by-hop: BGP is like this
3. in-band end-to-end: measured timing and packet loss information
4. in-band hop-by-hop: feedback from routers

The problem is that 2. needs aggregation to scale. 3. and 4. need to have contact with the correspondent already, so they're useless in some cases, like the case where we want one or more backup addresses that are only tried if the primary addresses don't work. The only way to convey this is with 1. We can either reuse SRV records for individual services for this, which has the advantage that it's already available today, but the disadvantage that this mechanism isn't really used and needs to be supported on an application-by-application basis. Alternatively, we can do some magic in the resolver library to make this happen.

But this doesn't really make it possible to react to traffic engineering events in anything close to real time, if at all (DNS may not be accessible by people who need to do TE.) The thing is, BGP isn't all that great for this either: with current multihoming, you can't engineer traffic such that link 1 gets the first 10 megabits, then everything between 10 and 15 goes to link 2 and if there's more than 15 Mbit it's balanced over the two links in a 2:1 ratio. (Believe me, this doesn't stop people from asking.)
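For clarity, this is the arithmetic people are asking for, written out. I'm assuming that only the traffic above 15 Mbit is split 2:1 and that link 1 gets the two parts; the request could also be read as rebalancing everything once the total passes 15.

```python
def split_traffic(total_mbit):
    """Return (link1, link2) load in Mbit/s for the policy:
    first 10 Mbit on link 1, the next 5 Mbit on link 2,
    anything above 15 Mbit split 2:1 (link 1 : link 2).
    The 2:1 direction and "excess only" reading are assumptions."""
    link1 = min(total_mbit, 10.0)
    link2 = min(max(total_mbit - 10.0, 0.0), 5.0)
    excess = max(total_mbit - 15.0, 0.0)
    link1 += excess * 2 / 3
    link2 += excess / 3
    return link1, link2


for total in (8, 12, 21):
    print(total, split_traffic(total))
```

With 21 Mbit offered, the 6 Mbit of excess is divided 4:2, giving 14 Mbit on link 1 and 7 Mbit on link 2.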

But an in-band hop-by-hop TE mechanism would allow exactly this. The way it would work is that routers are configured to provide feedback for packets with a shim header, if necessary. This feedback would be in the form of entries that go into the address selection policy table. The site egress router would probably want to inform hosts about which source addresses go well with certain destination prefixes. All routers between the source and destination (including the site exit router) would signal back "this prefix (which would be the prefix that the destination for the packets falls into) preference value XXX". To avoid trouble, the preference value shouldn't be allowed to completely override locally configured info.

Ignoring non-shim6 traffic for a moment, this would allow any router in the path to push back traffic when the conditions warrant it. A router could be configured to start lowering the preference value when traffic hits a certain threshold and shim6 traffic would automatically be rerouted if possible. Obviously there's still the potential for conflicting preferences.
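A rough sketch of both halves, under assumptions of my own: the router lowers the preference linearly between a configured threshold and link capacity, and the host takes the lowest value signalled along the path but never lets it move more than a fixed amount away from the locally configured preference. The 0..100 scale, the linear ramp and the clamp width are all invented for illustration.

```python
def router_preference(load_mbit, threshold_mbit, capacity_mbit, max_pref=100):
    """Preference a router would signal for a destination prefix.
    Full preference below the threshold, zero at or above capacity,
    linear in between (the ramp shape is an assumption)."""
    if load_mbit <= threshold_mbit:
        return max_pref
    if load_mbit >= capacity_mbit:
        return 0
    remaining = (capacity_mbit - load_mbit) / (capacity_mbit - threshold_mbit)
    return round(max_pref * remaining)


def effective_preference(local_pref, signalled_prefs, max_swing=20):
    """Combine router feedback with locally configured policy.
    The most constrained hop wins, but the result stays within
    max_swing of local_pref so feedback can't fully override it."""
    if not signalled_prefs:
        return local_pref
    wanted = min(signalled_prefs)
    return max(local_pref - max_swing, min(local_pref + max_swing, wanted))


print(router_preference(15, threshold_mbit=10, capacity_mbit=20))
print(effective_preference(50, [100, 10]))
```

Conflicting preferences from different routers still aren't resolved by this; taking the minimum is just the simplest rule to write down.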

A less fortunate side effect could be that a lot of regular traffic would be shimmed when the initially chosen destination address isn't optimal, which is only discovered when shim state is created after the session reaches a certain number of packets exchanged. So this would work even better if shim packets are exchanged before the session starts, like you describe in your draft. Interestingly, suppressing the shim header makes shimming less problematic, but with the shim header suppressed the traffic engineering doesn't work halfway through a long-lived exchange. This can be fixed by periodically sending a packet with a shim header, though. Shimming for TE reasons could also be problematic when one side garbage collects shim state too aggressively.

If we go down this road it may be useful to have one or more bits in the shim context tag to communicate with routers, so we'd probably want to make the context tag a bit smaller than 47 bits.
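Purely as an illustration of the bit budget: if two bits of the 47-bit field were set aside for routers (the number two and the placement in the top bits are my assumptions), the context tag would shrink to 45 bits and the field could be packed like this.

```python
FIELD_BITS = 47   # size of the context tag field today
ROUTER_BITS = 2   # hypothetical number of bits carved out for routers
TAG_BITS = FIELD_BITS - ROUTER_BITS


def pack_field(router_flags, context_tag):
    """Put router_flags in the top ROUTER_BITS and the (smaller)
    context tag in the remaining TAG_BITS of the 47-bit field."""
    if not 0 <= router_flags < (1 << ROUTER_BITS):
        raise ValueError("router_flags out of range")
    if not 0 <= context_tag < (1 << TAG_BITS):
        raise ValueError("context_tag out of range")
    return (router_flags << TAG_BITS) | context_tag


def unpack_field(field):
    """Inverse of pack_field: returns (router_flags, context_tag)."""
    return field >> TAG_BITS, field & ((1 << TAG_BITS) - 1)


print(unpack_field(pack_field(0b10, 12345)))
```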

Last but not least: it's probably useful to use SRV records (if we're going to use those anyway) to tell hosts that:

- they shouldn't initiate shim6 (because the other end wants to control when this happens or shim6 isn't supported)
- they should defer shim6 negotiation as per local policy
- they should do shim6 negotiation before starting any sessions

The latter would probably be desirable for sites that want to optimize for TE or want to balance incoming sessions over different hosts.
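On the host side, the three cases above boil down to a small decision function. How the policy would actually be encoded in an SRV record is not something I've worked out, so the sketch starts from an already-decoded value; the enum names and the packet-count threshold standing in for "local policy" are hypothetical.

```python
from enum import Enum


class Shim6Policy(Enum):
    NEVER = "never"            # don't initiate shim6 towards this service
    DEFER = "defer"            # negotiate later, as per local policy
    BEFORE_SESSION = "before"  # negotiate before starting any session


def should_negotiate_now(policy, packets_exchanged, local_threshold=50):
    """Decide whether to start shim6 negotiation at this point.
    local_threshold stands in for whatever local policy the host uses
    to defer negotiation (here: a simple packet count)."""
    if policy is Shim6Policy.NEVER:
        return False
    if policy is Shim6Policy.BEFORE_SESSION:
        return True
    return packets_exchanged >= local_threshold


print(should_negotiate_now(Shim6Policy.BEFORE_SESSION, 0))
print(should_negotiate_now(Shim6Policy.DEFER, 10))
```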