Re: TE & SHIM6 (was Re: comments on draft-ietf-shim6-proto-03)
[This message will go into both source address rewriting and traffic
engineering.]
On 27-feb-2006, at 20:48, Erik Nordmark wrote:
>> There are two problems with allowing routers to rewrite source
>> addresses:
>> 1. The routers must know which packets are "legacy" and can't have
>> their source address changed vs which packets are controlled by
>> shim6 or another mechanism that can handle rewritten source
>> addresses.
>> 2. In current shim6, only previously negotiated source addresses
>> may be used, which means the shim6-enabled hosts in a site and the
>> rewriting routers must coordinate their efforts so correspondent
>> hosts don't see unexpected source addresses.
> FWIW draft-nordmark-shim6-esd-00.txt is on the way to the I-D
> directory, and it has some ideas for how to address this.
Right. Lots of good points in there, but unfortunately, I disagree
with the mechanisms proposed. I really hope I'll never have to run
DHCPv6 to configure my hosts; it's a big, fat, inelegant protocol.
>> The first issue is readily solvable by simply having shim6 hosts
>> put a magic value in the upper 64 bits of the source address that
>> indicates "rewriting permitted".
> Or next hdr = IPPROTO_SHIM6.
I don't find this very suitable. If we're going to send many shimmed
packets, it's more important than ever that we omit the shim header
whenever possible. Apart from that, using the source address to
signal that the source address may be changed is much cleaner. It
also has the advantage that we can now borrow some bits to make the
process easier. What we can do is have shim6 capable hosts emit a
"source rewriting information request". That would be a packet
addressed to a shim6 correspondent that has the magic prefix in the
source address that triggers source address rewriting, and an
additional bit combination that tells the router to send back a list
of prefixes it will use to rewrite. The host can then make sure that
the correspondent knows to expect packets with these source addresses.
If this is an ordered list, the host can then use bits in the data
packets with the rewrite prefix in the source address to tell the
router which addresses it may insert. (Not sure what would happen
though if the router wants to rewrite into Y but the host only allows
X and Z.)
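To make the ordered-list idea concrete, here is a rough sketch of how a host could turn the router's prefix list into "allowed" bits, and how the router could then pick a prefix. Everything here is an assumption for illustration: the function names, the one-bit-per-list-position encoding, and the example prefixes are hypothetical and not taken from any shim6 draft.

```python
from typing import Iterable, Optional, Sequence


def allowed_mask(router_prefixes: Sequence[str], host_allowed: Iterable[str]) -> int:
    """Host side: set bit i if the i-th prefix in the router's ordered list
    is one the correspondent has been told to expect (hypothetical encoding)."""
    allowed = set(host_allowed)
    mask = 0
    for i, prefix in enumerate(router_prefixes):
        if prefix in allowed:
            mask |= 1 << i
    return mask


def pick_rewrite(router_prefixes: Sequence[str],
                 router_preference: Sequence[str],
                 mask: int) -> Optional[str]:
    """Router side: take the first prefix in the router's own preference
    order whose bit is set. None means there is no overlap, i.e. the
    unresolved case where the router wants Y but the host only allows X and Z."""
    for prefix in router_preference:
        i = router_prefixes.index(prefix)
        if mask & (1 << i):
            return prefix
    return None


# Ordered list as returned by the router: X, Y, Z (documentation prefixes).
X, Y, Z = "2001:db8:a::/48", "2001:db8:b::/48", "2001:db8:c::/48"
prefixes = [X, Y, Z]
mask = allowed_mask(prefixes, {X, Z})        # host allows only X and Z
print(bin(mask))                             # 0b101
print(pick_rewrite(prefixes, [Y, Z, X], mask))  # router prefers Y, falls back to Z
print(pick_rewrite(prefixes, [Y], mask))        # None: no acceptable prefix
```

In this sketch the router falls back to its next preferred prefix when its first choice isn't allowed; whether it should do that, forward the packet unmodified, or drop it is exactly the open question.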
I've been thinking about something similar for traffic engineering
ever since my message yesterday where I mentioned A6 records. The
problem is that it's far from inconceivable that at some point, a
disconnect forms between the info in the DNS and the actual state of
the network. The way I see it, we have four ways to convey TE related
info:
1. out of band end-to-end: this would be stuff in the DNS
2. out of band hop-by-hop: BGP is like this
3. in-band end-to-end: measured timing and packet loss information
4. in-band hop-by-hop: feedback from routers
The problem is that 2 needs aggregation to scale. 3 and 4 require that
we're already in contact with the correspondent, so they're useless in
some cases, such as when we want one or more backup addresses that are
only tried if the primary addresses don't work. The only way to convey
this is with 1. We can either reuse SRV records for individual
services, which has the advantage that they're already available
today, but the disadvantage that this mechanism isn't really used and
needs to be supported on an application-by-application basis.
Alternatively, we can do some magic in the resolver library to make
this happen.
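As a minimal sketch of the SRV option: SRV records carry a priority field, and lower values are tried first, so backups can simply be published with a higher priority number. The record class, the host names, and the attempt_order helper below are made up for illustration. The sketch also ignores the weight field, which is meant to drive weighted random selection among records of equal priority.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class SrvRecord:
    priority: int   # lower value = tried first
    weight: int     # ignored in this sketch
    port: int
    target: str


def attempt_order(records: Iterable[SrvRecord]) -> List[str]:
    """Return targets in the order a client would try them, by priority only.
    sorted() is stable, so records of equal priority keep their input order."""
    return [r.target for r in sorted(records, key=lambda r: r.priority)]


records = [
    SrvRecord(priority=20, weight=0, port=80, target="backup.example.net"),
    SrvRecord(priority=10, weight=0, port=80, target="primary1.example.net"),
    SrvRecord(priority=10, weight=0, port=80, target="primary2.example.net"),
]
print(attempt_order(records))
# ['primary1.example.net', 'primary2.example.net', 'backup.example.net']
```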
But this doesn't really make it possible to react to traffic
engineering events in anything close to real time, if at all (DNS may
not be accessible by people who need to do TE.) The thing is, BGP
isn't all that great for this either: with current multihoming, you
can't engineer traffic such that link 1 gets the first 10 Mbit/s,
then everything between 10 and 15 Mbit/s goes to link 2, and if there's
more than 15 Mbit/s it's balanced over the two links in a 2:1 ratio.
(Believe me, this doesn't stop people from asking.)
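For clarity, the policy people ask for can be written down as a simple function of the total offered load. This is only a sketch of the desired outcome, not of any mechanism that achieves it, and the function name is made up. Note that at exactly 15 Mbit/s the split is already 10:5, so the 2:1 ratio continues smoothly from there.

```python
from typing import Tuple


def desired_split(total_mbit: float) -> Tuple[float, float]:
    """Return (link1, link2) load in Mbit/s for a given total load."""
    if total_mbit <= 10:
        # Link 1 takes the first 10 Mbit/s on its own.
        return (float(total_mbit), 0.0)
    if total_mbit <= 15:
        # Link 1 stays at 10; the overflow up to 15 goes to link 2.
        return (10.0, float(total_mbit - 10))
    # Above 15 Mbit/s everything is balanced 2:1 over links 1 and 2.
    return (total_mbit * 2 / 3, total_mbit / 3)


print(desired_split(8))    # (8.0, 0.0)
print(desired_split(12))   # (10.0, 2.0)
print(desired_split(30))   # (20.0, 10.0)
```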
But an in-band hop-by-hop TE mechanism would allow exactly this. The
way it would work is that routers are configured to provide feedback
for packets with a shim header, if necessary. This feedback would be
in the form of entries that go into the address selection policy
table. The site egress router would probably want to inform hosts
about which source addresses go well with certain destination
prefixes. All routers between the source and destination (including
the site exit router) would signal back "this prefix gets preference
value XXX", where the prefix is the one that the destination of the
packets falls into. To avoid trouble, the preference value shouldn't
be allowed to completely override locally configured info.
Ignoring non-shim6 traffic for a moment, this would allow any router
in the path to push back traffic when the conditions warrant it. A
router could be configured to start lowering the preference value
when traffic hits a certain threshold and shim6 traffic would
automatically be rerouted if possible. Obviously there's still the
potential for conflicting preferences.
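A rough sketch of how this could fit together, with all numbers and names invented for illustration: the router lowers the preference it advertises linearly once load passes a threshold, and the host only lets that advertisement shift its locally configured preference by a bounded amount, so router feedback can never completely override local policy. The linear ramp, the 0..100 scale, and the max_shift bound are assumptions, not part of any proposal.

```python
def router_preference(load_mbit: int, threshold_mbit: int,
                      capacity_mbit: int, base: int = 100) -> int:
    """Router side: advertise `base` below the threshold, then ramp down
    linearly to 0 at link capacity. Assumes capacity_mbit > threshold_mbit."""
    if load_mbit <= threshold_mbit:
        return base
    over = min(load_mbit, capacity_mbit) - threshold_mbit
    span = capacity_mbit - threshold_mbit
    return round(base * (1 - over / span))


def effective_preference(local: int, advertised: int,
                         base: int = 100, max_shift: int = 20) -> int:
    """Host side: apply the router's deviation from `base` to the locally
    configured preference, clamped to +/- max_shift."""
    shift = advertised - base
    shift = max(-max_shift, min(max_shift, shift))
    return local + shift


# Link with 100 Mbit/s capacity that starts pushing back at 80 Mbit/s.
print(router_preference(50, 80, 100))    # 100: below threshold
print(router_preference(90, 80, 100))    # 50: halfway between threshold and capacity
print(effective_preference(40, 50))      # 20: shift of -50 clamped to -20
print(effective_preference(40, 100))     # 40: no pushback, local value unchanged
```

Conflicting feedback from several routers on the same path is not handled here; a host would need some rule for combining the advertisements, such as taking the lowest one.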
A less fortunate side effect could be that a lot of regular traffic
would be shimmed when the initially chosen destination address isn't
optimal, which is only discovered when shim state is created after
the session reaches a certain number of packets exchanged. So this
would work even better if shim packets are exchanged before the
session starts, like you describe in your draft. Interestingly,
suppressing the shim header makes shimming less problematic but with
the shim header suppressed the traffic engineering doesn't work
halfway through a long-lived exchange. This can be fixed by
periodically sending a packet with a shim header, though. Shimming
for TE reasons could also be problematic when one side garbage
collects shim state too aggressively.
If we go down this road it may be useful to have one or more bits in
the shim context tag to communicate with routers, so we'd probably
want to make the context tag a bit smaller than 47 bits.
Last but not least: it's probably useful to use SRV records (if we're
going to use those anyway) to tell hosts that:
- they shouldn't initiate shim6 (because the other end wants to
control when this happens or shim6 isn't supported)
- they should defer shim6 negotiation as per local policy
- they should do shim6 negotiation before starting any sessions
The last option would probably be desirable for sites that want to
optimize for TE or want to balance incoming sessions over different
hosts.