
Re: TE & SHIM6 (was Re: comments on draft-ietf-shim6-proto-03)




[Sorry for the delay in responding]

Iljitsch van Beijnum wrote:

Or next hdr = IPPROTO_SHIM6.

I don't find this very suitable. If we're going to send many shimmed packets, it's more important than ever that we omit the shim header whenever possible.

I know packets don't have infinite size, but if we were willing to have
the IP header grow 20 bytes to make it possible to give an IPv6 address
to every cell phone and light switch, then perhaps we can allow
ourselves to have the header grow 8 bytes if this enables the routing
system to scale.
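For scale, the 8 bytes mentioned above match the Shim6 payload extension header as it was eventually standardized in RFC 5533. The draft under discussion may differ in detail. That header has one byte of Next Header, a Hdr Ext Len of 0, a P bit set to 1, and a 47-bit receiver context tag. The sketch below packs that layout. IPPROTO_SHIM6 = 140 is the protocol number IANA later assigned, and the context tag value is arbitrary.

```python
import struct

IPPROTO_SHIM6 = 140  # IANA protocol number later assigned to Shim6 (RFC 5533)
IPPROTO_TCP = 6

def shim6_payload_header(next_header: int, context_tag: int) -> bytes:
    """Pack the 8-byte Shim6 payload extension header (RFC 5533 layout)."""
    if not 0 <= context_tag < (1 << 47):
        raise ValueError("receiver context tag must fit in 47 bits")
    # P bit (top bit of the last 48 bits) = 1 marks a payload packet.
    p_and_tag = (1 << 47) | context_tag
    # Hdr Ext Len = 0: the header is exactly one 8-octet unit.
    return struct.pack("!BB", next_header, 0) + p_and_tag.to_bytes(6, "big")

# The header in front of this one would carry Next Header = IPPROTO_SHIM6.
hdr = shim6_payload_header(IPPROTO_TCP, 0x1234)
print(len(hdr), hdr.hex())  # 8 0600800000001234
```

At the 1280-byte IPv6 minimum MTU, those 8 bytes amount to 0.625% of the packet.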

Apart from that, using the source address to signal that the source address may be changed is much cleaner. It also has the advantage that we can now borrow some bits to make the process easier. What we can do is have shim6 capable hosts emit a "source rewriting information request". That would be a packet addressed to a shim6 correspondent that has the magic prefix in the source address that triggers source address rewriting, and an additional bit combination that tells the router to send back a list of prefixes it will use to rewrite. The host can then make sure that the correspondent knows to expect packets with these source addresses.

To me this has some downsides. For one there is the deployment
constraint that if there isn't a rewriting router, then things just
fail. Thus the hosts need to behave differently when there is no rewriting
exit router.
I don't know enough about current router implementation considerations
to say whether having some bit pattern trigger sending information
back to the source is hard or easy, but it seems like unnecessary
complication for the router. The fact that it rewrote the packet
provides the necessary information.
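The sketch below covers two pieces. The first is the source rewriting that is triggered by a magic prefix. The second is the point that the rewritten packet itself tells the correspondent which prefix was used. MAGIC_PREFIX and the egress prefix are hypothetical values from the 2001:db8::/32 documentation range. The way the addresses are spliced is also an assumption: the code keeps the source bits below the egress prefix length and replaces everything above them.

```python
import ipaddress

# Hypothetical "magic" prefix telling the exit router it may rewrite the source.
MAGIC_PREFIX = ipaddress.IPv6Network("2001:db8:ffff::/48")

def rewrite_source(src: ipaddress.IPv6Address,
                   egress: ipaddress.IPv6Network) -> ipaddress.IPv6Address:
    """Exit-router side: splice the egress prefix onto a magic-prefixed source."""
    if src not in MAGIC_PREFIX:
        return src  # not marked as rewritable: forward unchanged
    host_bits = int(src) & int(egress.hostmask)  # bits below the egress prefix
    return ipaddress.IPv6Address(int(egress.network_address) | host_bits)

def prefix_seen(src: ipaddress.IPv6Address, plen: int = 48) -> ipaddress.IPv6Network:
    """Correspondent side: the (rewritten) source reveals which prefix was used."""
    return ipaddress.IPv6Network((src, plen), strict=False)

src = ipaddress.IPv6Address("2001:db8:ffff:1::42")
out = rewrite_source(src, ipaddress.IPv6Network("2001:db8:a::/48"))
print(out, prefix_seen(out))  # 2001:db8:a:1::42 2001:db8:a::/48
```

No separate "information request" message is needed here. The correspondent learns the locator from the packet it actually receives.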

I've been thinking about something similar for traffic engineering ever since my message yesterday where I mentioned A6 records. The problem is that it's far from inconceivable that at some point, a disconnect forms between the info in the DNS and the actual state of the network. The way I see it, we have four ways to convey TE related info:

1. out of band end-to-end: this would be stuff in the DNS
2. out of band hop-by-hop: BGP is like this
3. in-band end-to-end: measured timing and packet loss information
4. in-band hop-by-hop: feedback from routers

This is quite a useful categorization.

The problem is that 2. needs aggregation to scale. 3. and 4. need to have contact with the correspondent already, so they're useless in some cases, like the case where we want one or more backup addresses that are only tried if the primary addresses don't work. The only way to convey this is with 1. We can either reuse SRV records for individual services for this, which has the advantage that it's already available today, but the disadvantage that this mechanism isn't really used and it needs to be supported on an application-by-application basis. Alternatively, we can do some magic in the resolver library to make this happen.

If one could define SRV to be for the "IP" service, then one might not
have to modify the applications. Thus we'd have SRV records for names like
	_ip.www.example.com
and the port number field wouldn't be used.
With such a hack one can make getaddrinfo() look for such things before
looking for AAAA and A records, without any application impact.
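A getaddrinfo()-style lookup that uses the _ip SRV hack could look like the sketch below. Several pieces are stand-ins. The in-memory zone dictionary replaces real DNS queries. resolve() is a hypothetical helper, not any resolver library's API. The ordering follows RFC 2782: ascending priority, with weighted random selection within a priority. That ordering is also what would let a site publish backup addresses that are only tried after the primary ones.

```python
import random
from dataclasses import dataclass

@dataclass
class Srv:
    priority: int
    weight: int
    port: int      # unused for the "_ip" pseudo-service
    target: str

def order_srv(records, rng):
    """Order SRV records per RFC 2782: lowest priority first, weighted random within one."""
    ordered = []
    for prio in sorted({r.priority for r in records}):
        # Zero-weight records are placed first so they have only a small chance of being drawn.
        group = sorted((r for r in records if r.priority == prio), key=lambda r: r.weight != 0)
        while group:
            n = rng.randint(0, sum(r.weight for r in group))
            running = 0
            for i, r in enumerate(group):
                running += r.weight
                if running >= n:
                    ordered.append(group.pop(i))
                    break
    return ordered

def resolve(name, zone, rng=random):
    """Hypothetical getaddrinfo() extension: try _ip.<name> SRV before AAAA/A."""
    srv = zone.get(("_ip." + name, "SRV"), [])
    targets = [r.target for r in order_srv(srv, rng)] if srv else [name]
    addrs = []
    for t in targets:
        addrs += zone.get((t, "AAAA"), []) + zone.get((t, "A"), [])
    return addrs

zone = {
    ("_ip.www.example.com", "SRV"): [Srv(20, 0, 0, "backup.example.com"),
                                     Srv(10, 0, 0, "primary.example.com")],
    ("primary.example.com", "AAAA"): ["2001:db8:1::80"],
    ("backup.example.com", "AAAA"): ["2001:db8:2::80"],
    ("mail.example.com", "AAAA"): ["2001:db8:3::25"],
}
print(resolve("www.example.com", zone))   # primary first, then backup
print(resolve("mail.example.com", zone))  # no _ip SRV: plain AAAA/A lookup
```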

But this doesn't really make it possible to react to traffic engineering events in anything close to real time, if at all (DNS may not be accessible by people who need to do TE.) The thing is, BGP isn't all that great for this either: with current multihoming, you can't engineer traffic such that link 1 gets the first 10 megabits, then everything between 10 and 15 goes to link 2 and if there's more than 15 Mbit it's balanced over the two links in a 2:1 ratio. (Believe me, this doesn't stop people from asking.)
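The tiered policy in the paragraph above is easy to state as arithmetic, even though BGP-level multihoming cannot express it. This sketch reads "a 2:1 ratio" as two parts on link 1 for every one part on link 2. At exactly 15 Mbit/s the split is already 10:5, so splitting only the excess 2:1 gives the same result as splitting the whole load 2:1. The function name and the use of Mbit/s floats are illustrative.

```python
def split_load(total_mbps: float) -> tuple[float, float]:
    """Return (link1, link2) load in Mbit/s for the tiered policy described above."""
    if total_mbps <= 10:
        return total_mbps, 0.0          # link 1 takes the first 10 Mbit/s
    if total_mbps <= 15:
        return 10.0, total_mbps - 10    # the next 5 Mbit/s go to link 2
    link1 = total_mbps * 2 / 3          # beyond 15 Mbit/s: keep a 2:1 ratio
    return link1, total_mbps - link1

for load in (8, 12, 15, 30):
    print(load, split_load(load))
```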

But an in-band hop-by-hop TE mechanism would allow exactly this. The way it would work is that routers are configured to provide feedback for packets with a shim header, if necessary. This feedback would be in the form of entries that go into the address selection policy table. The site egress router would probably want to inform hosts about which source addresses go well with certain destination prefixes. All routers between the source and destination (including the site exit router) would signal back "this prefix (which would be the prefix that the destination for the packets falls into) preference value XXX". To avoid trouble, the preference value shouldn't be allowed to completely override locally configured info.
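Here is a host-side sketch of how such feedback could sit on top of the address selection policy table of RFC 3484 (since updated by RFC 6724). It follows the rule that feedback must not completely override local configuration. Router feedback for a prefix is clamped to ±MAX_ADJUST and added to the locally configured precedence, and candidate destinations are ranked by the sum. The class, the cap value, and the additive combination are all assumptions made for illustration. None of them is part of RFC 3484 or any shim6 draft.

```python
import ipaddress

MAX_ADJUST = 10  # hypothetical cap on how far router feedback may move a precedence

class PolicyTable:
    def __init__(self, local):
        # Locally configured precedence per destination prefix.
        self.local = {ipaddress.ip_network(p): v for p, v in local.items()}
        # Router-signalled "this prefix, preference value XXX" entries.
        self.feedback = {}

    def apply_feedback(self, prefix, preference):
        clamped = max(-MAX_ADJUST, min(MAX_ADJUST, preference))
        self.feedback[ipaddress.ip_network(prefix)] = clamped

    @staticmethod
    def _longest_match(table, addr):
        matches = [p for p in table if addr in p]
        return table[max(matches, key=lambda p: p.prefixlen)] if matches else 0

    def precedence(self, dst):
        addr = ipaddress.ip_address(dst)
        return self._longest_match(self.local, addr) + self._longest_match(self.feedback, addr)

    def order(self, candidates):
        # Highest combined precedence first; sorted() keeps ties in input order.
        return sorted(candidates, key=self.precedence, reverse=True)

table = PolicyTable({"::/0": 40, "2001:db8:a::/48": 50, "2001:db8:b::/48": 45})
dsts = ["2001:db8:a::1", "2001:db8:b::1"]
print(table.order(dsts))                  # local policy alone: a first
table.apply_feedback("2001:db8:b::/48", 8)
print(table.order(dsts))                  # feedback tips it: b (53) before a (50)
table.apply_feedback("2001:db8:a::/48", 100)
print(table.precedence("2001:db8:a::1"))  # 60: the +100 was clamped to +10
```

With the cap in place, a local preference gap wider than 2 × MAX_ADJUST can never be reversed by feedback.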

Isn't locator rewriting an easier and more scalable way to accomplish
the dynamic TE? If we need to express which source prefixes to use with which
destination prefixes, there is potentially a

Last but not least: it's probably useful to use SRV records (if we're going to use those anyway) to tell hosts that:

- they shouldn't initiate shim6 (because the other end wants to control when this happens or shim6 isn't supported)
- they should defer shim6 negotiation as per local policy
- they should do shim6 negotiation before starting any sessions

The latter would probably be desirable for sites that want to optimize for TE or want to balance incoming sessions over different hosts.

I'm having a hard time seeing how this can be done without introducing several additional SRV lookups; being able to express all these in a single SRV lookup would be hard. (Hmm, unless we take the _ip.www.example.com hack above and hack it some more by overloading the returned port number field to have semantics like the above. At that point in time it might be better to introduce a new DNS RR type.)
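The sketch below shows one way the port-field overloading could work. The code points are invented for illustration, and no such assignment exists. Treating an absent or unknown value as "follow local policy" is also an assumption. A dedicated RR type would avoid this kind of overloading altogether.

```python
from enum import IntEnum
from typing import Optional

class Shim6Hint(IntEnum):
    """Hypothetical meanings for the port field of an _ip.<name> SRV record."""
    LOCAL_POLICY = 0      # defer shim6 negotiation as per local policy
    DO_NOT_INITIATE = 1   # peer wants to control shim6 setup, or lacks support
    NEGOTIATE_FIRST = 2   # run shim6 negotiation before starting sessions

def shim6_hint(srv_port: Optional[int]) -> Shim6Hint:
    """Map the SRV port field (None if no _ip record exists) to a hint."""
    if srv_port is None:
        return Shim6Hint.LOCAL_POLICY
    try:
        return Shim6Hint(srv_port)
    except ValueError:
        return Shim6Hint.LOCAL_POLICY  # unknown code point: ignore it

print(shim6_hint(2).name)     # NEGOTIATE_FIRST
print(shim6_hint(None).name)  # LOCAL_POLICY
```

The value 0 is deliberately the neutral one. An _ip record whose port field was simply left unused at 0 then keeps its old behavior.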

   Erik