Excuse the top-posting, but these comments are quite general.
What Scott is describing is essentially full offload of the
shim and the bottom part of the stack. We'd actually end
up with two IP stacks - one in the host which just sends
packets to the offload device, and a second one in the
offload device which has a shim on top of it. I'd want to
see a complete architecture for that including a demonstration
that the security architecture of shim6 isn't damaged,
and an analysis of the trust model and threat model
between the host and the offload device. But if it can be
done, it has a very nice property - it actually offers
a practical way to implement something that works much like
8+8. The offload device will be state-heavy though; it will
need to carry state per session for every host it's supporting.
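
To make the state cost concrete, here is a rough sketch of the kind of
table the offload device would have to keep. It is only an illustration
under my own assumptions: the class names, the fields, and the idea of
keying on (host, local ULID, peer ULID) are hypothetical and not taken
from any shim6 draft.

```python
from dataclasses import dataclass


@dataclass
class ShimContext:
    """Hypothetical per-session state held by the offload device."""
    local_ulid: str
    peer_ulid: str
    peer_locators: list
    current_peer_locator: str


class OffloadDevice:
    """Illustrative model only: one context per session, per served host."""

    def __init__(self):
        # (host_id, local_ulid, peer_ulid) -> ShimContext
        self.contexts = {}

    def open_session(self, host_id, local_ulid, peer_ulid, peer_locators):
        # Assumption: the first locator in the list is used initially.
        ctx = ShimContext(local_ulid, peer_ulid,
                          list(peer_locators), peer_locators[0])
        self.contexts[(host_id, local_ulid, peer_ulid)] = ctx
        return ctx

    def fail_over(self, host_id, local_ulid, peer_ulid):
        # Rotate to the next known peer locator; the host's own stack
        # never sees the change because it only deals in ULIDs.
        ctx = self.contexts[(host_id, local_ulid, peer_ulid)]
        i = ctx.peer_locators.index(ctx.current_peer_locator)
        ctx.current_peer_locator = ctx.peer_locators[(i + 1) % len(ctx.peer_locators)]
        return ctx.current_peer_locator

    def state_size(self):
        return len(self.contexts)


device = OffloadDevice()
for h in range(3):          # three hosts behind the device
    for s in range(4):      # four sessions each
        device.open_session(f"host{h}", f"2001:db8:a::{h + 1}",
                            f"2001:db8:b::{s + 1}",
                            [f"2001:db8:c::{s + 1}", f"2001:db8:d::{s + 1}"])
```

The table grows as hosts times sessions (12 entries here), which is the
state-heavy property mentioned above.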