
Re: review of draft-ietf-shim6-failure-detection-03.txt



On 26-jun-2006, at 13:54, Jari Arkko wrote:

One of the reasons why I think generality is something that we
need to leave for future verification is that other contexts like
HIP have significantly more complex requirements than Shim6.
In particular, HIP needs to work with IPv4 and NATs.

One way to address this is to split the reachability detection into a shim-specific part (the keepalives) and a generic part (the path exploration), so that the generic part can evolve independently of shim6.

Note that I'm not saying this is necessarily the path we should take, but seeing that there is other stuff that can also benefit from reachability detection, it certainly seems prudent to think about this now, while we still have the opportunity to easily go in another direction.

I am uncertain if the ICMP mechanism for the path exploration
part is the best way forward. The entire REAP protocol could
certainly be encapsulated in any "user" protocol.

Yes, but that way it's hard to keep it from being implemented more than once, and impossible to keep it from running more than once.

But many
scenarios that I can see do not necessarily work well with ICMP,
particularly with one version of ICMP. In MOBIKE, for instance,
it would have been inappropriate to use anything else than the
regular NAT-T UDP at the bottom, because only that can show
whether actual MOBIKE/IKEv2/IPsec traffic will get through. And
what about HIP running over IPv4?

Since we're determining unidirectional reachability, NAT is not actually as big a problem as it would ordinarily seem. The big issue with NAT is making sure that both ends know the addresses on both sides and that there are translation rules that allow incoming packets to be delivered. But you're right that ICMP won't work with NAT; we'd have to use UDP for that. I guess it would make sense to develop a non-NAT IPv6 version first and then see what needs to be done to make it work over IPv4 with NAT.

But as a general rule, I'd like to
get a working, as simple as possible Shim6 mechanism out there.
Even if it's not optimized for all situations. We can work on extensions
like fate sharing between contexts later, too.

I disagree with that approach; I think we should make the first version as good as it can be. I don't see much added value in finishing this work a little earlier. We're obviously too late to avoid PI in IPv6 now (if that can be avoided, it won't be because of shim6) or to avoid having legacy non-shim IPv6 implementations out there, but at the same time IPv6 isn't deployed on any measurable scale yet, so there is still (some) time.

Were there other issues related to probe storms? We do have
exponential back-off.

Not really. There is no description of how this works.
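For reference, one common shape for exponential back-off of probe transmissions is sketched below. This is not taken from the draft, which does not describe the mechanism; the function name `backoff_schedule` and the values chosen for `initial`, `factor` and `ceiling` are assumptions made only for illustration.

```python
def backoff_schedule(initial=0.5, factor=2.0, ceiling=60.0, count=10):
    """Return the delays (in seconds) before each of `count` successive probes.

    All defaults are illustrative assumptions, not values from the draft.
    Each delay is `factor` times the previous one, capped at `ceiling`.
    """
    delays = []
    delay = initial
    for _ in range(count):
        delays.append(min(delay, ceiling))
        delay *= factor
    return delays


if __name__ == "__main__":
    print(backoff_schedule(count=5))
```

Whatever the actual numbers turn out to be, writing the schedule down like this in the spec would make it possible to reason about the worst-case probe rate.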

Another thing that's missing completely from this draft is a
discussion of how to use address pair preference information. This
makes it impossible to address traffic engineering needs.

This is important, but can be addressed separately.

No, that would make it MUCH harder, as this will only work well if both ends implement it. And we're getting flak for not paying attention to traffic engineering as it is.

This doesn't say what shim6 implementers should do. In my opinion:
keep using deprecated addresses as the ULID/primary locator as long
as possible, but prefer non-deprecated addresses when selecting
alternative locators.

Right. But I actually already deleted Section 4.5 and the discussion
of deprecated addresses. IPv6 specifications already call for use
of non-deprecated addresses for new communications, and disallow
the use of invalid addresses. So it's not clear that we need to say more.

Leaving out all mention of deprecated addresses is ok by me, but as soon as you bring it up it's a good idea to say what you want to happen with them.

Data packets as opposed to what other types of packets?

I added a clarification. Basically, it's all packets including both
ULP packets and SHIM6 control messages, but NOT keepalives
or probes.

I think it's a good idea to consider TCP ACKs with no user data as non-data packets that don't need to generate return traffic as well.

So when we receive a keepalive from the other side, _we_ stop sending
keepalives? This may be the right thing to do, but it's not obvious
to me why. Some explanation would help.

Keepalives are only used if there's one-way communication. Since the
other side sends a keepalive, it's not sending anything else at that
time. Hence we have no need for keepalives.

But do we need this rule? It may make implementations more complex without any benefit.

The keepalives are sent at an interval of 3 seconds (or shorter, I
imagine that an implementation isn't going to keep an exact timer for
each context, any rounding must obviously be in the down direction)
and the timeout is 10 seconds. In these 10 seconds you'd normally
receive 3 keepalives, while 1 is enough to indicate that the other
side is still alive. The other 2 are only there in case of packet
loss. I think that's excessive. Starting the full reachability
exploration because of incidental packet loss isn't such a big deal
that it warrants sending three times as many packets as necessary.

The question is what the right number is. We want to avoid
entering exploration needlessly, so I'd rule 1 keepalive out.
We now have 3, are you arguing for 2? I'd be fine with that,
but I note that we don't have a lot of evidence to support
either view. We're going to have to revisit this after we get
the experience.

Why not leave it up to the implementers? If we say that after 10 seconds the full path exploration starts, implementers are free to experiment with what works well (3, 4, 8 seconds between keepalives) without any need to revisit the spec.
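To make the trade-off concrete, here is a small calculation of how many keepalives fit in the 10-second window for the intervals mentioned above, and how many of them may be lost before exploration starts. It is only a sketch: the function names are made up, and it assumes the first keepalive goes out one full interval after the last data packet.

```python
import math

TIMEOUT = 10.0  # seconds of silence before full path exploration starts


def keepalives_before_timeout(interval, timeout=TIMEOUT):
    """Keepalives sent within one timeout window (assumes the first one
    is sent `interval` seconds after the last data packet)."""
    return math.floor(timeout / interval)


def tolerated_losses(interval, timeout=TIMEOUT):
    """How many of those keepalives can be lost while one still arrives."""
    return max(keepalives_before_timeout(interval, timeout) - 1, 0)


if __name__ == "__main__":
    for interval in (3, 4, 8):
        print(interval, keepalives_before_timeout(interval), tolerated_losses(interval))
```

With a 3-second interval this gives 3 keepalives and 2 tolerated losses, with 4 seconds it gives 2 and 1, and with 8 seconds it gives 1 and 0.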

Why would a keepalive need an id field?

So that a probe reception report can indicate
seeing a recent keepalive.

I see. But what about the case where there are no keepalives, only data packets? In that case, there's no id field either.

I believe that since the id of the last received probe is included,
the iseeyou flag is unnecessary.

But we also have the case where you report seeing data packets but
no probes.

Good point. But couldn't that be solved by using a special value for the last seen id?
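A rough sketch of what I mean by a special value follows. The two reserved values, the 32-bit id width and the function names are all assumptions of mine and do not come from the draft; the point is only that the "last seen id" field by itself can distinguish "saw a probe", "saw data but no probe" and "saw nothing", so no separate iseeyou flag is needed.

```python
# Hypothetical reserved values for a 32-bit "last seen id" field.
NOTHING_SEEN = 0x00000000  # no traffic of any kind seen from the peer
DATA_ONLY = 0xFFFFFFFF     # data packets or keepalives seen, but no probe


def encode_last_seen(last_probe_id, saw_other_traffic):
    """Return the value to place in the 'last seen id' field of a probe."""
    if last_probe_id is not None:
        if last_probe_id in (NOTHING_SEEN, DATA_ONLY):
            raise ValueError("probe ids must not use a reserved value")
        return last_probe_id
    return DATA_ONLY if saw_other_traffic else NOTHING_SEEN


def peer_sees_us(last_seen_field):
    """What the iseeyou flag would have conveyed, derived from the id field."""
    return last_seen_field != NOTHING_SEEN
```

The cost is that probe ids can never take the reserved values, which the sender has to respect when generating ids.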

But if you have ideas on how this could be simplified -- perhaps by
not thinking about the data packets during exploration -- those
would be welcome.

Way ahead of you - I wasn't thinking about data packets during exploration until now. :-)

Although copying back the last seen id seems to do the job, I can't
help but feel that it would be preferable to add timers to obtain
round trip times and copy back more received ids and also sent ids.
This allows the receiver of a probe to determine which of the probes
that made it to the other side did so faster, so it can select the
address pair with the shortest round trip time.

Right. But all that can go into extensions. I'd like to have the
minimum necessary to get this spec done.

Let's split the difference and specify the fields, but make (most of) their use optional.
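As an illustration of what an implementation could optionally do with such fields, the sketch below picks the address pair with the lowest estimated round trip time. Everything here is an assumption: I'm supposing that each echoed id comes with the time the peer held it before reporting, and the names `sent`, `echoed` and `best_pair` are invented for the example.

```python
def best_pair(sent, echoed, now):
    """Pick the address pair with the smallest estimated round trip time.

    sent:   {probe_id: (address_pair, time_sent)}, kept locally
    echoed: {probe_id: seconds_held_by_peer}, taken from the peer's report
            (a hypothetical field, not something the draft defines)
    now:    arrival time of the report
    Returns (address_pair, estimated_rtt), or None if nothing matches.
    """
    best = None
    for probe_id, held_for in echoed.items():
        if probe_id not in sent:
            continue  # id we never sent or have already forgotten
        pair, sent_at = sent[probe_id]
        rtt = now - sent_at - held_for
        if best is None or rtt < best[1]:
            best = (pair, rtt)
    return best
```

Note that the estimate mixes the forward path of the probed pair with the return path of whichever pair carried the report, so it is a relative ranking rather than a true per-pair round trip time.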

I severely dislike having fixed length data in TLV format, because that makes parsing much harder. If you look at Van Jacobson's work with TCP you'll see that a fixed header allows extremely streamlined implementations.
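To show what I mean by streamlined, here is a sketch of parsing a fixed header in a single step. The layout (type, flags, reserved, probe id, last seen id) is hypothetical and is not the message format from the draft; the field names are placeholders.

```python
import struct

# Hypothetical fixed layout in network byte order:
#   type (1 byte), flags (1 byte), reserved (2 bytes),
#   probe id (4 bytes), last seen id (4 bytes)
PROBE_HEADER = struct.Struct("!BBHII")


def pack_probe(msg_type, flags, probe_id, last_seen_id):
    """Build the 12-byte fixed header; the reserved field is sent as zero."""
    return PROBE_HEADER.pack(msg_type, flags, 0, probe_id, last_seen_id)


def parse_probe(buf):
    """Parse the fixed header with one unpack call and no option walking."""
    if len(buf) < PROBE_HEADER.size:
        raise ValueError("truncated probe")
    msg_type, flags, _reserved, probe_id, last_seen_id = PROBE_HEADER.unpack_from(buf)
    return {
        "type": msg_type,
        "flags": flags,
        "probe_id": probe_id,
        "last_seen_id": last_seen_id,
    }
```

With TLVs, the same fields would require a loop over options with a length check on each one before any of the values can be used.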

Including sent ids along with the addresses the probe with that id
was sent to helps the receiver determine that some probes didn't make
it (yet). If a probe didn't work in one direction of an address pair,
it's reasonable to assume that it may also not work in the other
direction and try other pairs first.

True as well, but again perhaps material for future
optimizations.

I really like having multiple ids for earlier probes in there; it should cut down on the number of packets exchanged. I also suspect that there could be race conditions when certain packets are lost and not others, so that an implementation that only echoes the last seen id may stay in an oscillating state, but I haven't been able to think of an example so far.
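The use of sent ids described earlier could look roughly like the sketch below: if the peer says it sent a probe from X to Y and we never received it, we try the reverse pair (Y to X) last. The data structures and the name `order_candidates` are assumptions for illustration only.

```python
def order_candidates(candidates, peer_sent, received_ids):
    """Order our candidate (local, remote) address pairs for probing.

    candidates:   list of (local, remote) pairs we could probe from
    peer_sent:    {probe_id: (peer_source, peer_destination)} as reported
                  by the peer (a hypothetical report field)
    received_ids: set of probe ids that actually reached us
    Pairs whose reverse direction apparently failed are moved to the end;
    the relative order of the other pairs is preserved.
    """
    suspect = {
        (dst, src)
        for probe_id, (src, dst) in peer_sent.items()
        if probe_id not in received_ids
    }
    # sorted() is stable and False sorts before True.
    return sorted(candidates, key=lambda pair: pair in suspect)
```

A probe that is merely delayed would be treated as lost here, which is why the suspect pairs are only moved to the back rather than dropped.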