Re: review of draft-ietf-shim6-failure-detection-03.txt



On 22-jun-2006, at 16:59, marcelo bagnulo braun wrote:

This draft works per-context exclusively. So if there are 2, 5 or 10 contexts between two hosts, this means 2, 5 or 10 times the amount of work is done.

i agree that this would be a nice feature. the problem with this is how do you identify the peer in such a way that you can probe all the existing contexts.

Have a look at the revision of my reachability detection draft:
http://www.muada.com/drafts/draft-van-beijnum-shim6-reach-detect-00.txt

Note that this is an update of draft-ietf-shim6-reach-detect-01.txt and it's not yet posted by the secretariat.

The other option would be to use a single probe/keepalive for all the contexts between two peers. In order to do that we need a means to identify the peer, so that the receiver of the packet can identify all the contexts corresponding to the same host and apply the received packet to all of them.

Indeed.
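
As a rough sketch of what "apply the received packet to all the contexts" could look like, assuming some peer identifier exists (which is exactly the open question here). The names ContextTable, peer_id and context_tag are hypothetical and appear in neither draft:

```python
from collections import defaultdict


class ContextTable:
    """Hypothetical index of shim contexts, grouped by a peer identifier."""

    def __init__(self):
        self._by_peer = defaultdict(list)  # peer_id -> list of context tags
        self.last_heard = {}               # context tag -> time of last keepalive

    def add_context(self, peer_id, context_tag):
        self._by_peer[peer_id].append(context_tag)

    def on_keepalive(self, peer_id, now):
        """Apply one received keepalive to every context shared with peer_id."""
        refreshed = list(self._by_peer.get(peer_id, []))
        for tag in refreshed:
            self.last_heard[tag] = now
        return refreshed


table = ContextTable()
table.add_context("peer-A", 101)
table.add_context("peer-A", 102)
table.add_context("peer-B", 201)
print(table.on_keepalive("peer-A", now=12.0))  # [101, 102]
```

With per-context operation the same keepalive would have to be sent and processed once for each of the two contexts with peer-A.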

Basically this would introduce the notion of endpoint in the shim context/protocol (which is not present today), since today the granularity is ulid pairs (as opposed to endpoint pairs).

Not necessarily, read my draft.

this would be a considerable change in the protocol i guess, but may be explored if people deem it relevant.

It makes the protocol a bit more complex, but it does allow it to be used by many different protocols at the same time.

As a general comment, i am kind of worried about the complexity of the resulting protocol, including the shim protocol and the failure detection protocol, and i would really prefer to try to simplify the protocol rather than making it more complex, even if this means losing some optimization for some cases.

I suppose the case where there are multiple contexts between two hosts won't be so common that it's worth much effort to deal with it. But if other protocols also need this, then it would be MUCH better to have a single code base that's shared by all of them rather than have essentially the same thing pop up in different places.

I am concerned about having a complex protocol that may become error prone (we already have feedback expressing this concern BTW)

I hate complexity as much as the next IETFer, but leaving the last 10% out just because it's simpler is generally not a good solution.

However, it's important that there is fate sharing between the reachability protocol and the user protocol (shim in our case). I think this can be solved by having the quick reachability verification stuff (= FBD) encapsulated in the user protocol, but let the full path exploration be a protocol of its own or live under ICMPv6 or some such.

not sure why you think this is needed. Defining the protocol messages in a way that they can be included in the shim6 header as well as in the mobility header or the hip header would be good enough to allow using the failure detection protocol in other protocols.... what am i missing?

See the discussion above, and the need for fate sharing between the reachability protocol and the "user" protocol. If we want the reachability detection to be shared by different users, then it can happen that one protocol is filtered and another isn't. So we probably want the reachability detection to be independent of the "user" protocols and then when the reachability protocol says that something is reachable, the user protocol does a quick check using its own protocol number to be sure it actually works.
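
To make the two-stage idea concrete, here is a minimal sketch. Both check functions are stand-ins that I'm making up for illustration: shared_reachable plays the role of the independent reachability protocol, and protocol_check the quick check that the user protocol does with its own protocol number.

```python
def usable_pairs(pairs, shared_reachable, protocol_check):
    """Return the address pairs that pass both stages.

    shared_reachable(pair): verdict of the shared reachability protocol.
    protocol_check(pair):   quick check by the user protocol itself,
                            only run for pairs the first stage accepts.
    """
    return [p for p in pairs if shared_reachable(p) and protocol_check(p)]


# Hypothetical situation: both pairs are reachable in general,
# but the user protocol is filtered on the second pair.
pairs = [("A1", "B1"), ("A2", "B1")]
reachable = {("A1", "B1"), ("A2", "B1")}
passes_own_protocol = {("A1", "B1")}

result = usable_pairs(
    pairs,
    shared_reachable=lambda p: p in reachable,
    protocol_check=lambda p: p in passes_own_protocol,
)
print(result)  # [('A1', 'B1')]
```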

Another thing that's missing completely from this draft is a discussion of how to use address pair preference information. This makes it impossible to address traffic engineering needs.

well, i have been working on this and i have submitted a draft about how to perform locator pair selection, including reachability information and also preference information from the shim protocol

you can find it at:

http://www.ietf.org/internet-drafts/draft-ietf-shim6-locator-pair-selection-00.txt

of course your feedback would be very welcome

I'll have a look at it.

i think that the definition section is very useful, because the insight it provides about the different states of an address and of address pairs is very important.

I agree, but my problem with the definition section is that it contains too much stuff that shouldn't be there. It's not unusual to have to go back to the definition section several times during reading, so a definition section needs to be as concise as possible.

I suggest tightening the use of words like "operational", "work", "reachable". They're mostly used interchangeably in the draft.

i don't think this is the case.
i find these differences relevant imho

I'm not sure there is a difference, and if there is, what it is...

This doesn't say what shim6 implementers should do. In my opinion: keep using deprecated addresses as the ULID/primary locator as long as possible, but prefer non-deprecated addresses when selecting alternative locators.

i think this should belong to the locator selection document...

Is that a separate document???

   2.  Whenever outgoing data packets are generated

Data packets as opposed to what other types of packets?

signalling packets, such as keepalives or probes (is my understanding)

Sure, but the draft doesn't say that.

   4.  The reception of a REAP keepalive packet leads to stopping the
       timer associated with the return traffic from the peer.

So when we receive a keepalive from the other side, _we_ stop sending keepalives.

as i understand it, this means that we are not expecting another packet (until we send a new packet, of course)

I guess. But shouldn't this follow from the general rules rather than be a specific one?

The keepalives are sent at an interval of 3 seconds (or shorter, I imagine that an implementation isn't going to keep an exact timer for each context, any rounding must obviously be in the down direction) and the timeout is 10 seconds. In these 10 seconds you'd normally receive 3 keepalives, while 1 is enough to indicate that the other side is still alive. The other 2 are only there in case of packet loss. I think that's excessive.

would you suggest reducing it to 2 packets every 10 secs?

That's a bit better, but actually I think 1 in 10 seconds is enough, although that means you need to take a few extra seconds before you can time out. If you want to time out after 10 seconds then sending a keepalive after 8 would probably be a good choice.
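
The arithmetic in this exchange can be checked with a small sketch. It assumes keepalives go out at exact multiples of the interval after the last data packet and only count if they arrive strictly before the timeout; the function names are mine, not from the draft.

```python
import math


def keepalives_before_timeout(interval, timeout):
    """Number of keepalives sent strictly before the timeout expires."""
    return math.ceil(timeout / interval) - 1


def losses_tolerated(interval, timeout):
    """Keepalives that may be lost without triggering path exploration."""
    return max(keepalives_before_timeout(interval, timeout) - 1, 0)


# Values from the draft: 3 second interval, 10 second timeout.
print(keepalives_before_timeout(3, 10), losses_tolerated(3, 10))  # 3 2

# One keepalive after 8 seconds, still with a 10 second timeout.
print(keepalives_before_timeout(8, 10), losses_tolerated(8, 10))  # 1 0
```

So a single keepalive at 8 seconds leaves a 2 second margin but no room for packet loss, which is the tradeoff under discussion.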

I mean, i think this protocol will require quite a lot of fine tuning based on experience and simulations of the load... i guess that what's in the current spec are reasonable values for the time being (i have no problem with changing them a bit, but as i said i guess in depth fine tuning will be needed once we have more experience...)

How is experience going to tell us anything that we don't already know in this case? If we go for one missed keepalive before a timeout, that would be a new approach that may not work out well, and then we can go back to 5 seconds or 3 seconds. But starting at 3 means a lot of packets and next to no unnecessary triggering of path exploration, so there won't be any surprises there.

I believe that since the id of the last received probe is included, the iseeyou flag is unnecessary.

you mean that if the id field is empty, this means iseeyou=no?

No, what I mean is that the value of this bit doesn't convey any interesting information.

Or maybe it really is a "reply requested" bit in disguise, like we discussed earlier.

Although copying back the last seen id seems to do the job, I can't help but feel that it would be preferable to add timers to obtain round trip times, and to copy back more received ids and also sent ids. This allows the receiver of a probe to determine which of the probes that made it to the other side did so faster, so it can select the address pair with the shortest round trip time.

i would suggest to leave this for future work, since it is added complexity and it is not obvious to me that selecting the fastest one is always the best choice.... (e.g. bandwidth is not considered)

I'd say: put in the fields, this is very little extra work, and the values can be ignored for simplicity when desired. Then, implementers can experiment with how they use them if they like.
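
A minimal sketch of how such fields might be used, under assumptions of my own: the prober remembers when it sent each probe id and on which address pair, and the reply echoes each received id together with the time the peer held that probe before replying. None of these field names exist in the draft.

```python
def fastest_pair(sent, echoed, reply_time):
    """Pick the address pair whose probe shows the shortest round trip.

    sent:       probe id -> (address_pair, send_time)   (kept by the prober)
    echoed:     probe id -> seconds the peer held the probe before replying
    reply_time: local time at which the reply arrived
    Returns (address_pair, rtt), or None if no echoed id is known.
    """
    best = None
    for probe_id, hold in echoed.items():
        if probe_id not in sent:
            continue  # unknown id, ignore it
        pair, send_time = sent[probe_id]
        rtt = reply_time - send_time - hold
        if best is None or rtt < best[1]:
            best = (pair, rtt)
    return best


sent = {1: (("A1", "B1"), 0.0), 2: (("A2", "B1"), 0.5)}
echoed = {1: 1.5, 2: 1.25}
print(fastest_pair(sent, echoed, reply_time=2.0))  # (('A2', 'B1'), 0.25)
```

The return leg is the same for every echoed id, so the differences reflect the forward paths only. An implementation that doesn't want this can simply ignore the extra fields.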

The keepalive is a fairly long packet. I think just a shim header as would be used for data packets but with no ULP following the shim header would be sufficient.

not sure what you would omit from the current packet format... i mean, we need the context tag and the identifier and we need it to make it extensible in the header....

No we don't. Data packets don't have these fields either and also indicate that the current context is working. Moreover: data packets that haven't been rewritten don't even have a shim header!

Requiring random numbers in packets that are sent rather frequently is a bad idea, because it depletes the typically limited amount of entropy that's available for strong random number generation rather quickly and semi-random number generation may be somewhat expensive (and not that good). And I don't see what good an id does in a keepalive anyway... Also, there may be reasons to have non-random numbers, such as ease of lookup.

i guess this is needed to verify that the reply was indeed generated as a response to the initial packet,

Keepalives are generated autonomously, not in response to other shim packets, so this is not relevant in this case.

I don't have a good feeling about this... It's too hard to determine what should be happening. Maybe it would be better rather than go down the list of packets that are sent/received and describe the behavior in each state, to take one state at a time and describe what happens with packets in that state.

that would be the state machine i guess, right?

I don't know.

Then I'm ignoring this too.

But I would be happier if they'd be removed, because either they're superfluous as they're not normative, or they're actually necessary to understand the protocol, which is even worse because they're not part of the normative text.

i think state machines are very useful to understand how the protocol works and to verify that it is working, and i think these should be included in the documents

Is it really not possible to express them in ASCII so they can be made part of the normative text?