
Re: [tcpm] TCP, DCCP, v6, and ICMP soft errors [draft-ietf-v6ops-v6onbydefault-01]



At 20:39 17/07/2004 -0700, Eddie Kohler wrote:

It says:
---- cut here ----
4.1 Changing TCP's reaction to soft errors
As discussed in Section 1, it may make sense for the fault recovery
action to depend not only on the type of error being reported, but
also on the time the error is reported. For example, one could infer
that when an error arrives in response to opening a new connection,
it is probably caused by opening the connection improperly, rather
than by a transient network failure. [8]
---- cut here ----
That is, we would modify TCP's fault isolation policy during the connection establishment phase, rather than changing the "meaning" of the error itself.
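A toy model of the state-dependent policy may make the difference concrete. This is a sketch, not a TCP implementation: the state names mirror RFC 793, while the event strings and the `proposed_policy` switch are assumptions made for illustration.

```python
from enum import Enum, auto


class State(Enum):
    SYN_SENT = auto()
    SYN_RECEIVED = auto()
    ESTABLISHED = auto()


# States the proposal treats as "connection establishment".
ESTABLISHMENT_STATES = frozenset({State.SYN_SENT, State.SYN_RECEIVED})


def react_to_soft_error(state: State, proposed_policy: bool) -> str:
    """Action taken when an ICMP soft error (e.g. host unreachable) arrives.

    "record": RFC 1122 behaviour; note the error, keep retransmitting.
    "abort":  proposed behaviour; fail the connection attempt right away.
    """
    if proposed_policy and state in ESTABLISHMENT_STATES:
        return "abort"
    return "record"


def simulate_active_open(events, proposed_policy: bool) -> str:
    """Replay events seen by an active opener sitting in SYN-SENT.

    Each event is "icmp_soft" or "synack". Returns "established",
    "aborted", or "timeout" (ran out of events without a SYN-ACK).
    """
    for event in events:
        if event == "synack":
            return "established"
        if event == "icmp_soft" and react_to_soft_error(State.SYN_SENT, proposed_policy) == "abort":
            return "aborted"
    return "timeout"


# A transient outage: two unreachables, then a retransmitted SYN gets through.
transient_outage = ["icmp_soft", "icmp_soft", "synack"]
print(simulate_active_open(transient_outage, proposed_policy=False))  # established
print(simulate_active_open(transient_outage, proposed_policy=True))   # aborted
```

The second run shows the trade-off under discussion: a short outage that a later retransmission would have ridden out now fails the connection attempt, while an ESTABLISHED connection keeps the RFC 1122 behaviour either way.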

The Host Requirements RFC codifies years of experience about whether or not particular errors are "soft" or "hard".

I'd disagree a bit here. The Host Requirements RFC dates back to 1989. It considers, for example, "Fragmentation needed but DF bit set" to be a hard error. However, this error was later treated as a soft one, and is used by the Path MTU Discovery mechanism.


That doesn't mean it's not a great source of experience; just that you should not treat it as "religion".
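For reference, the RFC 1122 (section 4.2.3.9) classification of ICMP Destination Unreachable codes, and the way Path MTU Discovery (RFC 1191) reinterprets code 4, can be sketched as below. The function name and return strings are illustrative; codes above 5 are deliberately left out because section 4.2.3.9 does not classify them.

```python
# ICMP Destination Unreachable (type 3) codes from RFC 792.
NET_UNREACHABLE = 0
HOST_UNREACHABLE = 1
PROTOCOL_UNREACHABLE = 2
PORT_UNREACHABLE = 3
FRAG_NEEDED_DF_SET = 4
SOURCE_ROUTE_FAILED = 5

# RFC 1122, section 4.2.3.9: codes 0, 1 and 5 are soft errors (TCP MUST NOT
# abort); codes 2-4 are hard errors (TCP SHOULD abort).
RFC1122_SOFT = frozenset({NET_UNREACHABLE, HOST_UNREACHABLE, SOURCE_ROUTE_FAILED})
RFC1122_HARD = frozenset({PROTOCOL_UNREACHABLE, PORT_UNREACHABLE, FRAG_NEEDED_DF_SET})


def classify_unreachable(code: int, pmtud_enabled: bool) -> str:
    """Return "soft", "hard", or "pmtu-update" for a Destination Unreachable code."""
    if code == FRAG_NEEDED_DF_SET and pmtud_enabled:
        # RFC 1191: the message may carry the next-hop MTU; shrink the path
        # MTU estimate and retransmit instead of aborting the connection.
        return "pmtu-update"
    if code in RFC1122_HARD:
        return "hard"
    if code in RFC1122_SOFT:
        return "soft"
    raise ValueError(f"code {code} is outside the range this sketch covers")
```

With `pmtud_enabled=False`, code 4 comes back as "hard", exactly as the 1989 text says; with PMTUD it no longer aborts anything.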


That experience is still valid, so shifting an error from "soft" to "hard" (in terms of the response to that error) is problematic. Limiting the shift to two connection states doesn't conceptually change this.

I don't think implementing the proposed solution would lead to interoperability problems.


The only scenario in which you could get an undesired "effect" would be one in which you tried to connect to a destination host (with only one IP address) while there was a transient network problem (one that elicited ICMP unreachables) short enough that some later TCP retransmission would have succeeded.

OTOH, this work-around could be enabled by default only when the destination port is that of an interactive service, such as the web. This is not discussed in the current draft, but I could include it if any of the WG participants thinks it would be useful.

For interactive services, the delays in connection establishment attempts would be unacceptable. And if a scenario like the one I described above occurred (where some TCP retransmission would have succeeded), the *user* would manually trigger another connection attempt.
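A minimal sketch of a port-based default, with room for an application to override it. Both the port list and the per-socket `app_choice` preference are hypothetical; neither the draft nor the Sockets API defines them.

```python
from typing import Optional

# Hypothetical list of "interactive" destination ports (web only, here);
# the draft does not define such a list.
INTERACTIVE_PORTS = frozenset({80, 443})


def fail_fast_on_soft_errors(dst_port: int, app_choice: Optional[bool] = None) -> bool:
    """Should ICMP soft errors abort this connection attempt?

    app_choice is a hypothetical per-socket preference: None means the
    application said nothing, so the port-based default applies.
    """
    if app_choice is not None:
        return app_choice
    return dst_port in INTERACTIVE_PORTS
```

Keeping the override tri-state (None / True / False) lets the by-default behaviour and an explicit application request coexist, which is one way to reconcile the two positions in this thread.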

Clark's paper has some discussion on all this.


But RFC 1122 also says that TCP "SHOULD make the information [about the soft error] available to the application". This shows a way out of the conflict.
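Under the Sockets API, the usual channel for this information is the socket's pending error, read with `getsockopt(SOL_SOCKET, SO_ERROR)` after a non-blocking `connect()`. A sketch assuming a POSIX system (on Windows, `connect_ex` reports a WSAEWOULDBLOCK code instead of EINPROGRESS); whether a soft ICMP error seen in SYN-SENT is reported here before the timeout depends on the stack.

```python
import errno
import select
import socket


def connect_and_report(host: str, port: int, timeout: float = 5.0) -> int:
    """Start a non-blocking connect() and return the error the kernel reports.

    Returns 0 on success, otherwise an errno value (e.g. ECONNREFUSED,
    EHOSTUNREACH, or ETIMEDOUT if nothing happened within `timeout`).
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setblocking(False)
    try:
        rc = s.connect_ex((host, port))
        if rc not in (0, errno.EINPROGRESS):
            return rc  # failed immediately
        # The socket becomes writable once the handshake completes or fails.
        _, writable, _ = select.select([], [s], [], timeout)
        if not writable:
            return errno.ETIMEDOUT  # our own verdict: still pending when we gave up
        # SO_ERROR yields (and clears) the pending error for this socket.
        return s.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
    finally:
        s.close()
```

Note that the application still has to know which errno values mean "try the next address" and which mean "give up", which is the burden discussed below.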
IMHO, this was okay when application programmers were more knowledgeable about networking and protocol specifics.

Many still are.

That doesn't mean you must provide programmers with an API that *requires* them to know about networking. (My point is just that requiring programmers to deal with protocol specifics would not be the best idea *nowadays*.)


Would you have a database programmer (or whatever) set an option to decide how to react to soft errors? I mean, would you have him decide whether or not to enable it?


Should we *nowadays* bother an *application* programmer with TCP soft/hard errors?
I don't think so.

I disagree pretty strongly. There are a couple of APIs here.

While there *could* be more than one API, we have only one: Sockets. Most apps implement the same name resolution code, address cycling code, etc., as there is no such Higher-Level API.
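The address cycling code most applications carry looks roughly like the sketch below (Python's own `socket.create_connection()` does essentially the same thing). The per-attempt timeout of 3 seconds is an arbitrary assumption; without some such bound, a single address whose ICMP errors are treated as soft can stall the loop for the full SYN retransmission period.

```python
import socket


def connect_by_name(host: str, port: int, per_attempt_timeout: float = 3.0) -> socket.socket:
    """Try each address getaddrinfo() returns, in order, until one connects."""
    last_error = None
    for family, socktype, proto, _canonname, sockaddr in socket.getaddrinfo(
            host, port, socket.AF_UNSPEC, socket.SOCK_STREAM):
        s = socket.socket(family, socktype, proto)
        # Without a bound, one unreachable address could stall us for the full
        # SYN retransmission timeout before the next address is tried.
        s.settimeout(per_attempt_timeout)
        try:
            s.connect(sockaddr)
        except OSError as exc:  # includes socket.timeout
            last_error = exc
            s.close()
            continue
        s.settimeout(None)  # hand back an ordinary blocking socket
        return s
    if last_error is not None:
        raise last_error
    raise OSError(f"no addresses found for {host!r}")
```

Moving this loop into a library call is the kind of Higher-Level API being argued for: the fallback policy could then change without touching application code.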



The most fundamental is kernel/userlevel, and that's what I think we're discussing.

Not sure why you stress "userlevel". A network connection is an IPC mechanism, after all.



Kernel/userlevel APIs should provide maximum flexibility for advanced applications, using simple techniques; anything else leads to problems later. Less advanced apps can rely on userlevel libraries to make things easier.

I'd say that you'd *need* a low-level API only when working close to the network.
Just because an API is more abstract does not mean it's less powerful.



The existing v6onbydefault draft already talks about such userlevel libraries, and you did in your mail as well.

Yes. Pekka triggered some discussion on this issue in March, and I proposed the Higher-Level API. BTW, my point was that giving applications such a low-level API is what actually causes all this trouble: you have *applications* dealing with how a transport protocol reacts to network-layer errors. Summing up: make apps use a more abstract API, and the next time you have any "problem" of this type, you'll be able to do (more or less) whatever you want without affecting the applications.



This change is more incremental than the change currently proposed, and less invasive to the stack (no worries about SYN-SENT/SYN-RECEIVED state and so forth).
Not sure what you mean.

"Invasive" was a poor choice of words. My main concern is that a problematic change is being justified by limiting its scope -- but the wrong scope limitation was chosen.

What is being proposed is to modify TCP's fault isolation policy during the connection establishment phase, in the presence of soft errors. *That* is what (I think) must be analyzed, i.e.: Is the proposed policy acceptable? What are the possible drawbacks? etc.



Applications depend on the current dest-unreachable-is-soft behavior, including during connection initiation.

There are already soft error conditions that are treated as hard ones. IIRC, POSIX allowed hosts to respond with an RST when a connection request was received and the listening queue was full. IIRC, Microsoft stacks behave that way. So, from an interoperability point of view, one could say that you should be prepared to handle this type of scenario. (No, I don't agree with responding with an RST when the listening queue is full. I'd just silently ignore the SYN).



I think the correct way to limit the scope of this change is to make the application request it,

Then you must have all applications add extra code to solve this problem.




rather than applying the change to all applications in a subset of TCP states.

Why should you apply the change in an "all or nothing" fashion?



If the app requests the new behavior, we know the app is prepared to handle that behavior.

Having programmers become aware of the new option, etc., will take time. Enabling the fix on a by-default basis would be a shortcut.


Thanks again for your comments!


-- Fernando Gont e-mail: fernando@gont.com.ar || fgont@acm.org