
Re: Notes on draft-crocker-mast-analysis-01.txt



On 31 Oct 2003, at 13:58, Mark Allman wrote:

There have been measurement studies conducted (e.g., Partridge
et al. in Dec/1999 ToN) that show reordering is not as rare as one
might think.  And, furthermore, it can be fairly significant
reordering (i.e., not just swapping two packets).

I remember reading a study about reordering on MAE East, which also showed considerable reordering. However, IIRC this turned out to be mostly due to the parallel processing in the switches that were used. Today, router and switch vendors incorporate features to make sure that a "session" always flows over a single link. Sometimes it's even impossible to turn this off, and the definition of a session is often so broad that using two or more links in parallel doesn't work in practice, because (nearly) all traffic is considered part of the same session.
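
To make that concrete: those features usually boil down to hashing a flow identifier and always picking the same member link for it. A rough Python sketch of the idea (the field names and hash recipe here are mine, not any vendor's):

import hashlib

def pick_link(src_ip, dst_ip, src_port, dst_port, proto, num_links):
    """Return the index of the member link this flow is pinned to."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    digest = hashlib.sha1(key).digest()
    return int.from_bytes(digest[:4], "big") % num_links

# Every packet of one connection hashes to the same link...
print(pick_link("10.0.0.1", "192.0.2.7", 49152, 80, "tcp", 4))
# ...so parallel links only help when there are many distinct flows. If the
# hash only covers the addresses, all traffic between two hosts counts as one
# "session" and the extra links sit idle.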


It seems from reading these few messages I have seen that there are
really two issues with how reordering affects performance...

  * Implementation efficiency.  I.e., with reordering we have to
    hold, manage and process data in the receiver in ways that are
    different and less efficient (from what I hear) than the
    techniques used for in-order arrival.  This seems intuitive to
    me.  (Although, stack implementation is not my forte.)

Yes, processing packets out of order is going to take a bit more time. But I believe this is fairly minimal, assuming that upon reception, packets are copied to memory where they can sit for a while without causing problems. The processing required to handle this is completely insignificant compared to interrupt handling, memcopies and checksum calculations. (And if these are offloaded to the NIC, offloading reordering too isn't a big stretch.)
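
To illustrate how little bookkeeping is involved, here is a rough sketch (not any real stack's code) of receive-side reordering: park out-of-order segments by sequence number and deliver them once the gap fills.

class ReorderBuffer:
    def __init__(self, initial_seq):
        self.next_seq = initial_seq   # next byte we can hand to the application
        self.segments = {}            # seq -> data, waiting for the gap to fill

    def receive(self, seq, data):
        """Store a segment and return whatever is now deliverable in order."""
        self.segments[seq] = data
        delivered = []
        while self.next_seq in self.segments:
            chunk = self.segments.pop(self.next_seq)
            delivered.append(chunk)
            self.next_seq += len(chunk)
        return b"".join(delivered)

buf = ReorderBuffer(initial_seq=1000)
print(buf.receive(1005, b"world"))   # b'' -- out of order, parked
print(buf.receive(1000, b"hello"))   # b'helloworld' -- gap filled, both delivered

The dictionary operations are trivial next to the interrupt handling, copies and checksumming that happen anyway.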


  * Algorithmic response.  As has been noted, TCP's congestion
    control algorithms (RFC2581) are fairly intolerant to large
    amounts of reordering.  This has been shown in a number of
    studies (Blanton/Allman Jan/2002 CCR; Floyd et al.'s 2002 ICIR
    tech report; Bhandarkar et al.'s TCP-DCR work).  The
    mitigations proposed in those studies vary, but the general
    theme is that in networks where reordering can be detected the
    interpretation of duplicate ACKs as a loss signal should be made
    more conservative to accommodate the reordering.

Right. Problem solved. :-)


Implementations just need to change a bit to avoid unnecessary
fast retransmits if they want to support using parallel links. I
don't think this is a huge deal.

It might not be a "huge deal", but I do not think we have good
consensus on the right approach at this point.

Remember what Brian said: this stuff is probably going to be around for a long time, and today's limitations aren't necessarily going to be relevant in the future. So I think we shouldn't introduce reordering if avoiding it costs little or nothing, but on the other hand we shouldn't shy away from things that do introduce it when that brings significant advantages elsewhere.
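
For what it's worth, the "more conservative interpretation of duplicate ACKs" theme from the studies Mark cites can be sketched in a few lines. This is only an illustration of the general idea, not the exact algorithm from any of those papers:

class DupackPolicy:
    def __init__(self):
        self.dupthresh = 3   # the classic three duplicate ACKs from RFC 2581
        self.dupacks = 0

    def on_duplicate_ack(self):
        self.dupacks += 1
        return self.dupacks >= self.dupthresh   # True -> do a fast retransmit

    def on_new_ack(self):
        self.dupacks = 0

    def on_spurious_retransmit(self, observed_reorder_depth):
        # The "lost" segment turned out to be merely reordered: raise the
        # threshold so the same reordering depth no longer triggers a
        # retransmission next time.
        self.dupthresh = max(self.dupthresh, observed_reorder_depth + 1)

policy = DupackPolicy()
policy.on_spurious_retransmit(observed_reorder_depth=5)   # now 6 dupacks are needed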


One thing that bothers me about TCP and that also doesn't do us
any favors here is that it sends packet trains back-to-back all
the time.  This can at times be very harmful as it unnecessarily
increases burstiness. Especially cheap switches that must convert
between different ethernet speeds don't have the buffering to
handle this properly.

A different issue, but one I happen to be working on at the moment.

:-)


There is a paper that was presented at the Internet Measurement
Conference earlier this week that shows the impact of such
burstiness on the network as a whole.  I am more focused on a
narrower question that goes something like: If my TCP connection is
bursty, how does that impact my performance?


I'd be interested to hear about the "very harmful" behavior of TCP
in this regard.

Just recently I worked on a setup where we transported broadcast-quality TV signals from one place to another over an IP network. The problem was that the devices we used didn't support retransmissions, so each and every lost packet became a visible artifact. Another problem was that the encoder supported 100 Mbps Ethernet but the decoder only 10 Mbps, so somewhere along the path there had to be a conversion from 100 to 10 Mbps. Despite the fact that the average bitrate for the MPEG-2 stream was 7 Mbps or lower, we had all kinds of trouble with lost packets when we used switches (especially, but not only, cheap unmanaged ones) for this. Fortunately, the problems went away when we let routers do the speed conversion.


It turned out that the encoder spat out packets at a rate of one every 160 to 250 microseconds for up to 3.5 milliseconds, and was then quiet for up to 40 milliseconds. However, at 10 Mbps it takes 900 microseconds to transmit one of these packets so one of those 3.5 ms bursts requires a buffering capacity of at least 10 packets. Apparently this is not something that switches consistently supply.
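
The arithmetic, for anyone who wants to check it (all figures are the ones measured above, taking the slower 250 microsecond inter-arrival time as the mild case):

burst_len_us = 3500      # encoder is busy for up to 3.5 ms
arrival_gap_us = 250     # one packet every 250 us (sometimes as little as 160)
drain_time_us = 900      # time to put one packet on the 10 Mbps wire

packets_in = burst_len_us // arrival_gap_us    # ~14 packets arrive during the burst
packets_out = burst_len_us // drain_time_us    # only ~3 can leave in that time
print(packets_in - packets_out)                # ~11 packets must be buffered

With 160 microseconds between packets the backlog is larger still, so a switch that cannot queue on the order of ten packets on the 10 Mbps port is going to drop some of them.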

With TCP similar things will happen, although during normal operation TCP will usually only transmit two packets back to back in response to an ack. However, there are numerous situations where TCP emits much larger packet trains: for instance, when the congestion window grows, when the application releases a lot of data after being quiet for a while, or when an ack finally arrives after one or more previous ones were lost, and so on.
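
To put a rough number on the ack-loss case (the figures here are made up for the example): with cumulative ACKs, one surviving ack covers everything the lost ones would have acknowledged, and the window slides forward by that much at once.

mss = 1460                          # bytes per full-sized segment
acks_lost = 3                       # earlier acks that never arrived
segments_released = acks_lost + 1   # the surviving ack covers all of them
print(segments_released * mss, "bytes go out back to back (cwnd permitting)")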

So this means TCP reacts much more violently to lost packets than expected, because the resulting packet trains trigger further congestion. But the real fun starts when running many TCP sessions at the same time and they start to synchronize. At some point this will start happening automatically when there is enough traffic, but there are also applications that have many TCP sessions that all receive data at the same time because of events in the application. For example, when I ran a Gnutella peer-to-peer file sharing client and became an "ultrapeer", it was the job of my computer to reflect incoming requests out to 30 other systems. So that's at least 30 packets (from different TCP sessions) in quick succession. I experienced significant packet loss despite the fact that my ADSL line wasn't even close to 100% utilization.

I think the solution to these problems would be for TCP to traffic-shape both individual sessions and the aggregate traffic from different sessions flowing over the same path. This would have two additional advantages: less buffer utilization along the way and lower delays. This in turn leads to better RTT measurements, which should improve recovery from lost packets. The price would be more processing, as TCP would have to wake up more often to transmit smaller amounts of data.

The bandwidth / rate limit for an individual session wouldn't be too hard to figure out, especially if one end is configured with this information. Doing something similar across sessions would be harder.
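
As a sketch of what the per-session part could look like (just the arithmetic; a real stack would have to tie this to its timer granularity, which is exactly the "wake up more often" cost mentioned above):

def pacing_interval_us(cwnd_bytes, mss_bytes, rtt_us):
    """Microseconds to wait between segments so the window is spread over one RTT."""
    segments_per_rtt = max(1, cwnd_bytes // mss_bytes)
    return rtt_us / segments_per_rtt

# A 20-segment window over a 50 ms RTT means one segment every 2.5 ms
# instead of 20 segments back to back.
print(pacing_interval_us(cwnd_bytes=20 * 1460, mss_bytes=1460, rtt_us=50_000))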

(And, offline may be best ... I must confess that I
am not sure how any of this relates to multi-homing and IPv6 and I
may be way off-topic here myself.  If so, I'm certainly sorry.)

Yes, if this discussion continues we should take it off-list. However, a good understanding of TCP is important to multihoming :-) so I decided to reply on-list this time.