
Re: [RRG] Perplexing PMTUD and packet length observations



On 12 Aug 2008, at 15:55, Robin Whittle wrote:

If not, I'd say that all of this is a bug in the Linux networking
code (which is weird to begin with)

I can't imagine it is a bug.

Well, the Linux people do sometimes do things that other people call bugs... IMO, limiting the amount of data in flight, but not the number of packets as dictated by the correspondent's MSS, is not an appropriate way to do TCP congestion control.

It is conceivable it is http doing this,

Not really. HTTP runs over TCP, so this has to be TCP's doing.

However, Linux has a "sendfile" syscall that does what you would imagine it does, so maybe it's that code path that does this. You may want to try both with a static file and with a script that generates sufficiently large output, to see whether that's the case.
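
For what it's worth, a minimal sketch of what a sendfile-based send looks like, assuming a plain Linux TCP socket and a regular file (the function name and error handling are illustrative, not the actual server code):

    /* Sketch: serving a file with Linux sendfile(2). */
    #include <fcntl.h>
    #include <sys/sendfile.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /* Send the whole file at 'path' down the connected socket 'sock'. */
    static int send_whole_file(int sock, const char *path)
    {
        int in_fd = open(path, O_RDONLY);
        if (in_fd < 0)
            return -1;

        struct stat st;
        if (fstat(in_fd, &st) < 0) {
            close(in_fd);
            return -1;
        }

        off_t offset = 0;
        while (offset < st.st_size) {
            /* The kernel copies from the page cache straight to the
             * socket; user space never sees the data, and the kernel
             * decides how the stream is segmented. */
            ssize_t n = sendfile(sock, in_fd, &offset, st.st_size - offset);
            if (n <= 0) {
                close(in_fd);
                return -1;
            }
        }
        close(in_fd);
        return 0;
    }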

I have no explanation for either of these things - the server
bundling together TCP data in flagrant violation of the RFCs as I
understand them, and (as best I can guess) the PPPoE router taking
it upon itself to recreate the individual RFC conformant TCP packets.

Hm, I'd still like to see, from another system, whether the Linux box actually spits out jumboframes or not. What do you have your MTU set to?
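
(If it helps, the configured MTU can also be read programmatically; a sketch using the standard SIOCGIFMTU ioctl, with the interface name as a placeholder:)

    /* Sketch: reading an interface's configured MTU on Linux,
     * like "ip link show eth0" does. */
    #include <net/if.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <sys/socket.h>
    #include <unistd.h>

    static int get_if_mtu(const char *ifname)   /* e.g. "eth0" */
    {
        int sock = socket(AF_INET, SOCK_DGRAM, 0);
        if (sock < 0)
            return -1;

        struct ifreq ifr;
        memset(&ifr, 0, sizeof(ifr));
        strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);

        int mtu = (ioctl(sock, SIOCGIFMTU, &ifr) < 0) ? -1 : ifr.ifr_mtu;
        close(sock);
        return mtu;
    }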

It's not in the subject lines, and there is no search facility.

Hm, it wasn't so recent apparently, and ack on the quality of the archive:

http://readlist.com/lists/trapdoor.merit.edu/nanog/7/35484.html

MSS is end-to-end, you still need PMTUD or fragmentation.

Yes - and with Google sending out large packets with DF=0, it is
expecting any hapless router in the middle, with a lower next hop
MTU than this length, to do a lot of work without complaint.

Such are the perils of implementing RFC 791.

Fragmentation CAN be done in hardware; I'm not sure how much gear actually does. But you can always get around this by configuring all your stuff with the same MTU. (Your ISPs may not want to go along with that, though.)
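
For completeness: on Linux, whether a socket's packets carry DF at all is a per-socket setting; a sketch (the setsockopt call is the standard Linux API, the helper name is mine):

    /* Sketch: choosing between RFC 1191 behaviour (DF set, rely on
     * ICMP too-bigs) and RFC 791 behaviour (DF clear, let routers
     * fragment) on a Linux socket. */
    #include <netinet/in.h>
    #include <sys/socket.h>

    static int set_df(int sock, int df)
    {
        int val = df ? IP_PMTUDISC_DO      /* DF=1 on every packet    */
                     : IP_PMTUDISC_DONT;   /* DF=0, allow fragmenting */
        return setsockopt(sock, IPPROTO_IP, IP_MTU_DISCOVER,
                          &val, sizeof(val));
    }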

Indeed. Still, it probably has to be done at some point,
especially if we ever want to move away from 1500 as the
internet's maximum packet size.

We should all stop burning fossil fuel at some point too.

The difference is that burning oil is cheaper than the alternative, while breaking connectivity and sending packets smaller than what the hardware is capable of isn't.

It would be nice if the sending host had a better clue about the
outside world than the simple fact that its Ethernet link has an MTU
of 9k or so.

There are two ways to accomplish this:

1. ask someone who knows
2. measure

MTU information isn't known, so there's no one to ask (it could be added to routing protocols, but apparently few people cared enough to do that); the only alternative is to send packets and observe what comes back.
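
In the spirit of "measure", a rough sketch of RFC 4821-style probing: binary-search for the largest DF-marked packet that gets through. The probe callback is an assumption here; it would send a packet of the given size and report whether the peer acknowledged it:

    /* Sketch: binary-search path MTU probing (in the spirit of
     * RFC 4821). probe(size) is an assumed callback: send a DF-marked
     * packet of 'size' bytes, return 1 if acknowledged, 0 if lost. */
    typedef int (*probe_fn)(int size);

    static int discover_pmtu(probe_fn probe, int lo, int hi)
    {
        /* Precondition: a packet of size 'lo' is known to get through. */
        while (lo < hi) {
            int mid = lo + (hi - lo + 1) / 2;  /* round up, so mid > lo */
            if (probe(mid))
                lo = mid;        /* mid fits: the path MTU is >= mid */
            else
                hi = mid - 1;    /* mid is too big: path MTU < mid   */
        }
        return lo;   /* largest size observed to get through */
    }

    /* Usage sketch: discover_pmtu(udp_probe, 1280, 9000); */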

Set it only for 10% of your packets and you still have
connectivity when there is a black hole and the PMTUD works just
fine.

OK - so routers would fragment 90% of the packets and the PTB only
goes back when one of the 10% of packets has its DF flag set?

That just seems to slow down the sending host's response to the MTU
situation.

Yes, that's what the RFC 1191 people keep saying.

Apparently there's a disconnect between spec writers and implementers on the one hand and people who have to debug connectivity problems on the other, with the clueless firewall admins living in a bubble, disconnected from everything.

I still like RFC 1191 better.  There's no fragmentation and the
sending host gets the fastest possible feedback that it needs to
send smaller packets.

I'll take reliable over fast.

Removing fragmentation from the network is a really good aspect
of IPv6, I think.  Ideally, I think, all packets should be sent
DF=1 and all applications should be ready to cope

No. This is a layer 3 job, not a layer 7 job.

I don't understand this.

Application writers can't be trusted with the internals of the network.

Unfortunately, the surviving packet fragment isn't much use
to the destination host, so it still takes 1.5 RTTs to get the data
there.  Still, that is better than 3.5 RTTs with RFC 1191.

You can still use the data, except that you can't check its integrity because the checksum is now incorrect. So the semi-ACK asks the other side to send just the checksum over the data that was correctly received. Did I forget to explain this part?
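
To make the checksum part concrete: both sides would run the standard 16-bit ones'-complement Internet checksum (RFC 1071) over the same prefix, i.e. the bytes that arrived intact, and compare. A sketch of just that computation (the framing of the semi-ACK itself is the proposal's and isn't shown):

    /* Sketch: RFC 1071 Internet checksum over the 'len' payload bytes
     * that were received intact. */
    #include <stddef.h>
    #include <stdint.h>

    static uint16_t inet_checksum(const uint8_t *data, size_t len)
    {
        uint32_t sum = 0;

        while (len > 1) {                  /* sum 16-bit big-endian words */
            sum += ((uint32_t)data[0] << 8) | data[1];
            data += 2;
            len  -= 2;
        }
        if (len == 1)                      /* odd final byte, zero-padded */
            sum += (uint32_t)data[0] << 8;

        while (sum >> 16)                  /* fold the carries back in */
            sum = (sum & 0xffff) + (sum >> 16);

        return (uint16_t)~sum;
    }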

My understanding of this is that if all hosts have a next hop MTU of
1500, and the core has an MTU of 9000, then it is no problem if the
destination network blocks PTBs from leaving that network since no
host would be sending packets bigger than 1500 anyway.

Your premise is invalid so the conclusion is meaningless.

Actually, it would be interesting to do some research into the MTU distribution across the internet.

But plenty of servers - probably most by now - have gigabit ethernet
and so have a real PMTU for most of the core, and into quite a few
edge networks, of 9k or so.

Note that although the 9000-byte jumboframe capability is common, there are also many implementations that use different sizes, so it's impossible to standardize on anything, even if you could ignore the fact that the current internet expects 1500.

Also, because of the 802.3 spec, the jumboframe capability must be enabled administratively, and because of the IP-over-ethernet specs, all hosts on a subnet must use the same MTU, so basically deployment is impossible. (This is what my draft addresses.)
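
(On Linux, "enabled administratively" boils down to raising the interface MTU, i.e. "ip link set dev eth0 mtu 9000" or the equivalent ioctl; a sketch, interface name again a placeholder:)

    /* Sketch: administratively setting an interface MTU on Linux.
     * Needs CAP_NET_ADMIN; the NIC and driver must support the size. */
    #include <net/if.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <sys/socket.h>
    #include <unistd.h>

    static int set_if_mtu(const char *ifname, int mtu)
    {
        int sock = socket(AF_INET, SOCK_DGRAM, 0);
        if (sock < 0)
            return -1;

        struct ifreq ifr;
        memset(&ifr, 0, sizeof(ifr));
        strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
        ifr.ifr_mtu = mtu;

        int ret = ioctl(sock, SIOCSIFMTU, &ifr);
        close(sock);
        return ret;
    }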

When they send a packet to some edge network with 1500 MTU links,
which blocks the PTBs which should go back to the sending host, then
there is a black hole.

You mean: a network that doesn't generate them in the outgoing direction?

In practice this won't be a problem, because few people will connect to the internet with a 1500+ MTU and then not generate too bigs. Routers generate them out of the box, ISPs usually don't have firewalls in the middle of their networks, and ISPs don't like support calls, so they tend to generate them.

If you use an MTU bigger than the standard 1500 and also filter ICMP, you shoot yourself in the foot, so you're not likely to do both. The trouble is mainly with using a smaller MTU: then the problem is caused by _other_ people not listening to _your_ too bigs, and there is little that you can do.

I guess the majority of websites now can send jumboframes, like my
server can.

That doesn't mean that all the stuff in the middle can handle jumboframes. The core of the network generally can, but the stuff around the edges, like the cheap switches that connect dozens of servers like yours, is likely to support only small packet sizes, either 1500 or "mini jumbos" of 1500 to 2000 bytes.

I am not sure where the MSS is configured.

It isn't. The MSS option is derived from the MTU of the interface through which the destination address is reachable.
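
Concretely (assuming IPv4 and no IP or TCP options): the advertised MSS is the interface MTU minus 40 bytes of headers, so 1460 for a 1500-byte MTU and 8960 for a 9000-byte jumbo MTU. As a one-liner:

    /* Sketch: where the advertised IPv4 MSS comes from
     * (no IP or TCP options assumed). */
    static int mss_from_mtu(int mtu)
    {
        return mtu - 20 /* IPv4 header */ - 20 /* TCP header */;
    }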

Yeah, none for "RFC 791 deployment" either...

Touché - but where is the evidence of applications and operating
systems actually implementing RFC 4821?

I haven't seen any...

Is there any site, any
working group or whatever where this is discussed?


Not sure if this is officially dead or not:

http://staff.psc.edu/mathis/MTU/

"Just remember: The glass is neither half full nor half empty, it is merely the wrong size."

The optimist says the glass is half full. The pessimist says it's half empty. The engineer says the glass is twice as big as it needs to be.



