Re: [RRG] PMTUD, Sprite & IPTM; Outer src-addr = sending host's addr

To: Robin Whittle <rw@firstpr.com.au>
Subject: Re: [RRG] PMTUD, Sprite & IPTM; Outer src-addr = sending host's addr
From: Iljitsch van Beijnum <iljitsch@muada.com>
Date: Fri, 23 Nov 2007 14:35:37 +0100
Cc: Routing Research Group list <rrg@psg.com>
In-reply-to: <4746809F.5020604@firstpr.com.au>
References: <4746809F.5020604@firstpr.com.au>

On 23 nov 2007, at 8:26, Robin Whittle wrote:

In my opinion, that's not the best way to handle this.
First of all, operators need to make sure that they use decent
MTUs after tunneling so that hosts that send 1500-byte packets
won't get in trouble or lead to much, if any, MTU discovery.

I think it may be reasonable to insist on something like "Every ITR
and ETR must be on a link with at least an XXX MTU to the core of
the Net", but XXX needs to be below 1500 for a variety of reasons

No.

We're doing something new here, which gives us the opportunity to get rid of at least SOME old crap. If you want to run a *TR, get gear that can handle 1600-byte packets. Period.

Encapsulation in DSL links is one.

Oh really?

#sh int atm0.1
ATM0.1 is up, line protocol is up
  Hardware is PQUICC_SAR (with Alcatel ADSL Module)
  Interface is unnumbered. Using address of Ethernet0 (82.192.90.25)
  MTU 4470 bytes, BW 800 Kbit, DLY 80 usec,
     reliability 255/255, txload 1/255, rxload 1/255
  Encapsulation ATM
  54222372 packets input, 11781520984 bytes
  54403498 packets output, 16978237812 bytes
  0 OAM cells input, 0 OAM cells output
  AAL5 CRC errors : 0
  AAL5 Oversized SDUs : 0
  Last clearing of "show interface" counters never
#ping
Protocol [ip]:
Target IP address: 82.192.91.254
Repeat count [5]:
Datagram size [100]: 4000
Timeout in seconds [2]:
Extended commands [n]: y
Source address or interface:
Type of service [0]:
Set DF bit in IP header? [no]: yes
Validate reply data? [no]: yes
Data pattern [0xABCD]:
Loose, Strict, Record, Timestamp, Verbose[none]:
Sweep range of sizes [n]:
Type escape sequence to abort.
Sending 5, 4000-byte ICMP Echos to 82.192.91.254, timeout is 2 seconds:
Packet sent with the DF bit set
Reply data will be validated
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 72/73/76 ms

GRE or some other tunneling is another

No. GRE, like any other *-in-IP mechanism, adds some extra bytes, but if you start with something big enough you can add a good number of GRE headers before you end up below 1500 bytes.
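
To put rough numbers on that, here is a minimal sketch of the arithmetic. It assumes the plainest form of GRE-in-IPv4, a 20-byte IPv4 header without options plus a 4-byte base GRE header with no key, checksum or sequence fields; real deployments may add more. The function name is made up for illustration.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options (assumption)
GRE_HEADER = 4    # bytes, base GRE header without optional fields (assumption)


def gre_layers_that_fit(link_mtu: int, inner_packet: int = 1500) -> int:
    """How many nested GRE-in-IPv4 layers can wrap an inner packet of
    inner_packet bytes without exceeding link_mtu."""
    per_layer = IPV4_HEADER + GRE_HEADER
    if link_mtu < inner_packet:
        return 0
    return (link_mtu - inner_packet) // per_layer


# 1600 is the figure asked for above; 4470 is the ATM interface MTU shown above.
for mtu in (1600, 4470):
    print(mtu, gre_layers_that_fit(mtu))
```

Under these assumptions a 1600-byte MTU leaves room for 4 such layers around a 1500-byte packet, and a 4470-byte MTU for 123.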

What you really mean is that there is some ethernet stuff in the middle that doesn't support jumboframes. Since as good as all gigabit ethernet equipment supports jumboframes these days in practice, this means that someone is being too cheap to upgrade their stuff to gigabit ethernet and/or doesn't want to go through the trouble of manually setting the MTU. I don't find this sufficient reason to cripple our efforts.

(Sitting behind a link with an IPv6 PMTUD black hole doesn't help me being tolerant here.)

Note that these days pretty much all TCP stacks set the DF bit and a very significant number of sites filter ICMP "too big" messages. That means that if you're behind a < 1500 MTU link you WILL have a LOT of trouble unless you work around that by clearing the DF bit or adjusting the TCP MSS option, both of which are easy to do in a local gateway but much harder in the core of the network.
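
As an illustration of the MSS workaround, a minimal sketch of the calculation a local gateway would do when it rewrites the MSS option in a TCP SYN. It assumes IPv4 and TCP headers of 20 bytes each without options; the function and parameter names are hypothetical and not taken from any particular router or firewall.

```python
def clamped_mss(link_mtu: int, advertised_mss: int,
                ip_header: int = 20, tcp_header: int = 20) -> int:
    """Return the MSS to write into a forwarded SYN so that full-size
    segments fit on a link with the given MTU. Never raises the value
    the host advertised, only lowers it."""
    largest_payload = link_mtu - ip_header - tcp_header
    return min(advertised_mss, largest_payload)


# A host on ethernet advertises 1460; the gateway sits behind a 1476-byte link.
print(clamped_mss(1476, 1460))
```

The point of doing this at the edge is that the gateway sees both directions of every connection and knows its own uplink MTU, which a router in the core generally does not.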

As far as I know, we can't reasonably insist that any operator
provide an MTU above 1500.

Not all operators or even any operator, just the ones that want to run *TRs.

For instance, I think that
in quite a few circumstances it would be best to have the ITR
function (a caching ITRC function) in the sending host.

Well, if you do that you have to use good security mechanisms because that way you're sure that ITRs are in the hands of bad people. If you only have them under the control of ISPs the security mechanisms can be much simpler because if your ISP is out to get you, they'll get you without subverting an ITR anyway.

Second, tunnel endpoints should simply implement two sides of
path MTU discovery: they should discover the maximum packets they
can successfully send to the other endpoint, and they should set
their own inbound MTU to that value minus the size of the header
that's added.

This is what Fred's and my approaches are trying to achieve.

Doesn't look that way to me...
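
For concreteness, a minimal sketch of the two-sided behaviour described in the quoted "Second, ..." paragraph: the endpoint tracks the largest packet it can get to the far endpoint and exposes that value minus the encapsulation overhead as its own inbound MTU. The class, its method names and the 20-byte overhead in the usage lines are illustrative assumptions, not part of any proposal discussed in this thread.

```python
from dataclasses import dataclass


@dataclass
class TunnelEndpoint:
    path_mtu: int        # largest outer packet known to reach the far endpoint
    encap_overhead: int  # bytes of outer header added per packet (assumption)

    @property
    def inbound_mtu(self) -> int:
        """MTU presented to traffic entering the tunnel."""
        return self.path_mtu - self.encap_overhead

    def handle(self, packet_len: int) -> str:
        """Encapsulate if the packet fits, else report the inbound MTU
        back to the sender as an ordinary router would."""
        if packet_len <= self.inbound_mtu:
            return "encapsulate"
        return f"too-big mtu={self.inbound_mtu}"

    def on_outer_too_big(self, reported_mtu: int) -> None:
        """A 'too big' message arrived for an outer packet: lower path_mtu."""
        self.path_mtu = min(self.path_mtu, reported_mtu)


ep = TunnelEndpoint(path_mtu=1600, encap_overhead=20)
print(ep.handle(1500))      # fits while the path carries 1600 bytes
ep.on_outer_too_big(1500)   # a hop in the middle only carries 1500
print(ep.handle(1500))      # now the sending host is told to shrink
```

With a 1600-byte path the host's 1500-byte packets pass untouched; only when the path turns out to be smaller does the host see any PMTUD activity.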

However, when sending the first packet to an ETR, there is no time
for the ITR to muck around testing the PMTU limit before sending the
packet.  So I suggest sending it if it is shorter than some assumed
limit (eg. 1280) and fragmenting it if it is longer - irrespective
of whether its do not fragment bit is set.

That is a really bad solution, because this guarantees a good amount of fragmenting. With IPv4, this is rather problematic because of the small ID space. It also costs you lots of CPU and could even allow for CPU exhaustion attacks.
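
To make the ID space concern concrete: the IPv4 Identification field is 16 bits, and all packets between one ITR and one ETR share the same source, destination and protocol, so they draw from a single pool of 65536 IDs. A rough sketch of how fast that pool wraps, assuming the tunnel runs at the given rate with packets of the given size (both values in the example are illustrative):

```python
ID_SPACE = 2 ** 16  # distinct values of the 16-bit IPv4 Identification field


def id_wrap_seconds(rate_bps: float, packet_bytes: int) -> float:
    """Seconds until the IPv4 ID wraps when one ID is used per packet."""
    packets_per_second = rate_bps / (packet_bytes * 8)
    return ID_SPACE / packets_per_second


# 1 Gbit/s of 1500-byte packets between a single ITR/ETR pair.
print(round(id_wrap_seconds(1e9, 1500), 3))
```

At that rate the ID wraps in well under a second, while a receiver may hold on to incomplete fragments for many seconds, so a late fragment can be matched against a newer packet that reuses the same ID.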

A few lost packets here or there because of PMTUD aren't the end of the world; just keep things simple.

Later, if more such
packets need to be sent, the ITR and ETR can work on determining the
real PMTU.  I do this with probe packets, rather than traffic packets.

Even more overhead...

--
to unsubscribe send a message to rrg-request@psg.com with the
word 'unsubscribe' in a single line as the message text body.
archive: <http://psg.com/lists/rrg/> & ftp://psg.com/pub/lists/rrg
