
Re: [RRG] PMTUD, Sprite & IPTM; Outer src-addr = sending host's addr



On 23 nov 2007, at 8:26, Robin Whittle wrote:

In my opinion, that's not the best way to handle this.
First of all, operators need to make sure that they use decent
MTUs after tunneling so that hosts that send 1500-byte packets
won't get in trouble or trigger much, if any, path MTU discovery.

I think it may be reasonable to insist on something like "Every ITR
and ETR must be on a link with at least an XXX MTU to the core of
the Net", but XXX needs to be below 1500 for a variety of reasons

No.

We're doing something new here, which gives us the opportunity to get rid of at least SOME old crap. If you want to run a *TR, get gear that can handle 1600 byte packets. Period.

Encapsulation in DSL links is one.

Oh really?

#sh int atm0.1
ATM0.1 is up, line protocol is up
  Hardware is PQUICC_SAR (with Alcatel ADSL Module)
  Interface is unnumbered. Using address of Ethernet0 (82.192.90.25)
  MTU 4470 bytes, BW 800 Kbit, DLY 80 usec,
     reliability 255/255, txload 1/255, rxload 1/255
  Encapsulation ATM
  54222372 packets input, 11781520984 bytes
  54403498 packets output, 16978237812 bytes
  0 OAM cells input, 0 OAM cells output
  AAL5 CRC errors : 0
  AAL5 Oversized SDUs : 0
  Last clearing of "show interface" counters never
#ping
Protocol [ip]:
Target IP address: 82.192.91.254
Repeat count [5]:
Datagram size [100]: 4000
Timeout in seconds [2]:
Extended commands [n]: y
Source address or interface:
Type of service [0]:
Set DF bit in IP header? [no]: yes
Validate reply data? [no]: yes
Data pattern [0xABCD]:
Loose, Strict, Record, Timestamp, Verbose[none]:
Sweep range of sizes [n]:
Type escape sequence to abort.
Sending 5, 4000-byte ICMP Echos to 82.192.91.254, timeout is 2 seconds:
Packet sent with the DF bit set
Reply data will be validated
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 72/73/76 ms

GRE or some other tunneling is another

No. GRE, like any other *-in-IP mechanism, adds some extra bytes, but if you start with something big enough you can add a good number of GRE headers before you end up below 1500 bytes.
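
To put rough numbers on the point above, here is a minimal sketch of the arithmetic. It assumes GRE over IPv4 with a 20-byte outer IPv4 header (no options) and the basic 4-byte GRE header (no key, sequence number or checksum); the function name is just illustrative.

```python
IPV4_HEADER = 20  # bytes, outer IPv4 header without options (assumption)
GRE_HEADER = 4    # bytes, basic GRE header without optional fields (assumption)


def max_gre_layers(link_mtu, inner_packet=1500):
    """How many GRE-in-IPv4 layers fit around inner_packet within link_mtu."""
    per_layer = IPV4_HEADER + GRE_HEADER
    if link_mtu < inner_packet:
        return 0
    return (link_mtu - inner_packet) // per_layer


if __name__ == "__main__":
    # 1500 = plain Ethernet, 1600 = the figure suggested for *TRs,
    # 4470 = the ATM interface MTU shown in the router output above.
    for mtu in (1500, 1600, 4470):
        print(mtu, max_gre_layers(mtu))
```

Under these assumptions a 1600-byte link leaves room for 4 layers around a 1500-byte packet, and a 4470-byte link for 123.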

What you really mean is that there is some Ethernet equipment in the middle that doesn't support jumbo frames. Since nearly all gigabit Ethernet equipment supports jumbo frames in practice these days, this means that someone is too cheap to upgrade to gigabit Ethernet and/or doesn't want to go to the trouble of setting the MTU manually. I don't find this sufficient reason to cripple our efforts.

(Sitting behind a link with an IPv6 PMTUD black hole doesn't make me any more tolerant here.)

Note that these days pretty much all TCP stacks set the DF bit, and a very significant number of sites filter ICMP "packet too big" messages. That means that if you're behind a link with an MTU below 1500 you WILL have a LOT of trouble unless you work around that by clearing the DF bit or adjusting the TCP MSS option, both of which are easy to do in a local gateway but much harder in the core of the network.
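
As an illustration of the MSS workaround, a gateway in front of a small-MTU link can lower the MSS option in passing TCP SYN segments so that full-size segments still fit. A rough sketch of that calculation, assuming IPv4 and TCP headers of 20 bytes each without options (the function name is hypothetical, not any particular router's feature):

```python
IPV4_HEADER = 20  # bytes, no IP options (assumption)
TCP_HEADER = 20   # bytes, no TCP options (assumption)


def clamped_mss(advertised_mss, path_mtu):
    """MSS value a gateway would write into a SYN so segments fit path_mtu."""
    largest_payload = path_mtu - IPV4_HEADER - TCP_HEADER
    # Never raise the MSS the host asked for; only lower it when needed.
    return min(advertised_mss, largest_payload)


if __name__ == "__main__":
    # A host on Ethernet advertises 1460; the link behind the gateway
    # only carries 1492-byte packets.
    print(clamped_mss(1460, 1492))
```

With a 1492-byte link the advertised 1460 is lowered to 1452; with a 1500-byte link it is left alone.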

As far as I know, we can't reasonably insist that any operator
provide an MTU above 1500.

Not all operators, or even any particular operator; just the ones that want to run *TRs.

For instance, I think that
in quite a few circumstances it would be best to have the ITR
function (a caching ITRC function) in the sending host.

Well, if you do that you have to use good security mechanisms, because then you can be sure that some ITRs are in the hands of bad people. If ITRs are only under the control of ISPs, the security mechanisms can be much simpler, because if your ISP is out to get you, they'll get you without subverting an ITR anyway.

Second, tunnel endpoints should simply implement two sides of
path MTU discovery: they should discover the maximum packet size they
can successfully send to the other endpoint, and they should set
their own inbound MTU to that value minus the size of the header
that's added.
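
The second half of that rule is plain subtraction. A minimal sketch, where the 20-byte overhead in the example is only an assumed plain IPv4-in-IPv4 header and not the header size of any specific map-encap proposal:

```python
def tunnel_inbound_mtu(discovered_path_mtu, encap_overhead):
    """MTU the tunnel endpoint presents on its inbound (host-facing) side."""
    if encap_overhead >= discovered_path_mtu:
        raise ValueError("encapsulation header does not fit in the path MTU")
    return discovered_path_mtu - encap_overhead


if __name__ == "__main__":
    # Assumed overhead: one 20-byte outer IPv4 header.
    print(tunnel_inbound_mtu(1600, 20))  # 1500-byte packets pass untouched
    print(tunnel_inbound_mtu(1500, 20))  # hosts must stay at or below this
```

A packet larger than the returned value would then be handled by ordinary path MTU discovery on the inbound side, as with any other link whose MTU is too small.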


This is what Fred's and my approaches are trying to achieve.

Doesn't look that way to me...

However, when sending the first packet to an ETR, there is no time
for the ITR to muck around testing the PMTU limit before sending the
packet.  So I suggest sending it if it is shorter than some assumed
limit (e.g. 1280) and fragmenting it if it is longer - irrespective
of whether its Don't Fragment bit is set.

That is a really bad solution, because it guarantees a good amount of fragmentation. With IPv4, this is rather problematic because of the small ID space. It also costs a lot of CPU and could even allow CPU exhaustion attacks.
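
To make the ID space concern concrete: the IPv4 Identification field is 16 bits, so a sender that fragments everything reuses ID values after 65536 packets to the same destination. A back-of-the-envelope sketch, where the link speed and packet size are example inputs only:

```python
ID_SPACE = 2 ** 16  # the IPv4 Identification field is 16 bits wide


def id_wrap_seconds(link_bps, packet_bytes):
    """Seconds until the 16-bit ID wraps if every packet consumes one ID."""
    packets_per_second = link_bps / (packet_bytes * 8)
    return ID_SPACE / packets_per_second


if __name__ == "__main__":
    # Example inputs (assumptions): 1 Gbit/s of 1500-byte packets
    # between one source/destination pair.
    print(round(id_wrap_seconds(1e9, 1500), 3), "seconds")
```

At that rate the IDs wrap in well under a second, which is shorter than typical reassembly timeouts, so fragments of different packets can end up being combined.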

A few lost packets here or there because of PMTUD aren't the end of the world; just keep things simple.

Later, if more such
packets need to be sent, the ITR and ETR can work on determining the
real PMTU.  I do this with probe packets, rather than traffic packets.

Even more overhead...

--
to unsubscribe send a message to rrg-request@psg.com with the
word 'unsubscribe' in a single line as the message text body.
archive: <http://psg.com/lists/rrg/> & ftp://psg.com/pub/lists/rrg