Re: [RRG] PMTUD, Sprite & IPTM; Outer src-addr = sending host's addr
On 23 nov 2007, at 8:26, Robin Whittle wrote:
In my opinion, that's not the best way to handle this.
First of all, operators need to make sure that they use decent
MTUs after tunneling so that hosts that send 1500-byte packets
don't run into trouble and trigger little, if any, path MTU discovery.
I think it may be reasonable to insist on something like "Every ITR
and ETR must be on a link with at least an XXX MTU to the core of
the Net", but XXX needs to be below 1500 for a variety of reasons
No.
We're doing something new here, which gives us the opportunity to get
rid of at least SOME old crap. If you want to run a *TR, get gear that
can handle 1600 byte packets. Period.
Encapsulation in DSL links is one.
Oh really?
#sh int atm0.1
ATM0.1 is up, line protocol is up
Hardware is PQUICC_SAR (with Alcatel ADSL Module)
Interface is unnumbered. Using address of Ethernet0 (82.192.90.25)
MTU 4470 bytes, BW 800 Kbit, DLY 80 usec,
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation ATM
54222372 packets input, 11781520984 bytes
54403498 packets output, 16978237812 bytes
0 OAM cells input, 0 OAM cells output
AAL5 CRC errors : 0
AAL5 Oversized SDUs : 0
Last clearing of "show interface" counters never
#ping
Protocol [ip]:
Target IP address: 82.192.91.254
Repeat count [5]:
Datagram size [100]: 4000
Timeout in seconds [2]:
Extended commands [n]: y
Source address or interface:
Type of service [0]:
Set DF bit in IP header? [no]: yes
Validate reply data? [no]: yes
Data pattern [0xABCD]:
Loose, Strict, Record, Timestamp, Verbose[none]:
Sweep range of sizes [n]:
Type escape sequence to abort.
Sending 5, 4000-byte ICMP Echos to 82.192.91.254, timeout is 2 seconds:
Packet sent with the DF bit set
Reply data will be validated
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 72/73/76 ms
GRE or some other tunneling is another
No. GRE, like any other *-in-IP mechanism, adds some extra bytes, but
if you start with something big enough you can add a good number of
GRE headers before you end up below 1500 bytes.
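To put rough numbers on that, here is a minimal sketch. It assumes
plain GRE over IPv4 costs 24 bytes per layer (a 20-byte IPv4 header
without options plus the 4-byte basic GRE header, no key or sequence
fields). The function name and the 24-byte figure are assumptions for
illustration only; other encapsulations will differ.

```python
GRE_IPV4_OVERHEAD = 24  # assumed: 20-byte IPv4 header + 4-byte basic GRE header


def max_encap_layers(link_mtu, inner_size=1500, overhead=GRE_IPV4_OVERHEAD):
    """Number of encapsulation layers that fit while still carrying
    an unfragmented inner packet of inner_size bytes."""
    if link_mtu < inner_size:
        return 0
    return (link_mtu - inner_size) // overhead


# The 4470-byte ATM interface shown above, and a 1600-byte link.
print(max_encap_layers(4470))
print(max_encap_layers(1600))
```

Under those assumptions even a 1600-byte link leaves room for a few
layers of encapsulation around a full 1500-byte packet.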
What you really mean is that there is some Ethernet gear in the
middle that doesn't support jumbo frames. Since in practice nearly all
gigabit Ethernet equipment supports jumbo frames these days, this
means that someone is too cheap to upgrade to gigabit Ethernet and/or
doesn't want to go through the trouble of manually setting the MTU. I
don't find this sufficient reason to cripple our efforts.
(Sitting behind a link with an IPv6 PMTUD black hole doesn't make me
any more tolerant here.)
Note that these days pretty much all TCP stacks set the DF bit and a
very significant number of sites filter ICMP "too big" messages. That
means that if you're behind a < 1500 MTU link you WILL have a LOT of
trouble unless you work around that by clearing the DF bit or adjusting
the TCP MSS option, both of which are easy to do in a local gateway
but much harder in the core of the network.
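As an illustration of the MSS workaround, the arithmetic a local
gateway would apply to a passing SYN can be sketched like this. It
assumes IPv4 and TCP headers of 20 bytes each with no options;
clamp_mss is a made-up name, not any particular router's feature.

```python
IPV4_HEADER = 20  # bytes, assumed: no IP options
TCP_HEADER = 20   # bytes, assumed: no TCP options


def clamp_mss(advertised_mss, link_mtu):
    """Return the MSS value a gateway would write into a SYN so that
    full-sized segments still fit within link_mtu."""
    max_mss = link_mtu - IPV4_HEADER - TCP_HEADER
    return min(advertised_mss, max_mss)


# A host on Ethernet advertises 1460; the gateway sits on a 1492-byte link.
print(clamp_mss(1460, 1492))
```

The value is only ever lowered, so hosts behind a full 1500-byte link
are left alone.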
As far as I know, we can't reasonably insist that any operator
provide an MTU above 1500.
Not all operators or even any operator, just the ones that want to run
*TRs.
For instance, I think that
in quite a few circumstances it would be best to have the ITR
function (a caching ITRC function) in the sending host.
Well, if you do that you have to use good security mechanisms, because
then some ITRs are certain to end up in the hands of bad people. If you
only have them under the control of ISPs the security mechanisms can
be much simpler, because if your ISP is out to get you, they'll get you
without subverting an ITR anyway.
Second, tunnel endpoints should simply implement two sides of
path MTU discovery: they should discover the maximum packet size they
can successfully send to the other endpoint, and they should set
their own inbound MTU to that value minus the size of the header
that's added.
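A minimal sketch of those two sides, assuming every packet has the DF
bit set and that path MTU changes are learned from "too big" reports.
The class and method names are hypothetical and not taken from any
proposal in this thread.

```python
class TunnelEndpoint:
    """Hypothetical tunnel endpoint tracking the path MTU to its peer."""

    def __init__(self, encap_overhead, initial_path_mtu=1500):
        self.encap_overhead = encap_overhead  # bytes added per packet
        self.path_mtu = initial_path_mtu      # current estimate toward the peer

    def on_too_big(self, reported_mtu):
        # Outbound side: a "too big" report can only lower the estimate.
        self.path_mtu = min(self.path_mtu, reported_mtu)

    @property
    def inbound_mtu(self):
        # Inbound side: largest packet that still fits once encapsulated.
        return self.path_mtu - self.encap_overhead

    def handle(self, packet_len):
        """Decide what to do with an incoming packet (DF assumed set)."""
        if packet_len <= self.inbound_mtu:
            return ("encapsulate", None)
        # Tell the sender which MTU to use instead of fragmenting.
        return ("too_big", self.inbound_mtu)


ep = TunnelEndpoint(encap_overhead=24, initial_path_mtu=1600)
print(ep.handle(1500))
ep.on_too_big(1500)
print(ep.handle(1500))
```

The endpoint never fragments here; it just reports the smaller MTU
back toward the sending host.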
This is what Fred's and my approaches are trying to achieve.
Doesn't look that way to me...
However, when sending the first packet to an ETR, there is no time
for the ITR to muck around testing the PMTU limit before sending the
packet. So I suggest sending it if it is shorter than some assumed
limit (eg. 1280) and fragmenting it if it is longer - irrespective
of whether its do not fragment bit is set.
That is a really bad solution, because it guarantees a good amount
of fragmentation. With IPv4, this is rather problematic because of the
small (16-bit) ID space. It also costs you lots of CPU and could even
allow for CPU exhaustion attacks.
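To give a feel for the ID space problem, here is a back-of-the-envelope
sketch. It assumes all tunneled packets between one ITR and one ETR
share a single 16-bit ID counter and that the link runs at line rate;
the function name and the example figures are illustrative only.

```python
ID_SPACE = 2 ** 16  # IPv4 Identification field is 16 bits


def id_wrap_seconds(link_bps, packet_bytes):
    """Seconds until the IPv4 ID counter wraps at the given line rate,
    assuming one ID is consumed per packet."""
    packets_per_second = link_bps / (packet_bytes * 8)
    return ID_SPACE / packets_per_second


# 1 Gbit/s of 1500-byte packets between a single ITR/ETR pair.
print(round(id_wrap_seconds(1e9, 1500), 3))
```

Once the counter wraps faster than the receiver's reassembly timeout,
fragments from different packets can carry the same ID and be
reassembled incorrectly.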
A few lost packets here or there because of PMTUD aren't the end of
the world; just keep things simple.
Later, if more such
packets need to be sent, the ITR and ETR can work on determining the
real PMTU. I do this with probe packets, rather than traffic packets.
Even more overhead...