
[RRG] IPTM PMTUD with only DF=0 packets



Here are some thoughts on how I may change IPTM:

  http://www.firstpr.com.au/ip/ivip/pmtud-frag/

to handle the following situation:

  The only traffic packets longer than ~1200 bytes which the ITR
  needs to handle are IPv4 packets with DF=0.

If there were DF=1 packets of various lengths longer than 1200 bytes,
then the ITR would create its three variables and quickly tune them
into reliable and useful estimates of the Real PMTU to this ETR.

Then LPME would generally be higher than 1200 bytes, and quite
likely close to, or exactly at, the Real PMTU limit.  For
instance, in a typical DFZ path with no extra tunnels, LPME would be
1500, so the ITR could encapsulate traffic packets of 1480 bytes
and be confident they would get to the ETR without any fuss.

Likewise, perhaps there is a clear jumbo-frame path to the ETR.
Assuming the SH was also capable of sending DF=1 jumbo-frame packets
and properly explored sending them at the largest possible size,
then the first such long packet would have failed with a PTB because
it was longer (9000 + 20 bytes of ENCAPS) than the ITR's outgoing
interface's MTU of 9000.  The ITR would have sent the SH a PTB with
MTU = 8980 and the SH would have sent the same data in another
packet of 8980 bytes, which after adding 20 bytes of ENCAPS
would be tunneled with IP-in-IP to the ETR without any problems.
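
The arithmetic in these two examples is just the 20-byte IP-in-IP
overhead subtracted from the path MTU.  A minimal Python sketch,
assuming an outer IPv4 header without options (the function names are
illustrative, not part of IPTM):

```python
# Assumption: plain IPv4-in-IPv4 encapsulation adding one 20-byte
# outer header with no options. Helper names are hypothetical.
ENCAPS = 20


def max_inner_length(pmtu: int) -> int:
    """Longest traffic packet that fits a path of `pmtu` bytes once encapsulated."""
    return pmtu - ENCAPS


def fits(inner_length: int, pmtu: int) -> bool:
    """True if an encapsulated packet of this inner length fits `pmtu`."""
    return inner_length + ENCAPS <= pmtu


# Typical DFZ path: LPME = 1500, so 1480-byte traffic packets fit.
print(max_inner_length(1500))   # 1480

# Jumbo-frame path with a 9000-byte outgoing interface MTU.
print(fits(9000, 9000))         # False: 9020 bytes after encapsulation
print(max_inner_length(9000))   # 8980, the MTU the ITR reports in its PTB
print(fits(8980, 9000))         # True
```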

But as currently described, IPTM only probes the PMTU to this
ETR with longer traffic packets if they are IPv6 or IPv4 DF=1.


Here is an algorithm for coping well when the only packets longer
than ~1200 bytes are IPv4 DF=0.  The ITR can't send back a PTB to
the SH for these packets.  It has to deal with them as best it can.

One approach would be for the ITR to perform heroics, such as using an
expensive two-way protocol with the ETR to break the traffic
packet into smaller pieces, send them in separate tunnel packets and
resend any which the ETR did not receive.

However, this would be excessive and unwarranted.  The SH made a
decision to burden the network with an unspecified amount of
fragmentation effort.  The network never promised to do this without
increased packet loss - so the network (including the ITR) should
not go to great effort to get this packet to the DH, beyond the use
of normal IP fragmentation techniques.

I tend to think that if the SH puts out so many packets with DF=0 as
to potentially cause wrap-around, with late packets having the same
16-bit fragment ID as an earlier fragment, then so be it.  The SH
knew the rules and chose to risk its data like this.  It could
always be a good RFC 1191 compliant host and fine-tune its packets
to whatever size the network can support without fragmentation.

It could be argued that while excessive fragmentation in the network
is undesirable, it is the SH's job to stop this, not the network's
job to make fragmentation work better.


The new algorithm would be something like this.

When a DF=0 packet arrives which, once encapsulated, would be longer
than the current value of LPME, fragment it into as many pieces as are
required so that each fragment, after IP-in-IP encapsulation, is no
longer than LPME bytes.

Then send these to the ETR.  The DH will reassemble the fragments.
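
A sketch of this fragmentation step in Python, assuming the arriving
packet is not itself already a fragment and that neither the inner nor
the outer IPv4 header carries options (plan_fragments is a hypothetical
name).  Every fragment except the last must carry a multiple of 8 data
bytes, because the IPv4 Fragment Offset field counts 8-byte units, so
non-final fragments may come out a few bytes shorter than LPME:

```python
from typing import List, Tuple

ENCAPS = 20      # outer IPv4 header added by IP-in-IP (assumed, no options)
INNER_HDR = 20   # inner IPv4 header, assumed to carry no options


def plan_fragments(total_length: int, lpme: int) -> List[Tuple[int, int, bool]]:
    """Split an unfragmented DF=0 IPv4 packet so that every fragment,
    after IP-in-IP encapsulation, is at most `lpme` bytes.

    Returns (offset_in_8_byte_units, data_bytes, more_fragments) per fragment.
    """
    payload = total_length - INNER_HDR
    if total_length + ENCAPS <= lpme:
        return [(0, payload, False)]  # fits whole: no fragmentation needed
    # Largest data length per fragment that is a multiple of 8 and still
    # leaves room for the fragment's own header plus the outer header.
    max_data = ((lpme - ENCAPS - INNER_HDR) // 8) * 8
    if max_data <= 0:
        raise ValueError("LPME too small to carry any fragment data")
    frags = []
    offset = 0
    while offset < payload:
        size = min(max_data, payload - offset)
        more = offset + size < payload
        frags.append((offset // 8, size, more))
        offset += size
    return frags


# A 1500-byte DF=0 packet while LPME is still at its initial 1200 bytes.
for off, size, mf in plan_fragments(1500, 1200):
    print(off, size, mf, size + INNER_HDR + ENCAPS)
# 0 1160 True 1200
# 145 320 False 360
```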

(If a router between the ETR and the DH has a lower MTU than the
fragment size, then that router will turn each fragment into two or
more fragments.  This makes for less reliability, but that is the
risk the SH took with DF=0.)

If LPME has not been adjusted from its initial value (1200 bytes)
and these longer DF=0 packets continue to arrive, the ITR should
create some Synthetic Probe packets so it can adjust LPME upwards
towards, and ideally exactly to, the Real PMTU to this ETR.

I have been able to avoid Synthetic Probe packets apart from this,
but I think they are needed here.  I think it is more important to
get the DF=0 packets to the DH with whatever fragmentation is
required according to the currently unchanged low value of LPME than
to create delays and complication trying the ITR's luck with sending
parts of the traffic packet (or the whole thing) via RPD2 as part of
using traffic packets as probes.

So while these DF=0 traffic packets continue to arrive, the ITR
should probe the PMTU to the ETR.  Perhaps the ITR should send one
Synthetic Probe with RPD2 for each traffic packet.
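
The trigger for these probes might look like this in Python.  It is
only a sketch: EtrPathState and on_df0_packet are hypothetical names,
and allowing at most one probe outstanding per ETR is an assumption
made here, not something stated above.  The probe_outstanding flag
would be cleared wherever the probe's outcome is handled (not shown).

```python
from dataclasses import dataclass
from typing import Optional

INITIAL_LPME = 1200  # initial LPME value given in the text
ENCAPS = 20          # assumed outer IPv4 header, no options


@dataclass
class EtrPathState:
    """Hypothetical per-ETR state. LPME and UPME are IPTM's names;
    the other fields are assumptions for this sketch."""
    imtu: int                        # MTU of the interface leading to this ETR
    lpme: int = INITIAL_LPME
    upme: Optional[int] = None       # unknown until a PTB arrives
    probe_outstanding: bool = False  # assumed: one probe in flight at a time


def on_df0_packet(state: EtrPathState, total_length: int) -> Optional[int]:
    """Called for each IPv4 DF=0 traffic packet to this ETR. Returns the
    length of a Synthetic Probe to send with RPD2, or None. The traffic
    packet itself is fragmented according to LPME either way."""
    needs_fragmentation = total_length + ENCAPS > state.lpme
    if not needs_fragmentation or state.lpme != INITIAL_LPME:
        return None  # nothing to learn, or LPME has already been adjusted
    if state.probe_outstanding:
        return None
    state.probe_outstanding = True
    # First probe is the longest possible (IMTU); after a PTB, try UPME.
    return state.imtu if state.upme is None else state.upme


s = EtrPathState(imtu=9000)
print(on_df0_packet(s, 1500))   # 9000
print(on_df0_packet(s, 1500))   # None, a probe is already in flight
```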

The first probe would be like this.  The ITR generates the longest
possible Synthetic Probe with RPD2:

  Length = IMTU = MTU of the interface which leads to this ETR

and sends it to the ETR with RPD2's accompanying Packet As.

If it arrives OK and the ETR reports this to the ITR, then the ITR
discovers with a single RPD2 exchange that the Real PMTU is
unencumbered by any routers in the tunnel beyond the MTU of its next
hop.  Then, LPME will be set to this size, and fragmentation of DF=0
packets will occur according to this size, which is optimally
efficient and no doubt better than sticking with the initial LPME
value of 1200.


If this initial probe generates a PTB, then the MTU value in that PTB
becomes the new value of UPME and the size of the next probe, to be
sent after the next DF=0 packet which needs fragmentation.  Fragmentation
continues according to the currently unchanged value of LPME: 1200 bytes.
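
The two convenient outcomes can be sketched in Python as follows.
ProbeState, Outcome and handle_outcome are hypothetical names.  The
other two outcomes, listed below, are still open, so the sketch only
stops probing for them as a conservative placeholder; that choice is
an assumption, not part of IPTM.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Outcome(Enum):
    DELIVERED = auto()      # ETR reports the probe (Packet B) arrived
    PTB = auto()            # a router in the tunnel returned a PTB
    NOT_DELIVERED = auto()  # no PTB, ETR reports Packet B did not arrive
    NO_RESPONSE = auto()    # no PTB and no response from the ETR


@dataclass
class ProbeState:
    """Hypothetical per-ETR probing state for this sketch."""
    imtu: int
    lpme: int = 1200
    upme: Optional[int] = None
    next_probe_len: Optional[int] = None  # None: no further probe planned


def handle_outcome(s: ProbeState, probe_len: int, outcome: Outcome,
                   ptb_mtu: Optional[int] = None) -> None:
    if outcome is Outcome.DELIVERED:
        # The probe got through, so the Real PMTU is at least probe_len.
        s.lpme = probe_len
        s.next_probe_len = None
    elif outcome is Outcome.PTB:
        if ptb_mtu is None:
            raise ValueError("a PTB outcome needs the MTU it reported")
        # LPME stays put; the PTB's MTU becomes UPME and the next probe
        # size, sent after the next DF=0 packet that needs fragmenting.
        s.upme = ptb_mtu
        s.next_probe_len = ptb_mtu
    else:
        # Still undecided: stopping is only a placeholder (assumption).
        s.next_probe_len = None


s = ProbeState(imtu=9000)
handle_outcome(s, 9000, Outcome.PTB, ptb_mtu=1500)
print(s.lpme, s.upme, s.next_probe_len)   # 1200 1500 1500
handle_outcome(s, 1500, Outcome.DELIVERED)
print(s.lpme)                             # 1500
```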

The two less convenient outcomes for the first probe:

  1 - No PTB arrives and the ETR reports that the probe packet
      (Packet B) did not arrive.

  2 - No PTB and no response from the ETR.

I will think more about how to handle these.  We need to ensure the
ITR doesn't keep on sending huge packets to the ETR address.  If it
did, then it would be a smurf amplifier.  An attacker could set the
mapping of their micronet to some target which is not an ETR and by
sending DF=0 packets of 1200 bytes to the ITR, with a destination
address which is mapped to this target address, the attacker could
induce a jumbo-frame capable ITR to fire 9000 byte probe packets for
each such 1200 byte packet.
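
The size of this risk is easy to quantify, and one possible guard is
sketched below in Python.  The guard is an assumption of this sketch,
not something decided above: it caps the number of consecutive
unanswered probes per target address (MAX_UNANSWERED and ProbeGuard
are hypothetical names, and the limit of 3 is arbitrary).

```python
from collections import defaultdict

MAX_UNANSWERED = 3  # hypothetical limit; the text does not choose one


def amplification(probe_len: int, trigger_len: int) -> float:
    """Probe bytes the ITR emits per byte of triggering traffic."""
    return probe_len / trigger_len


class ProbeGuard:
    """Sketch: stop sending Synthetic Probes to an address once
    `limit` probes in a row got neither a PTB nor any ETR report."""

    def __init__(self, limit: int = MAX_UNANSWERED) -> None:
        self.limit = limit
        self.unanswered = defaultdict(int)

    def may_probe(self, addr: str) -> bool:
        return self.unanswered[addr] < self.limit

    def record_unanswered(self, addr: str) -> None:
        self.unanswered[addr] += 1

    def record_answer(self, addr: str) -> None:
        self.unanswered[addr] = 0


print(amplification(9000, 1200))  # 7.5

g = ProbeGuard()
victim = "192.0.2.1"  # documentation address standing in for the target
sent = 0
for _ in range(100):  # 100 attacker packets, each of which could trigger a probe
    if g.may_probe(victim):
        sent += 1
        g.record_unanswered(victim)  # the target is not an ETR: no reply
print(sent)  # 3
```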

In most ordinary cases, one or a few Synthetic Probe packets will
quickly adjust LPME to the Real PMTU value, and then the ITR will be
optimally efficient, fragmenting DF=0 packets so that, with ordinary
encapsulation, the resulting tunnel packets are this length.

  - Robin


--
archive: <http://psg.com/lists/rrg/> & ftp://psg.com/pub/lists/rrg