
Re: transmech MTU comments



Erik,

I would like to discuss some aspects of the current text in [MECH], section 3:

3.2.1. Static Tunnel MTU

  A node using a static tunnel MTU MUST limit the size of the IPv6
  packets it tunnels to 1280 bytes i.e., treat the tunnel interface as
  having a fixed interface MTU of 1280 bytes.  An implementation MAY
  have a configuration knob which can be used to set a larger value of
  the tunnel MTU than 1280 bytes, but if so the default MUST be 1280
  bytes.  A larger fixed MTU should not be configured unless it has
  been administratively ensured that the decapsulator can reassemble
  packets of that size.  Care should be taken when manually configuring
  large tunnel MTUs to only do so when the MTU of the IPv4 path to the
  tunnel endpoint is large to avoid causing excessive fragmentation.

  When using the static tunnel MTU the Don't Fragment bit MUST NOT be
  set in the encapsulating IPv4 header.  As a result the encapsulator
  should not receive any ICMPv4 "packet too big" message as a result of
  the packets it has encapsulated.


The latter paragraph implies that either every link on the path has an MTU at least as large as the static tunnel MTU, or that nodes with constricting links will use IPv4 fragmentation to split the packet into pieces small enough to traverse the constricting link. The former will not be true in general, because certain bandwidth-constrained links will choose IPv4 interface MTUs smaller than 1280 bytes if the recommendations of BCP 48, 50, and 71 are followed. As for the latter, we cannot be assured that all forwarding nodes correctly implement IPv4 fragmentation. So we have a very real possibility of black holes here.

3.2.2. Dynamic Tunnel MTU

  The dynamic MTU determination is OPTIONAL.  However, if it is
  implemented, it SHOULD have the behavior described in this document.

  The fragmentation inside the tunnel can be reduced to a minimum by
  having the encapsulator track the IPv4 Path MTU across the tunnel,
  using the IPv4 Path MTU Discovery Protocol [RFC1191] and recording
  the resulting path MTU.  The IPv6 layer in the encapsulator can then
  view a tunnel as a link layer with an MTU equal to the IPv4 path MTU,
  minus the size of the encapsulating IPv4 header.

  Note that this does not eliminate IPv4 fragmentation in the case when
  the IPv4 path MTU would result in an IPv6 MTU less than 1280 bytes.
  (Any link layer used by IPv6 has to have an MTU of at least 1280
  bytes [RFC2460].)  In this case the IPv6 layer has to "see" a link
  layer with an MTU of 1280 bytes and the encapsulator has to use IPv4
  fragmentation in order to forward the 1280 byte IPv6 packets.
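
For reference, the MTU that the quoted text has the IPv6 layer "see" can be written as a one-line rule. This is only a sketch of my reading of the two paragraphs above; the function name and the fixed 20-byte IPv4 header (no options) are assumptions for illustration, not text from [MECH].

```python
IPV4_HEADER_LEN = 20   # assumed: encapsulating IPv4 header without options
IPV6_MIN_MTU = 1280    # minimum link MTU required by IPv6 [RFC2460]


def ipv6_tunnel_mtu(ipv4_path_mtu: int) -> int:
    """MTU the IPv6 layer sees for the tunnel (hypothetical helper).

    The tunnel MTU is the IPv4 path MTU minus the encapsulating header,
    but the IPv6 layer never sees less than 1280 bytes; below that the
    encapsulator has to fall back on IPv4 fragmentation.
    """
    return max(IPV6_MIN_MTU, ipv4_path_mtu - IPV4_HEADER_LEN)


print(ipv6_tunnel_mtu(1500))  # 1480
print(ipv6_tunnel_mtu(1000))  # 1280
```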


But shouldn't the encapsulator send a "packet too big" message to the source
in this case, even if the MTU it reports is less than 1280 bytes? In response,
the source should then include a fragment header in the packets it sends
as a signal to the encapsulator that IPv4 fragmentation is permissible
(see RFC 2460, section 5). More discussion on this below:


  The encapsulator SHOULD employ the following algorithm to determine
  when to forward an IPv6 packet that is larger than the tunnel's path
  MTU using IPv4 fragmentation, and when to return an IPv6 ICMP "packet
  too big" message per [RFC1981]:

          if (IPv4 path MTU - 20) is less than 1280
                  if packet is larger than 1280 bytes
                          Send IPv6 ICMP "packet too big" with MTU = 1280.
                          Drop packet.
                  else
                          Encapsulate but do not set the Don't Fragment
                          flag in the IPv4 header.  The resulting IPv4
                          packet might be fragmented by the IPv4 layer on
                          the encapsulator or by some router along
                          the IPv4 path.
                  endif


I believe the above "else" case should be re-worded as follows:


       else
               if packet does not contain a fragment header
                       Send IPv6 ICMP "packet too big" with MTU
                       = (IPv4 path MTU - 20). Drop packet.
               else
                       Encapsulate and fragment the packet using IPv4
                       fragmentation with a maximum fragment size
                       of (IPv4 path MTU - 20). The lower 16 bits of
                       the Identification field in the fragment header
                       are used as the Identification field for each IPv4
                       fragment header, and the Don't Fragment field
                       is not set.
               endif
       endif
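
To make the proposed branch structure concrete, here is a minimal sketch of the decision logic only, not a real encapsulator. The names `Decision` and `decide` are hypothetical, the IPv4 header is assumed to be 20 bytes, and I assume (from the draft's wording "an IPv6 packet that is larger than the tunnel's path MTU") that packets which already fit are simply encapsulated. The branch for (IPv4 path MTU - 20) >= 1280 is not quoted above, so the sketch refuses to handle it.

```python
from dataclasses import dataclass
from typing import Optional

IPV4_HEADER_LEN = 20   # assumed: IPv4 header without options
IPV6_MIN_MTU = 1280


@dataclass
class Decision:
    action: str                 # "encapsulate", "packet_too_big", or "fragment"
    mtu: Optional[int] = None   # MTU to report, or maximum fragment size


def decide(ipv4_path_mtu: int, packet_len: int, has_fragment_header: bool) -> Decision:
    """Proposed handling when (IPv4 path MTU - 20) is less than 1280."""
    tunnel_mtu = ipv4_path_mtu - IPV4_HEADER_LEN
    if tunnel_mtu >= IPV6_MIN_MTU:
        raise ValueError("branch not covered by the text quoted here")
    if packet_len <= tunnel_mtu:
        # Assumption: the algorithm only concerns packets larger than the path MTU.
        return Decision("encapsulate")
    if packet_len > IPV6_MIN_MTU:
        # Unchanged from the draft: report 1280 and drop.
        return Decision("packet_too_big", IPV6_MIN_MTU)
    if not has_fragment_header:
        # Proposed: report the real tunnel MTU, even though it is below 1280.
        return Decision("packet_too_big", tunnel_mtu)
    # Proposed: the encapsulator itself fragments, with DF not set.
    return Decision("fragment", tunnel_mtu)


print(decide(1000, 1200, False))
print(decide(1000, 1200, True))
```

With an IPv4 path MTU of 1000, a 1200-byte packet without a fragment header draws a "packet too big" reporting 980; the same packet with a fragment header is fragmented by the encapsulator.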

First, about sending the "packet too big" with an MTU size less
than 1280, this seems to me to be consistent with the expectation
specified in RFC 2460, section 5. This is what an "IPv6-to-IPv4
translator" is supposed to do, and from the perspective of the
original IPv6 host it makes no difference whether the node
that sends the packet too big is a translator or an IPv6-in-IPv4
tunnel endpoint.

As to fragmenting the packet in the encapsulator instead of
just sending it with the DF bit not set, the encapsulator has no
way of knowing whether there are forwarding nodes in the IPv4
path with broken, non-existent, or slow-path IPv4 fragmentation
implementations and so the only safe option is for the tunnel
encapsulator itself to do the fragmentation.

As to the setting of the fragment ID field, my suggested text
above reflects my best understanding of the normative references, but
I believe we have the following problem. What if the original IPv6
source wanted to do host-based IPv6 fragmentation (e.g., for large
UDP packets) even though the IPv6 path MTU was less than 1280
bytes?

The source would send a series of N IPv6 fragments, each of which
would have the same value in the fragment ID field. But then, the
tunnel encapsulator would use IPv4 fragmentation to split each of the
N IPv6 fragments into M IPv4 fragments again using the *same*
fragment ID value! We would then have a collision in the decapsulator's
IPv4 reassembly buffer, since there would be no way of knowing to
which one of the N IPv6 fragments a particular IPv4 fragment belonged!
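
The collision can be illustrated with a small simulation. This is a sketch under stated assumptions: the decapsulator keys its IPv4 reassembly buffers on (source, destination, protocol, Identification), IPv4 fragment payloads are rounded down to a multiple of 8 bytes, and the addresses, sizes, and function names are made up for the example.

```python
from collections import defaultdict

IPV4_HEADER_LEN = 20   # assumed: IPv4 header without options
PROTO_IPV6 = 41        # IPv4 protocol number for encapsulated IPv6


def ipv4_fragments(src, dst, ipv6_frag_no, ipv6_ident, ipv6_len, ipv4_path_mtu):
    """Yield (reassembly_key, byte_offset, ipv6_frag_no) per IPv4 fragment."""
    ipv4_id = ipv6_ident & 0xFFFF                        # lower 16 bits of the IPv6 ID
    chunk = (ipv4_path_mtu - IPV4_HEADER_LEN) // 8 * 8   # payload bytes per fragment
    key = (src, dst, PROTO_IPV6, ipv4_id)
    for offset in range(0, ipv6_len, chunk):
        yield key, offset, ipv6_frag_no


def simulate(n_ipv6_fragments, ipv6_ident, ipv6_len, ipv4_path_mtu):
    """Return the decapsulator's reassembly buffers, keyed as assumed above."""
    buffers = defaultdict(list)
    for n in range(n_ipv6_fragments):
        for key, offset, frag_no in ipv4_fragments(
                "192.0.2.1", "198.51.100.1", n, ipv6_ident, ipv6_len, ipv4_path_mtu):
            buffers[key].append((offset, frag_no))
    return buffers


# N = 3 IPv6 fragments of 1200 bytes, all carrying the same fragment ID.
for key, entries in simulate(3, 0x12345678, 1200, 1000).items():
    print(key, sorted(entries))
```

All six IPv4 fragments land in a single buffer, and each offset appears three times, once per IPv6 fragment, with nothing in the IPv4 header to tell them apart.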

So, either my understanding of the normative references is wrong,
or the normative references themselves are wrong. Can you help?

Fred
ftemplin@iprg.nokia.com