Re: mech-v2: static MTU scope; feedback sought



Thanks for your feedback, Fred -- it helped a lot in clarifying my 
thinking.  Inline --

On Thu, 29 Jan 2004, Fred Templin wrote:
> >Now, I'd like to move on here.  I personally don't see much of a
> >problem with mismatching MRU/MTU values as long as they are below
> >1500.  But I'd like to see whether there are any evidence indicating
> >otherwise.
> 
> I agree that it is OK to have the MRU slightly larger than the MTU.
[...]

True.  The scenario we must analyze a bit, however, is when someone in
the path has a lower IPv6 MTU.  Like:

 host <=================> router <---------->   destination
       v6-in-v4 tunnel            v6 link
       v6 MTU = 1380            v6 MTU = 1280
                                                                                                                           
Router sends ICMPv6 packet too big to the sending host, connected over
a tunnel.
                                                                                                                           
What happens when the host receives this ICMPv6 Packet too Big 
message?

 1) If the node does not implement PMTUD, then all the packets will be
    discarded -- an unresolvable error condition: meaning, unless you
    do PMTUD, you must never send packets larger than 1280 bytes.
 2) If the node implements PMTUD, it will
    a) decrease the effective MTU of the IPv6 link (e.g., by adding
       a "cloned" route with lower MTU to the routing table,
       or whatever the implementation does),
    b) if this fails (e.g., UDP), start sending IPv6 fragments.

However, note from the below: the ICMPv6 spec states that an
implementation MUST pass Pkt Too Big up to the upper layers.  This
would imply that a simple form of 2) would have to be done without
PMTUD as well.  It should not typically happen, though, as nodes
without PMTUD must restrict the size of packets they send to 1280
bytes, so no such messages are generated.
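
To make cases 1) and 2a) concrete, here is a minimal Python sketch of
how a host might track the effective MTU.  The class and method names
are hypothetical (they stand in for a cloned route or whatever the
implementation uses), and the clamp at 1280 is an assumption of the
sketch, following the rule that a PMTU estimate is never reduced below
the IPv6 minimum link MTU.  Case 2b), fragmentation, is not modelled.

```python
IPV6_MIN_MTU = 1280  # minimum IPv6 link MTU, in bytes


class PathMtuCache:
    """Hypothetical per-destination MTU cache (stand-in for a cloned route)."""

    def __init__(self, link_mtu, pmtud_enabled):
        self.link_mtu = link_mtu
        self.pmtud_enabled = pmtud_enabled
        self.per_dest = {}

    def mtu_for(self, dest):
        """Largest packet this host should send towards dest."""
        if not self.pmtud_enabled:
            # Case 1: without PMTUD, never send more than the minimum MTU.
            return min(self.link_mtu, IPV6_MIN_MTU)
        return self.per_dest.get(dest, self.link_mtu)

    def on_packet_too_big(self, dest, reported_mtu):
        """Record an ICMPv6 Packet Too Big and return the new effective MTU."""
        if not self.pmtud_enabled:
            # Nothing to lower: this host already sends at most 1280 bytes.
            return self.mtu_for(dest)
        # Case 2a: only ever lower the estimate, and never below 1280.
        new_mtu = max(IPV6_MIN_MTU, min(reported_mtu, self.mtu_for(dest)))
        self.per_dest[dest] = new_mtu
        return new_mtu
```

With the topology above, PathMtuCache(1380, True) starts at 1380 for
the destination and drops to 1280 once the router's Packet Too Big
arrives; other destinations keep using 1380.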

> > - MUST a static tunnel MTU case also implement an algorithm to
> >   react to the received ICMPv6 Packet too Big messages, e.g. to lower 
> >   the MTU?  (or is this really a subset of dynamic tunnel MTU).  
> >   RFC2463 sect 3.2 seems to give an impression that this would have
> >   to be done, but not sure.
> 
> Well, the text you are citing says only that "An incoming Packet Too Big
> messaage MUST be passed to the upper-layer process"; I'm not really seeing
> anything about changing the tunnel MTU based on the ICMPv6 messages.

Yes, in hindsight -- that's true.  Note that this is independent
of full PMTUD, so even if a node didn't implement PMTUD, it would have
to implement the processing of Packet Too Big messages, which should
be reflected in, e.g., the MSS that TCP uses, and maybe even in
fragmenting UDP packets if they are too big.

It seems the picture is not consistent from the IPv6 implementation 
point-of-view.  Perhaps this should be brought up there..

> My concern here is for nodes with long, thin (i.e., slow) physical links.
> Suppose a tunnel interface is configured over a physical interface that
> takes the recommendations of BCP 48 and configures a small IPv4 MTU,
> e.g., 296 bytes. The tunnel interface will still have to configure at least
> the minimum IPv6 MTU of 1280 bytes, but the 296 byte IPv4 MTU of
> the underlying physical interface will be opaque to the upper layers.
> In this case, two possibilities exist when the tunnel interface sends
> encapsulated packets larger than 296 bytes:
> 
>   1) The tunnel interface sends encapsulated packets with the DF bit
>     NOT set in the IPv4 header, and the IPv4 stack performs intentional
>     IPv4 fragmentation to a maximum fragment size of 296 bytes.
> 
>   2) The tunnel interface sends encapsulated packets with the DF bit
>     SET, and the IPv4 stack sends back a locally-generated ICMPv4
>     "fragmentation needed" message with MTU = 296.

I think these cases aren't really all that different as long as they 
are done inside the node.  I don't think an implementation would send 
"internal" frag needed messages, but just do the fragmentation, 
ultimately resulting in 1) whether going through 2) or not.
 
> On the
> other hand, I have heard that there are devices such as security gateways,
> NATs, etc. that drop incoming IPv4 fragments (perhaps only sending
> the first-fragment on to the final destination), i.e., black-holes may
> result. What are the operational experiences with this?

I can't say anything about that; this hasn't come up that much in our
scenarios.

> My thoughts on 2) are that the locally-generated ICMPv4 "fragment
> needed" messages will be passed to the configured tunnel device driver
> in the kernel, but what actions should be taken? Should the driver:
> 
>   a) reduce the configured tunnel's MTU to 296-20 (i.e., less than the
>     IPv6 minimum MTU)?
>   b) keep the MTU in, e.g., a neighbor cache entry to provide a maximum
>     fragment size for IPv6 fragmentation?
>   c) translate the ICMPv4 "frag needed" into an ICMPv6 "packet too big"
>     and pass it up to the application?
>   d) other?
> 
> I can imagine either a) or b) as acceptable answers. I have a hard time
> believing c) could be good, because it might cause TCPs to configure
> a too-small MSS. But, I'd like to hear operational experiences and/or
> other alternatives I may be missing.

As the link-layer MTU is too low to accommodate the IPv6 minimum MTU,
c) is out of the question in this instance.  a) is also out of the
question, if you are talking about reducing the v6 MTU below the
minimum, as that would break a lot of assumptions small IPv6 devices
have about the minimum path MTU.  So, I guess b) is the only option?

Note that in this case, there does not have to be IPv6 fragmentation
at all.  The IPv6 link MTU should still be 1280 bytes even though the
underlying v4 link MTU is only e.g. 296 bytes.  I don't think there is
anything in the v6 spec that forbids the host from doing "voluntary
fragmentation" to accommodate the slower links, even though it could
send larger packets, but that's just a local optimization I think.
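
As a worked example of what the IPv4 layer then has to do, here is a
small Python sketch that computes the fragment payload sizes for one
encapsulated IPv6 packet.  The function name is made up for this
example, and it assumes a 20-byte IPv4 header without options.

```python
IPV4_HEADER = 20  # bytes, assuming no IPv4 options


def ipv4_fragment_payloads(ipv6_packet_len, ipv4_mtu, header_len=IPV4_HEADER):
    """Payload sizes of the IPv4 fragments carrying one v6-in-v4 packet."""
    room = ipv4_mtu - header_len
    if room < 8:
        raise ValueError("IPv4 MTU too small to carry any fragment payload")
    if ipv6_packet_len <= room:
        # Fits in a single IPv4 packet, no fragmentation needed.
        return [ipv6_packet_len]
    # Every fragment except the last carries a multiple of 8 payload bytes.
    per_fragment = room // 8 * 8
    sizes = []
    remaining = ipv6_packet_len
    while remaining > 0:
        chunk = min(per_fragment, remaining)
        sizes.append(chunk)
        remaining -= chunk
    return sizes
```

For a 1280-byte IPv6 packet over a 296-byte IPv4 MTU this gives four
fragments with 272 bytes of payload and a last one with 192 bytes, so
five IPv4 packets on the slow link for each minimum-MTU IPv6 packet.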
> 
> > - do you think it would be OK to recommend dynamic MTU instead in the 
> >   double encapsulation scenario described by Dave?
> >
> 
> Possibly; the only concern is trust issues for ICMPv4 "fragmentation
> needed" messages generated by middleboxes. Perhaps if the ICMPv4s
> were only accepted from the localhost and from the far end of the
> tunnel it would be OK?

These trust issues (whether one thinks they're really a problem or not
is another thing) exist regardless of this specification, so I'm not
sure whether we should work too hard on replacing the functionality of
ICMPv4 frag needed messages.
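
For reference, the source check Fred suggests could be as small as the
following Python sketch.  The function and parameter names are
hypothetical, and it only compares the ICMPv4 source address, which by
itself does not protect against spoofed sources.

```python
import ipaddress


def accept_frag_needed(source, tunnel_remote, local_addrs):
    """Accept ICMPv4 "fragmentation needed" only from this host or the
    far end of the tunnel (hypothetical policy check)."""
    src = ipaddress.ip_address(source)
    if src.is_loopback:
        return True
    if src in {ipaddress.ip_address(a) for a in local_addrs}:
        # Locally generated message, sourced from one of our own addresses.
        return True
    return src == ipaddress.ip_address(tunnel_remote)
```

Messages from middleboxes on the path would be ignored under this
policy, which is exactly the functionality that would be lost.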

-- 
Pekka Savola                 "You each name yourselves king, yet the
Netcore Oy                    kingdom bleeds."
Systems. Networks. Security. -- George R.R. Martin: A Clash of Kings