
MTU stuff, was Re: [RRG] LISP-NERD reachability and MTU detection

To: "Templin, Fred L" <Fred.L.Templin@boeing.com>, Dino Farinacci <dino@cisco.com>
Subject: MTU stuff, was Re: [RRG] LISP-NERD reachability and MTU detection
From: Iljitsch van Beijnum <iljitsch@muada.com>
Date: Tue, 18 Dec 2007 13:13:32 +0100
Cc: Routing Research Group list <rrg@psg.com>
In-reply-to: <39C363776A4E8C4A94691D2BD9D1C9A1029EDD10@XCH-NW-7V2.nw.nos.boeing.com>
References: <EAB3BF96-D438-459E-A753-F9D72B1FE5B6@muada.com> <EC1BB972-6F21-4CBC-B827-BB1840C25AE8@cisco.com> <99E69FED-D637-4F47-985E-6AB0DCB0B8E0@cisco.com> <54ADF2DD-8FFC-45BD-9C56-BD548A91528B@cisco.com> <39C363776A4E8C4A94691D2BD9D1C9A1029EDD10@XCH-NW-7V2.nw.nos.boeing.com>

On 17 dec 2007, at 20:27, Templin, Fred L wrote:

> Key considerations are: 1) 1500 bytes has become the
> "magic number" expected by applications

Applications??

> 2) 1280 bytes is
> the "magic number" specified for IPv6, and 3) fragmentation
> at the TFE MUST be kept to a minimum in order to avoid
> reassembly misassociations at the TFE. Of these, IMHO 3) is
> the dominating consideration, followed distantly by 1). (2)
> is the hard lower bound for IPv6, and we can't change that.)

> In particular, I want to see a requirement that TNEs MUST NOT
> configure a fragmentation threshold larger than 1500 bytes
> for the packets they admit into the tunnel.

I don't think the (main) problem is packets larger than 1500 bytes. If you generate those, you pretty much know what you're doing (or soon will). The issue is when tunnel overhead over a 1500-byte path breaks the 1500-byte assumption that is created by the fact that people filter ICMP too big messages without bothering to disable path MTU discovery.
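To make the arithmetic behind this concrete, here is a minimal Python sketch. It assumes a LISP-style encapsulation with an outer IPv4 header (20 bytes), a UDP header (8 bytes) and an 8-byte LISP header, i.e. 36 bytes of overhead; those figures and the helper names are illustrative assumptions, not values specified in this thread.

```python
# Illustrative sketch; the header sizes are assumptions for a
# LISP-style IPv4-in-UDP encapsulation, not values from the thread.
OUTER_IPV4_HEADER = 20
UDP_HEADER = 8
LISP_HEADER = 8
OVERHEAD = OUTER_IPV4_HEADER + UDP_HEADER + LISP_HEADER  # 36 bytes


def encapsulated_size(inner_len: int) -> int:
    """Size of the outer packet once the ITR has added its headers."""
    return inner_len + OVERHEAD


def effective_inner_mtu(path_mtu: int) -> int:
    """Largest inner packet that crosses the path without fragmentation."""
    return path_mtu - OVERHEAD


def fate_of_df_packet(inner_len: int, path_mtu: int, icmp_filtered: bool) -> str:
    """Outcome for an inner packet sent with DF=1 through the tunnel."""
    if encapsulated_size(inner_len) <= path_mtu:
        return "delivered"
    if icmp_filtered:
        # Dropped, and the ICMP too big message is filtered before
        # it reaches the source, so the source never shrinks.
        return "black-holed"
    # Dropped, but the source learns the smaller MTU and can resend.
    return "too-big"


print(encapsulated_size(1500))              # 1536
print(effective_inner_mtu(1500))            # 1464
print(fate_of_df_packet(1500, 1500, True))  # black-holed
print(fate_of_df_packet(1464, 1500, True))  # delivered
```

Small packets keep getting through in the black-holed case, which is why this failure tends to show up as connections that stall only once full-size segments are sent.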

> Specific transitions I would like to see include:
>
>  1) Require that all TFEs configure an EMTU_R that is no
>     smaller than 2KB and at least as large as the smallest
>     EMTU_R of all underlying links over which the TFE is
>     configured. (IMHO 2KB is a good number because it
>     allows for a 1500 byte fragmentation threshold at the
>     TNE yet allows room for additional encapsulations
>     on the path.)

If the reassembly happens in the destination host, this shouldn't be an issue in practice because of the TCP MSS option. If it happens in a middlebox, we can mandate a number (2048 seems like a conservative one), or we can specify a way for the destination to let the source know what the number is.
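The proposed reassembly rule (a TFE EMTU_R of at least 2 KB and at least the smallest EMTU_R of its underlying links) can be written down as a simple check. This is a hypothetical Python sketch: the function names are illustrative, and the 1500-byte TNE fragmentation threshold is the one proposed earlier in the quoted message.

```python
from typing import Sequence

EMTU_R_FLOOR = 2048        # proposed minimum TFE reassembly size (2 KB)
TNE_FRAG_THRESHOLD = 1500  # proposed TNE fragmentation threshold


def min_tfe_emtu_r(link_emtu_rs: Sequence[int]) -> int:
    """Smallest EMTU_R a TFE may configure under the proposed rule:
    no smaller than 2 KB, and no smaller than the smallest EMTU_R
    of the underlying links the TFE is configured over."""
    return max(EMTU_R_FLOOR, min(link_emtu_rs))


def complies(configured_emtu_r: int, link_emtu_rs: Sequence[int]) -> bool:
    """True if a configured EMTU_R satisfies the proposed rule."""
    return configured_emtu_r >= min_tfe_emtu_r(link_emtu_rs)


def encapsulation_headroom(emtu_r: int = EMTU_R_FLOOR) -> int:
    """Bytes left for additional encapsulations on top of a
    1500-byte packet admitted by the TNE."""
    return emtu_r - TNE_FRAG_THRESHOLD


print(min_tfe_emtu_r([1500, 1500]))  # 2048
print(min_tfe_emtu_r([9000, 9000]))  # 9000
print(complies(2048, [9000]))        # False
print(encapsulation_headroom())      # 548
```

With a 2048-byte floor there are 548 bytes of headroom above a 1500-byte packet, which is the "room for additional encapsulations" the proposal refers to.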

>  2) Require that all links transition to adopting IEEE
>     802.3as Ethernet Frame Size expansion, or better yet
>     Gigabit Ethernet Jumboframes.

There is already a large amount of equipment out there that does "baby jumbos", which should be enough to allow encapsulation of a 1500-byte packet without problems, but there's also still a lot of 100 Mbps and some 1 Gbps equipment out there that can only do 1500 or 1504. I believe that a new effort like this allows us to require people to upgrade their MTUs, something that's pretty much impossible to do at any other time, so I would be in favor of doing so.
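Whether a given link MTU is "enough" to tunnel an unmodified 1500-byte packet depends on the encapsulation overhead. The Python sketch below assumes 36 bytes with an outer IPv4 header and 56 bytes with an outer IPv6 header (outer IP + UDP + an 8-byte LISP-style header); the sample MTUs, including 1600 for a "baby jumbo" link, are hypothetical values chosen for illustration.

```python
# Assumed overhead: outer IP header + UDP (8) + LISP-style header (8).
OVERHEAD = {"IPv4": 20 + 8 + 8, "IPv6": 40 + 8 + 8}


def required_link_mtu(outer: str, inner: int = 1500) -> int:
    """Link MTU needed to carry an `inner`-byte packet in one piece."""
    return inner + OVERHEAD[outer]


def carries_full_size(link_mtu: int, outer: str) -> bool:
    """True if the link can carry an encapsulated 1500-byte packet."""
    return link_mtu >= required_link_mtu(outer)


# Hypothetical link MTUs for illustration.
SAMPLE_LINKS = {"legacy": 1500, "1504-capable": 1504, "baby jumbo": 1600, "jumbo": 9000}

for name, mtu in SAMPLE_LINKS.items():
    print(f"{name:>12} {mtu:5d}  IPv4: {carries_full_size(mtu, 'IPv4')}  "
          f"IPv6: {carries_full_size(mtu, 'IPv6')}")
```

Only the 1600- and 9000-byte links pass: a 1504-byte MTU leaves just 4 spare bytes, far short of the 36 or 56 assumed here.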

>  3) Require that all original sources that send packets
>     of 1501 bytes or larger with DF=1 also implement
>     RFC4821.

Not really an issue, in my opinion. If you send large packets you either need to implement RFC 4821 or you need to make sure that you hit a 1500-byte hop that reliably sends you too bigs before you enter the big bad internet. If neither of these is possible (and assuming TCP MSS clamping isn't an option), you can't realistically have an MTU larger than 1500 bytes.
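RFC 4821 (Packetization Layer Path MTU Discovery) avoids depending on ICMP by probing with packets of different sizes and treating a lost probe as evidence that the size is too big. The Python sketch below shows only that core search idea as a plain binary search; the real RFC 4821 procedure also deals with timers, probe confirmation and telling size-related loss apart from congestion, none of which is modeled here. The `probe` callback and the default bounds (1280 and 9000) are assumptions for illustration.

```python
from typing import Callable


def search_path_mtu(probe: Callable[[int], bool], low: int = 1280, high: int = 9000) -> int:
    """Find the largest size for which probe(size) succeeds.

    Assumes `low` is known to get through (1280 is the IPv6 minimum MTU)
    and that success is monotonic: if a size works, every smaller size works.
    A lost probe is simply treated as "too big"; no ICMP message is needed.
    """
    while low < high:
        candidate = (low + high + 1) // 2  # round up so the loop always progresses
        if probe(candidate):
            low = candidate       # acknowledged: path MTU >= candidate
        else:
            high = candidate - 1  # lost: path MTU < candidate
    return low


# Example: a tunnel path that silently drops anything over 1464 bytes.
def black_hole_path(size: int) -> bool:
    return size <= 1464


print(search_path_mtu(black_hole_path))  # 1464
```

Because the search only needs to know whether a probe arrived, it still finds 1464 on a path where every ICMP too big message is filtered.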

On 18 dec 2007, at 2:36, Templin, Fred L wrote:

> Adding a means for the ITR to discover the ETR's EMTU_R
> is something I have proposed in numerous earlier efforts,
> and also something I have considered for sprite-mtu. But
> AFAICT, we really don't want the ETR to be reassembling
> fragmented outer packets any larger than 1500 bytes;
> instead, the ITR should send packets larger than 1500
> bytes in one piece and/or send back a PTB if they are
> too big.

Fair enough.

However, encoding a specific packet size that triggers different behavior makes me uncomfortable.

> So, IMHO all that needs to be known about the ETR is the
> binary as to whether it can reassemble up to 1500 bytes
> or not. If we say that all ETR's must be able to
> reassemble up to 2KB (enough to cover the 1500 byte
> packet plus any additional encapsulation overhead)
> then maybe there isn't all that much to be gained by
> an explicit EMTU_R discovery exchange?

Well, if you don't want to reassemble, the EMTU_R would be moot, and pretty much also if you only want to reassemble packets that hover around the magic 1500-byte mark, because obviously any real-world device that's going to be created will be able to support that size if it supports reassembly in the first place. Still, mentioning a specific size, such as 2048, in that case would probably be useful.
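The ITR behavior proposed above (fragment the outer packet only for inner packets up to 1500 bytes, which every ETR can reassemble within a 2 KB EMTU_R, and otherwise send in one piece or return a packet too big) could look roughly like the hypothetical Python sketch below. The function name, the return values and the choice never to advertise less than 1500 bytes in the PTB are assumptions made for this illustration.

```python
from typing import Optional, Tuple

FRAG_THRESHOLD = 1500  # largest inner packet the ITR will fragment
ETR_EMTU_R = 2048      # assumed minimum reassembly size every ETR supports


def itr_action(inner_len: int, overhead: int, path_mtu: int) -> Tuple[str, Optional[int]]:
    """Decide how an ITR handles one inner packet.

    Returns ("encapsulate", None), ("fragment", None) or ("ptb", mtu),
    where mtu is the value to report in the packet too big message.
    """
    outer_len = inner_len + overhead
    if outer_len <= path_mtu:
        return ("encapsulate", None)  # fits in one piece
    if inner_len <= FRAG_THRESHOLD and outer_len <= ETR_EMTU_R:
        return ("fragment", None)     # outer fragments, ETR reassembles
    # Too big for one piece and too big to fragment: tell the source.
    # Packets up to FRAG_THRESHOLD are always accepted, so never report less.
    return ("ptb", max(FRAG_THRESHOLD, path_mtu - overhead))


print(itr_action(1400, 36, 1500))  # ('encapsulate', None)
print(itr_action(1500, 36, 1500))  # ('fragment', None)
print(itr_action(4000, 36, 1500))  # ('ptb', 1500)
print(itr_action(9000, 36, 9000))  # ('ptb', 8964)
```

The ETR only ever has to reassemble outer packets of at most 1500 bytes plus overhead, so a single fixed size such as 2048 is enough and no EMTU_R discovery exchange is needed.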

On 18 dec 2007, at 0:01, Dino Farinacci wrote:

> I am not advocating that the ETR reassemble here. I want to make that clear.

Ok. That is a reasonable position.

>> You can't fragment IPv6 packets or IPv4 packets with DF=1.

> Right, you have to obey the protocol spec. So packets will get dropped with DF=1. And people turn off ICMP messages as well.

In my opinion, building devices that can't forward 1500-byte packets without fragmentation and deploying them in ISP networks is a non-starter*. You ruled out reassembly by ETRs, so this means that we either have to compress the encapsulation overhead to 0 bytes (= translation) or we have to require larger MTUs in the entire path between any ITR and any ETR.

* You could have ITRs that can't handle 1500 bytes if those are under the control of the source site, because then the source site can make sure that the too bigs the ITR generates are acted upon. But if there are _some_ ITRs that need to send 1500+ byte packets, then _all_ ETRs must support this, too.
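With ETR reassembly ruled out, the requirement becomes one on the whole path: its smallest link MTU must cover a 1500-byte packet plus the encapsulation. A minimal Python sketch, assuming 36 bytes of overhead with an outer IPv4 header and 56 with an outer IPv6 header (outer IP + UDP + an 8-byte LISP-style header); the hop values are hypothetical.

```python
from typing import Sequence

OVERHEAD = {"IPv4": 36, "IPv6": 56}  # assumed: outer IP + UDP (8) + LISP-style header (8)


def path_mtu(hop_mtus: Sequence[int]) -> int:
    """The path MTU is limited by the smallest link along the path."""
    return min(hop_mtus)


def itr_etr_path_ok(hop_mtus: Sequence[int], outer: str = "IPv4", inner: int = 1500) -> bool:
    """True if an `inner`-byte packet can be tunneled without fragmentation."""
    return path_mtu(hop_mtus) >= inner + OVERHEAD[outer]


print(itr_etr_path_ok([9000, 4470, 9000]))          # True
print(itr_etr_path_ok([9000, 1500, 9000]))          # False: one legacy hop is enough to fail
print(itr_etr_path_ok([1600, 1554, 1600], "IPv6"))  # False: 1554 < 1556
```

A single 1500-byte hop anywhere between an ITR and an ETR is enough to break the requirement, which is why it has to apply to every link on every such path.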

> So what's the difference if packets get lost doing a mapping lookup (everyone is so sensitive to packet drops there) but for MTU discovery purposes it's okay to drop packets?

Depends on how many packets get dropped. But the fundamental difference is that between dropping the first packet and dropping a later one. With the first packet, TCP doesn't know if the other side is reachable and it doesn't have an RTT estimate yet, so recovering from that is a lot slower. Also, if PMTUD is properly deployed, the packet that was too big will be resent immediately after the too big message is received.
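The difference between losing the first packet and losing a later one can be put in rough numbers. In this simplified Python sketch, a lost first packet (for instance a SYN dropped during a mapping lookup) waits for TCP's initial retransmission timeout, taken here as 3 seconds per RFC 2988 (RFC 6298 later lowered it to 1 second), while a later packet rejected with a too big message is resent within roughly one round-trip time. The model and its parameters are illustrative assumptions, not measurements.

```python
from typing import Optional

INITIAL_RTO_S = 3.0  # RFC 2988 initial RTO; RFC 6298 later reduced it to 1 s


def recovery_delay(first_packet: bool, rtt_s: float, ptb_delivered: bool) -> Optional[float]:
    """Rough time until a dropped packet is successfully resent.

    Returns None when the loss is never repaired (a PMTUD black hole).
    """
    if first_packet:
        # First packet lost (e.g. during a mapping lookup): there is no
        # RTT estimate yet, so TCP falls back to the initial timeout.
        return INITIAL_RTO_S
    if ptb_delivered:
        # The too big message arrives within about one RTT and the
        # sender immediately resends a smaller packet.
        return rtt_s
    # Too big, but the ICMP message was filtered: resending the same
    # size fails again, so the connection stalls.
    return None


print(recovery_delay(True, 0.1, False))   # 3.0
print(recovery_delay(False, 0.1, True))   # 0.1
print(recovery_delay(False, 0.1, False))  # None
```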

> Do you think 1500-byte MTU links will still be around, say, 5 years from now? Maybe it's time to clean up some links on the network. I'm sure vendors can provide incentive to do this. ;-)

Well, you work for a vendor. You guys ship tons of product that can handle 1500+ byte MTUs (and some that can't) but AFAIK, in each and every case, Ethernet interfaces on routers have their MTU set to 1500 by default.

I did get some good feedback when I presented my variable MTU subnet draft in Chicago, but not much after that. I'm going to see if I can get it published as an experimental RFC anyway. Hopefully, that way we really can get rid of those 1500-byte MTUs in the next five years. (But I'm not holding my breath.)

>> We have both the potential to do very quite things (trigger broken PMTUD)

I was going for "quite harmful"

>> and very useful things (give people an incentive to deploy jumboframes, create the first MTU-robust tunneling mechanism) here, so we should aim to get things right the first time rather than repeat the mistakes made with RFC 1191.

> When you think it is right, it will change. It's been a continual moving target with multiple moving parts for 20 years. You can never be right.

Maybe you can't ever be right, but that doesn't mean you can't be more wrong than usual. :-)


Follow-Ups:
- RE: MTU stuff, was Re: [RRG] LISP-NERD reachability and MTU detection
  - From: "Templin, Fred L" <Fred.L.Templin@boeing.com>

References:
- [RRG] LISP-NERD reachability and MTU detection
  - From: Iljitsch van Beijnum <iljitsch@muada.com>
- Re: [RRG] LISP-NERD reachability and MTU detection
  - From: Dino Farinacci <dino@cisco.com>
- Re: [RRG] LISP-NERD reachability and MTU detection
  - From: Tony Li <tli@cisco.com>
- Re: [RRG] LISP-NERD reachability and MTU detection
  - From: Dino Farinacci <dino@cisco.com>
- RE: [RRG] LISP-NERD reachability and MTU detection
  - From: "Templin, Fred L" <Fred.L.Templin@boeing.com>
