
MTU stuff, was Re: [RRG] LISP-NERD reachability and MTU detection



On 17 dec 2007, at 20:27, Templin, Fred L wrote:

Key considerations are: 1) 1500 bytes has become the
"magic number" expected by applications

Applications??

2) 1280 bytes is
the "magic number" specified for IPv6, and 3) fragmentation
at the TFE MUST be kept to a minimum in order to avoid
reassembly misassociations at the TFE. Of these, IMHO 3) is
the dominating consideration followed distantly by 1). ( 2)
is the hard lower bound for IPv6, and we can't change that.)

In particular, I want to see a requirement that TNEs MUST NOT
configure a fragmentation threshold larger than 1500 bytes
for the packets they admit into the tunnel.

I don't think the (main) problem is packets larger than 1500 bytes. If you generate those, you pretty much know what you're doing (or soon will). The issue arises when tunnel overhead on a 1500-byte path breaks the 1500-byte assumption, an assumption that exists because people filter ICMP "packet too big" messages without bothering to disable path MTU discovery.
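To make the arithmetic concrete, here is a rough sketch in Python. The header sizes (20-byte outer IPv4 header, 8-byte UDP header, 8-byte shim header) are illustrative assumptions rather than numbers taken from any particular draft, and the function name is made up for this example.

```python
# Illustrative only: the header sizes below are assumptions, not spec values.
OUTER_IPV4 = 20   # outer IPv4 header without options
OUTER_UDP = 8     # UDP header
SHIM = 8          # hypothetical tunnel shim header


def fits_after_encap(inner_len, link_mtu, overhead=OUTER_IPV4 + OUTER_UDP + SHIM):
    """Return True if the encapsulated packet fits the link MTU unfragmented."""
    return inner_len + overhead <= link_mtu


print(fits_after_encap(1500, 1500))  # False: 1536 bytes on a 1500-byte link
print(fits_after_encap(1464, 1500))  # True: exactly 1500 after encapsulation
print(fits_after_encap(1500, 1536))  # True: a "baby jumbo" link absorbs the overhead
```

With these assumed numbers, a full-size 1500-byte packet only survives encapsulation if the source shrinks it to 1464 bytes or the links in the tunnel carry at least 1536 bytes.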

Specific transitions I would like to see include:

 1) Require that all TFEs configure an EMTU_R that is no
    smaller than 2KB and at least as large as the smallest
    EMTU_R of all underlying links over which the TFE is
    configured. (IMHO 2KB is a good number because it
    allows for a 1500 byte fragmentation threshold at the
    TNE yet allows room for additional encapsulations
    on the path.)

If the reassembly happens in the destination host, this shouldn't be an issue in practice because of the TCP MSS option. If it happens in a middlebox, we can mandate a number (2048 seems like a conservative one), or we can specify a way for the destination to let the source know what the number is.
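As a rough illustration of both cases, the sketch below computes the segment size two hosts end up using when each derives its MSS from its own MTU, and the headroom a 2048-byte reassembly buffer leaves above a 1500-byte packet. The function names are hypothetical, and the 20-byte IPv4 and TCP header sizes assume no options.

```python
IPV4_HEADER = 20  # assumed: no IP options
TCP_HEADER = 20   # assumed: no TCP options


def advertised_mss(mtu):
    """MSS a host would typically advertise for a given interface MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


def effective_mss(local_mtu, remote_mtu):
    """Segment size actually used: limited by the smaller of the two ends."""
    return min(advertised_mss(local_mtu), advertised_mss(remote_mtu))


def encap_headroom(emtu_r, inner_len=1500):
    """Bytes left for encapsulation headers within a reassembly buffer."""
    return emtu_r - inner_len


print(effective_mss(9000, 1500))  # 1460
print(encap_headroom(2048))       # 548
```

So a jumbo-capable host talking to a 1500-byte host never sends segments the far end cannot take, and a mandated 2048 would leave 548 bytes for stacked encapsulations.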

 2) Require that all links transition to adopting IEEE
    802.3as Ethernet Frame Size expansion, or better yet
    Gigabit Ethernet Jumboframes.

There is already a large amount of equipment out there that does "baby jumbos" which should be enough to allow encapsulation of a 1500 byte packet without problems, but there's also still a lot of 100 Mbps and some 1 Gbps equipment out there that can only do 1500 or 1504. I believe that a new effort like this allows us to require people to upgrade their MTUs, something that's pretty much impossible to do at any other time, so I would be in favor of doing so.

 3) Require that all original sources that send packets
    of 1501 bytes or larger with DF=1 also implement
    RFC4821.

Not really an issue, in my opinion. If you send large packets you either need to implement RFC 4821 or you need to make sure that you hit a 1500-byte hop that reliably sends you too bigs before you enter the big bad internet. If neither of these is possible (and assuming TCP MSS clamping isn't an option) you can't realistically have an MTU larger than 1500 bytes.
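For readers who haven't looked at RFC 4821: the idea is to probe with progressively larger packets and treat an acknowledged probe as proof that the size works, without relying on ICMP. The sketch below uses a plain binary search over a simulated path. The RFC leaves the search strategy to the implementation, and a real sender cannot conclude "too big" from a single lost probe, so take this as a simplified model. All names here are invented for the example.

```python
def search_path_mtu(probe, low=1280, high=9000):
    """Binary search for the largest size for which probe(size) succeeds.

    low must be a size already known to work; high is the largest size
    worth trying (for example the local interface MTU).
    """
    while low < high:
        mid = (low + high + 1) // 2
        if probe(mid):
            low = mid        # probe was acknowledged, so mid fits
        else:
            high = mid - 1   # probe was lost, assume mid is too big
    return low


def make_simulated_path(link_mtus):
    """Return a probe function for a path made of the given link MTUs."""
    bottleneck = min(link_mtus)
    return lambda size: size <= bottleneck


probe = make_simulated_path([9000, 1500, 4470])
print(search_path_mtu(probe))  # 1500
```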

On 18 dec 2007, at 2:36, Templin, Fred L wrote:

Adding a means for the ITR to discover the ETR's EMTU_R
is something I have proposed in numerous earlier efforts,
and also something I have considered for sprite-mtu. But
AFAICT, we really don't want the ETR to be reassembling
fragmented outer packets any larger than 1500 bytes;
instead, the ITR should send packets larger than 1500
bytes in one piece and/or send back a PTB if they are
too big.

Fair enough.

However, encoding a specific packet size that triggers different behavior makes me uncomfortable.
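One reading of the proposal quoted above, written out as a sketch so the size-dependent branch is visible. The function name, the return labels, and the idea of passing the overhead as a parameter are all assumptions made for illustration.

```python
FRAG_THRESHOLD = 1500  # the hard-coded size under discussion


def itr_action(inner_len, tunnel_mtu, overhead):
    """Classify what a hypothetical ITR does with one inner packet."""
    if inner_len + overhead <= tunnel_mtu:
        return "forward"    # fits in one piece after encapsulation
    if inner_len <= FRAG_THRESHOLD:
        return "fragment"   # ETR reassembles at most about 1500 bytes plus overhead
    return "send_ptb"       # larger packets are never fragmented


print(itr_action(1400, 1500, 36))  # forward
print(itr_action(1500, 1500, 36))  # fragment
print(itr_action(4000, 9000, 36))  # forward
print(itr_action(4000, 1500, 36))  # send_ptb
```

The FRAG_THRESHOLD constant is the specific packet size that triggers different behavior.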

So, IMHO all that needs to be known about the ETR is the
binary as to whether it can reassemble up to 1500 bytes
or not. If we say that all ETR's must be able to
reassemble up to 2KB (enough to cover the 1500 byte
packet plus any additional encapsulation overhead)
then maybe there isn't all that much to be gained by
an explicit EMTU_R discovery exchange?

Well, if you don't want to reassemble, the EMTU_R would be moot. The same is pretty much true if you only want to reassemble packets that hover around the magic 1500-byte mark, because obviously any real-world device that supports reassembly in the first place will be able to support that size. Still, mentioning a specific size, such as 2048, in that case would probably be useful.

On 18 dec 2007, at 0:01, Dino Farinacci wrote:

I am not advocating that the ETR reassemble here. I want to make that clear.

Ok. That is a reasonable position.

You can't fragment IPv6 packets or IPv4 packets with DF=1.

Right, you have to obey the protocol spec. So packets will get dropped with DF=1. And people turn off ICMP messages as well.

In my opinion, building devices that can't forward 1500-byte packets without fragmentation and deploying them in ISP networks is a non-starter*. You ruled out reassembly by ETRs, so this means that we either have to compress the encapsulation overhead to 0 bytes (= translation) or we have to require larger MTUs in the entire path between any ITR and any ETR.

* You could have ITRs that can't handle 1500 bytes if those are under the control of the source site because then the source site can make sure that the too bigs the ITR generates are acted upon. But if there are _some_ ITRs that need to send 1500+ byte packets then _all_ ETRs must support this, too.

So what's the difference if packets get lost doing a mapping lookup (everyone is so sensitive to packet drops there) but for MTU discovery purposes it's okay to drop packets?

Depends on how many packets get dropped. But the fundamental difference is between dropping the first packet and dropping a later one. With the first packet, TCP doesn't know if the other side is reachable and it doesn't have an RTT estimate yet, so recovering from that is a lot slower. Also, if PMTUD is properly deployed, the packet that was too big will be immediately resent after receiving the too big message.
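A back-of-the-envelope comparison of the two cases, as a sketch. It assumes the 3-second initial retransmission timeout from RFC 2988 for a lost first packet, and it assumes that a too big message for a later packet comes back within roughly one round trip. The function name is hypothetical and the model ignores everything else TCP does.

```python
INITIAL_RTO = 3.0  # seconds; assumed initial retransmission timeout (RFC 2988)


def recovery_delay(first_packet, rtt):
    """Rough extra delay in seconds caused by one dropped packet."""
    if first_packet:
        # No RTT estimate yet, so the sender waits out the initial timer.
        return INITIAL_RTO
    # Later packet with working PMTUD: the too big message returns within
    # about one round trip and the data is resent in smaller packets.
    return rtt


print(recovery_delay(True, 0.1))   # 3.0
print(recovery_delay(False, 0.1))  # 0.1
```

With a 100 ms round trip, that is a factor of 30 between the two cases under these assumptions.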

Do you think 1500 byte MTU links will still be around say 5 years from now? Maybe it's time to clean up some links on the network. I'm sure vendors can provide incentive to do this. ;-)

Well, you work for a vendor. You guys ship tons of product that can handle 1500+ byte MTUs (and some that can't) but AFAIK, in each and every case, Ethernet interfaces on routers have their MTU set to 1500 by default.

I did get some good feedback when I presented my variable MTU subnet draft in Chicago but not much after that. I'm going to see if I can get it published as an experimental RFC anyway. Hopefully, that way we really can get rid of those 1500-byte MTUs in the next five years. (But I'm not holding my breath.)

We have both the potential to do very quite things (trigger broken PMTUD)

I was going for "quite harmful"

and very useful things (give people an incentive to deploy jumboframes, create the first MTU-robust tunneling mechanism) here so we should aim to get things right the first time rather than repeat the mistakes made with RFC 1191.

When you think it is right, it will change. It's been a continual moving target with multiple moving parts for 20 years. You can never be right.

Maybe you can't ever be right, but that doesn't mean you can't be more wrong than usual. :-)

--
to unsubscribe send a message to rrg-request@psg.com with the
word 'unsubscribe' in a single line as the message text body.
archive: <http://psg.com/lists/rrg/> & ftp://psg.com/pub/lists/rrg