MTU stuff, was Re: [RRG] LISP-NERD reachability and MTU detection
On 17 dec 2007, at 20:27, Templin, Fred L wrote:
Key considerations are: 1) 1500 bytes has become the
"magic number" expected by applications
Applications??
2) 1280 bytes is
the "magic number" specified for IPv6, and 3) fragmentation
at the TFE MUST be kept to a minimum in order to avoid
reassembly misassociations at the TFE. Of these, IMHO 3) is
the dominating consideration followed distantly by 1). ( 2)
is the hard lower bound for IPv6, and we can't change that.)
In particular, I want to see a requirement that TNEs MUST NOT
configure a fragmentation threshold larger than 1500 bytes
for the packets they admit into the tunnel.
I don't think the (main) problem is packets larger than 1500 bytes. If
you generate those, you pretty much know what you're doing (or soon
will). The issue is when tunnel overhead over a 1500-byte path breaks
the 1500-byte assumption that is created by the fact that people
filter ICMP too big messages without bothering to disable path MTU
discovery.
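To make the arithmetic behind this concrete, here is a minimal Python sketch. The header sizes are assumptions for illustration (a 20-byte outer IPv4 header, an 8-byte UDP header and a hypothetical 8-byte tunnel shim); the real overhead depends on whatever encapsulation ends up being specified.

```python
# Assumed, illustrative header sizes in bytes. They are not taken from any
# particular tunnel specification.
OUTER_IPV4 = 20
UDP = 8
TUNNEL_SHIM = 8  # hypothetical shim header size


def tunnel_overhead(*headers: int) -> int:
    """Total number of bytes the encapsulation adds to each packet."""
    return sum(headers)


def largest_inner_packet(path_mtu: int, overhead: int) -> int:
    """Largest inner packet that still fits in one unfragmented outer packet."""
    return path_mtu - overhead


overhead = tunnel_overhead(OUTER_IPV4, UDP, TUNNEL_SHIM)
print(overhead)                              # 36
print(largest_inner_packet(1500, overhead))  # 1464
print(1500 + overhead)                       # 1536, the path MTU needed for a full 1500-byte packet
```

With these assumed numbers, a host that sends 1500-byte packets over a 1500-byte path through the tunnel exceeds the path MTU by 36 bytes, and if the resulting too big message is filtered the sender never finds out.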
Specific transitions I would like to see include:
1) Require that all TFEs configure an EMTU_R that is no
smaller than 2KB and at least as large as the smallest
EMTU_R of all underlying links over which the TFE is
configured. (IMHO 2KB is a good number because it
allows for a 1500 byte fragmentation threshold at the
TNE yet allows room for additional encapsulations
on the path.)
If the reassembly happens in the destination host, this shouldn't be an
issue in practice because of the TCP MSS option. If it happens in a
middlebox, we can either mandate a number (2048 seems like a
conservative one) or specify a way for the destination to let the
source know what the number is.
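As a rough check on the 2048 figure, the sketch below computes how much room a 2048-byte reassembly buffer leaves for encapsulation on top of a 1500-byte packet. The 36-byte per-layer overhead is an assumed example value, not a number from this thread.

```python
def reassembly_headroom(emtu_r: int, inner_packet: int = 1500) -> int:
    """Bytes left for encapsulation headers once the inner packet is counted."""
    return emtu_r - inner_packet


def max_nested_encapsulations(emtu_r: int, per_layer_overhead: int,
                              inner_packet: int = 1500) -> int:
    """How many encapsulation layers of the given size still fit in the buffer."""
    return reassembly_headroom(emtu_r, inner_packet) // per_layer_overhead


print(reassembly_headroom(2048))            # 548
print(max_nested_encapsulations(2048, 36))  # 15
```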
2) Require that all links transition to adopting IEEE
802.3as Ethernet Frame Size expansion, or better yet
Gigabit Ethernet Jumboframes.
There is already a large amount of equipment out there that does "baby
jumbos" which should be enough to allow encapsulation of a 1500 byte
packet without problems, but there's also still a lot of 100 Mbps and
some 1 Gbps equipment out there that can only do 1500 or 1504. I
believe that a new effort like this allows us to require people to
upgrade their MTUs, something that's pretty much impossible to do at
any other time, so I would be in favor of doing so.
3) Require that all original sources that send packets
of 1501 bytes or larger with DF=1 also implement
RFC4821.
Not really an issue, in my opinion. If you send large packets you
either need to implement RFC 4821 or you need to make sure that you
hit a 1500-byte hop that reliably sends you too bigs before you enter
the big bad internet. If either of these is impossible (and assuming
TCP MSS clamping isn't an option) you can't realistically have an MTU
larger than 1500 bytes.
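For reference, the MSS clamping mentioned above comes down to simple arithmetic. The sketch below assumes minimum-size headers without options (20 bytes for IPv4, 40 for IPv6, 20 for TCP); the function names are illustrative only.

```python
IPV4_HEADER = 20  # bytes, no IP options
IPV6_HEADER = 40  # bytes, no extension headers
TCP_HEADER = 20   # bytes, no TCP options


def mss_for_mtu(mtu: int, ipv6: bool = False) -> int:
    """Largest TCP payload that fits in one packet of the given MTU."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return mtu - ip_header - TCP_HEADER


def clamp_mss(advertised_mss: int, link_mtu: int, ipv6: bool = False) -> int:
    """MSS value a clamping device would leave in a SYN crossing this link."""
    return min(advertised_mss, mss_for_mtu(link_mtu, ipv6))


print(mss_for_mtu(1500))      # 1460
print(mss_for_mtu(9000))      # 8960
print(clamp_mss(8960, 1500))  # 1460
```

Clamping only helps TCP, which is why the text above treats it as a fallback rather than a general answer.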
On 18 dec 2007, at 2:36, Templin, Fred L wrote:
Adding a means for the ITR to discover the ETR's EMTU_R
is something I have proposed in numerous earlier efforts,
and also something I have considered for sprite-mtu. But
AFAICT, we really don't want the ETR to be reassembling
fragmented outer packets any larger than 1500 bytes;
instead, the ITR should send packets larger than 1500
bytes in one piece and/or send back a PTB if they are
too big.
Fair enough.
However, encoding a specific packet size that triggers different
behavior makes me uncomfortable.
So, IMHO all that needs to be known about the ETR is the
binary as to whether it can reassemble up to 1500 bytes
or not. If we say that all ETRs must be able to
reassemble up to 2KB (enough to cover the 1500 byte
packet plus any additional encapsulation overhead)
then maybe there isn't all that much to be gained by
an explicit EMTU_R discovery exchange?
Well, if you don't want to reassemble, the EMTU_R is moot. It is
pretty much moot too if you only want to reassemble packets that hover
around the magic 1500-byte mark, because any real-world device that
supports reassembly at all will be able to support that size. Still,
mentioning a specific size, such as 2048, would probably be useful in
that case.
On 18 dec 2007, at 0:01, Dino Farinacci wrote:
I am not advocating that the ETR reassemble here. I want to make
that clear.
Ok. That is a reasonable position.
You can't fragment IPv6 packets or IPv4 packets with DF=1.
Right, you have to obey the protocol spec. So packets will get
dropped with DF=1. And people turn off ICMP messages as well.
In my opinion, building devices that can't forward 1500-byte packets
without fragmentation and deploying them in ISP networks is a non-
starter*. You ruled out reassembly by ETRs so this means that we
either have to compress the encapsulation overhead to 0 bytes (=
translation) or we have to require larger MTUs in the entire path
between any ITR and any ETR.
* You could have ITRs that can't handle 1500 bytes if those are under
the control of the source site because then the source site can make
sure that the too bigs the ITR generates are acted upon. But if there
are _some_ ITRs that need to send 1500+ byte packets then _all_ ETRs
must support this, too.
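The constraints in this exchange can be summarized as a small decision function for the ITR. This is only a sketch under stated assumptions: the names are hypothetical, and "fragment" here means fragmenting the inner IPv4 packet before encapsulation so that the destination host, not the ETR, reassembles.

```python
from enum import Enum


class Action(Enum):
    FORWARD = "forward"            # encapsulate and send in one piece
    FRAGMENT_INNER = "fragment"    # fragment the inner packet, then encapsulate
    SEND_TOO_BIG = "send_too_big"  # drop and return a too big message


def itr_action(inner_len: int, overhead: int, path_mtu: int,
               ipv6: bool, df: bool) -> Action:
    """Pick what a hypothetical ITR does with one packet."""
    if inner_len + overhead <= path_mtu:
        return Action.FORWARD
    # IPv6 packets and IPv4 packets with DF=1 may not be fragmented in transit.
    if ipv6 or df:
        return Action.SEND_TOO_BIG
    return Action.FRAGMENT_INNER


print(itr_action(1500, 36, 1500, ipv6=False, df=True))   # Action.SEND_TOO_BIG
print(itr_action(1500, 36, 1536, ipv6=False, df=True))   # Action.FORWARD
print(itr_action(1500, 36, 1500, ipv6=False, df=False))  # Action.FRAGMENT_INNER
```

If the too big message in the first case is filtered, the packet is silently lost, which is the failure mode the larger-MTU requirement is meant to avoid.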
So what's the difference if packets get lost doing a mapping lookup
(everyone is so sensitive to packet drops there) but for MTU
discovery purposes it's okay to drop packets?
Depends on how many packets get dropped. But the fundamental
difference is between dropping the first packet and dropping a later one.
With the first packet, TCP doesn't know if the other side is reachable
and it doesn't have an RTT estimate yet, so recovering from that is a
lot slower. Also, if PMTUD is properly deployed, the packet that was
too big will be immediately resent after receiving the too big message.
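The cost of losing the first packet can be illustrated with the standard TCP retransmission timer rules. This is a simplified sketch: the 3-second initial value and the 1-second floor are the ones given in RFC 2988, and real stacks differ in both.

```python
INITIAL_RTO = 3.0  # seconds; used before any RTT sample exists (RFC 2988)
MIN_RTO = 1.0      # seconds; floor recommended by RFC 2988
K = 4              # multiplier for the RTT variance term


def rto_after_first_sample(rtt: float, granularity: float = 0.01) -> float:
    """Retransmission timeout once the first RTT measurement is available."""
    srtt = rtt        # smoothed RTT starts at the first sample
    rttvar = rtt / 2  # variance estimate starts at half the first sample
    return max(MIN_RTO, srtt + max(granularity, K * rttvar))


print(INITIAL_RTO)                  # 3.0, what a lost first packet waits for
print(rto_after_first_sample(0.1))  # 1.0, a 100 ms path hits the floor
print(rto_after_first_sample(0.5))  # 1.5
```

A lost first packet has to wait out the full initial timeout, whereas a later packet that draws a too big message can be resent as soon as that message arrives, without waiting for the timer at all.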
Do you think 1500 byte MTU links will still be around say 5 years
from now? Maybe it's time to clean up some links on the network. I'm
sure vendors can provide incentive to do this. ;-)
Well, you work for a vendor. You guys ship tons of product that can
handle 1500+ byte MTUs (and some that can't) but AFAIK, in each and
every case, ethernet interfaces on routers have their MTU set to 1500
by default.
I did get some good feedback when I presented my variable MTU subnet
draft in Chicago but not much after that. I'm going to see if I can
get it published as an experimental RFC anyway. Hopefully, that way we
really can get rid of those 1500-byte MTUs in the next five years.
(But I'm not holding my breath.)
We have both the potential to do very quite things (trigger broken
PMTUD)
I was going for "quite harmful"
and very useful things (give people an incentive to deploy
jumboframes, create the first MTU-robust tunneling mechanism) here
so we should aim to get things right the first time rather than
repeat the mistakes made with RFC 1191.
When you think it is right, it will change. It's been a continual
moving target with multiple moving parts for 20 years. You can never
be right.
Maybe you can't ever be right, but that doesn't mean you can't be more
wrong than usual. :-)