Re: [RRG] MTU, jumboframes, ITR & ETR placement, ITR function in hosts

To: Routing Research Group list <rrg@psg.com>
Subject: Re: [RRG] MTU, jumboframes, ITR & ETR placement, ITR function in hosts
From: Robin Whittle <rw@firstpr.com.au>
Date: Mon, 26 Nov 2007 01:08:29 +1100
Cc: Iljitsch van Beijnum <iljitsch@muada.com>, Gert Doering <gert@space.net>
In-reply-to: <C7F0E89F-7CBC-4B0E-B021-532C2F90AA25@muada.com>
Organization: First Principles
References: <4747933B.2090009@firstpr.com.au> <C7F0E89F-7CBC-4B0E-B021-532C2F90AA25@muada.com>
User-agent: Thunderbird 2.0.0.9 (Windows/20071031)

Hi again Iljitsch!

Our correspondence continues:

>> I would be delighted if you can show me it is practical to
>> insist that ITRs and ETRs be installed only in places with
>> significantly greater than 1500 byte MTU to the rest of the
>> Net.  As far as I know, it is impractical, due to the
>> widespread use of 100Mbps links in many places.
> 
> The practicality of that insistence isn't derived from wide
> availability of 1500+ byte MTUs in provider networks (although
> Dino claims this isn't an issue based on a survey he did) but
> rather from the trouble we can expect to see if we reduce the MTU
> of links across the core of the internet to below 1500 bytes.

This sounds like an argument between the immovable object and the
irresistible force.

Something doesn't become practical just because the only apparent
alternative seems unthinkable.

You later contemplate extending this to those end-user networks
which contain ITRs.  In practice, as the ITR-ETR scheme becomes
ubiquitous, this would be virtually all end-user networks.


The correspondence I quoted, from people who know much more about
this than I do, convinced me that your desirable idea of upgrading
to MTU >> 1500 is not practical.

Perhaps it can be done, but I want ITRs and ETRs to be placed pretty
much everywhere, so I doubt this will be practical, since so many
edge networks run on 100Mbps Ethernet, which I understand typically
provides a Layer 3 MTU of 1500 bytes.
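
To put rough numbers on this concern: if the path offers a 1500 byte
Layer 3 MTU, whatever the tunnel header costs comes straight out of
the packet size the sending host can use.  Below is a minimal sketch
of that arithmetic.  The header sizes are assumptions for
illustration (a 20 byte IPv4 outer header with no options, plus an
8 byte UDP header for a UDP-based encapsulation), not figures from
this message, and the function name is hypothetical.

```python
# Illustrative header sizes in bytes.  The tunnel format is an assumption
# made for this example, not something specified in this message.
IPV4_HEADER = 20   # IPv4 header without options
UDP_HEADER = 8


def inner_mtu(link_mtu: int, encap_overhead: int) -> int:
    """Largest inner packet that fits in one outer packet on this link."""
    if encap_overhead >= link_mtu:
        raise ValueError("overhead leaves no room for an inner packet")
    return link_mtu - encap_overhead


plain_ip_in_ip = inner_mtu(1500, IPV4_HEADER)              # 1480
ip_udp_encap = inner_mtu(1500, IPV4_HEADER + UDP_HEADER)   # 1472
```

Either way, a host that assumes it can send 1500 byte packets ends
up over the limit unless something tells it otherwise.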


>> The quotes below from the RAM discussion on this, especially
>> those from Gert Doering, seem to indicate that at present,
>> there is sufficiently widespread use of 100Mbps Fast Ethernet,
>> to drag down the MTU of many Internet exchange networks to
>> 1500.
> 
> Obviously smaller ISPs will have 100 Mbps links in some places.
> The real question is: do the *TRs need to be behind those 100
> Mbps links, or can they be placed in a more central part of the
> network where the requirement of a 1500+ byte MTU can reasonably
> be met?
> 
> You seem to want *TRs pretty much everywhere. Although I think
> there are security issues with that, I don't reject having them
> even in end-user networks out of hand, but that doesn't mean I
> accept the lowest common denominator in those networks. If they
> want to run an encapsulation/decapsulation device, they'll just
> have to upgrade their network to support the MTU that makes this
> possible.


My proposal and Fred's are based on the assumption that we can't
avoid various things chopping the MTU below 1500 - so we each
proposed a way to cope with this.  We accept that it is reasonably
(or, to you, alarmingly) troublesome, but we still think it is
practical and better than the alternatives.

I suggest that Sprite and IPTM (or something like it) be developed
as well as possible.

Then, the costs and benefits of "Sprite as implemented for
ITR-ETR schemes" or IPTM can be compared with the costs and benefits
of what you propose.


>> I don't discount what you are saying.  Maybe by the time an
>> ITR-ETR scheme is introduced, it will be practical to insist on
>> a minimum standard well above 1500.  I hope some other people
>> can contribute to this discussion.
> 
> I don't expect the situation in this area to change
> significantly. Today, pretty much all 100 Mbps or slower stuff
> doesn't support jumboframes, while pretty much all 1000 Mbps or
> faster stuff does. (Sometimes this even goes for the 10/100 and
> 1000 ports on the same switch.) Expensive 100 Mbps equipment
> doesn't exist anymore, so I don't see a push for larger MTUs
> there.

I look forward to other people debating this.


>> However, while it would be possible to configure an ITR in some
>> way to tell it that it has a >>1500 byte MTU to the "core of
>> the Net", this doesn't help much, since it can't know for sure
>> that every ETR it needs to send packets to has a similarly high
>> MTU.
> 
> This could be learned through the mapping service.

ITR-ETR schemes other than Ivip have complex mapping systems and
their designers might contemplate adding something else to the
mapping data.  Ivip's system is very simple - just an ETR address
for a given micronet.  I don't want to add anything more than that.

If some ITRs want to participate in some collaborative system to
develop a cheatsheet of good PMTU values to try first, or to be
happy with as a maximum without pushing higher, that would be fine.
Perhaps that could be part of the whole ITR-ETR scheme.

It might work at the level of BGP advertised prefixes, rather than
specific ETRs.  Maybe the suggested PMTU value is the same for all
ETRs in that entire prefix.

The PMTU value doesn't need to be tied to every item of mapping
data, since, for instance, hundreds of micronets may currently be
mapped to one ETR.  There is only one "cheatsheet" PMTU value for
each ETR.
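
A minimal sketch of how such a cheatsheet might be held, assuming it
is keyed by prefix and looked up by longest match on the ETR's
address.  The class name, method names, prefixes (documentation
ranges) and PMTU values are all hypothetical; nothing here is part
of Ivip or any other ITR-ETR proposal.

```python
import ipaddress
from typing import Optional


class PmtuCheatsheet:
    """Suggested PMTU values keyed by prefix (hypothetical structure)."""

    def __init__(self) -> None:
        self._hints = {}  # maps ip_network -> suggested PMTU in bytes

    def set_hint(self, prefix: str, pmtu: int) -> None:
        self._hints[ipaddress.ip_network(prefix)] = pmtu

    def lookup(self, etr_address: str) -> Optional[int]:
        """Return the hint for the longest matching prefix, or None."""
        addr = ipaddress.ip_address(etr_address)
        matches = [net for net in self._hints if addr in net]
        if not matches:
            return None
        best = max(matches, key=lambda net: net.prefixlen)
        return self._hints[best]


sheet = PmtuCheatsheet()
sheet.set_hint("192.0.2.0/24", 1500)      # one value for the whole prefix
sheet.set_hint("192.0.2.128/25", 1400)    # a more specific override
```

With these entries, `sheet.lookup("192.0.2.10")` gives 1500,
`sheet.lookup("192.0.2.200")` gives 1400, and an address outside
both prefixes gives `None`, leaving the ITR to probe from scratch.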

But I think this suggested "cheatsheet" value is of limited use.
There could be all sorts of PMTU limitations at the ITR end of the
tunnel, the ETR end or anywhere in the middle.  These may vary so
much (in the short term, as paths take the packets through different
routers with potential tunnels of their own?) that it is not
desirable to suggest that some single figure can be reliably used as
a default PMTU, or as a target which a PMTUD system might try first
and be happy with.
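
To make the two uses of a suggested value concrete, here is a
minimal sketch of a probing search which tries the hint first and
then either settles for it or keeps pushing higher.  It is a generic
binary search, not the IPTM or Sprite mechanism.  The function name,
the `probe` callback (which stands in for "send a packet of this
size and see whether it arrives") and the floor and ceiling values
are assumptions for illustration.

```python
from typing import Callable, Optional


def discover_pmtu(probe: Callable[[int], bool],
                  floor: int,
                  ceiling: int,
                  hint: Optional[int] = None,
                  push_higher: bool = True) -> int:
    """Return the largest size in [floor, ceiling] that probe() accepts.

    Assumes probe(floor) would succeed and that if a size gets through,
    every smaller size does too.
    """
    lo, hi = floor, ceiling          # lo is assumed good; answer is in [lo, hi]
    if hint is not None and floor < hint <= ceiling:
        if probe(hint):
            if not push_higher:
                return hint          # be happy with the suggested value
            lo = hint
        else:
            hi = hint - 1            # hint was too optimistic
    while lo < hi:
        mid = (lo + hi + 1) // 2     # round up so the loop always progresses
        if probe(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo


# Stand-in for a real path: anything up to 1472 bytes gets through.
def fake_path(size: int) -> bool:
    return size <= 1472


found = discover_pmtu(fake_path, floor=576, ceiling=9000, hint=1500)
```

If the path changes between probes, as suggested above, the result
is only as good as the last probe, which is the weakness of relying
on any single stored figure.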



>> Also, I think there are many benefits in having a caching ITR
>> in the sending host, as I discuss below.
> 
> In that case there are no problems, because obviously it's
> allowed to send packets smaller than the minimum maximum and
> PMTUD black holes aren't possible here because everything happens
> on the source host.

I haven't thought through exactly how an ITFH might be implemented.
It could be an additional functional block which informs the
existing code with the equivalent of a PTB message, just as an
external ITR would.  Alternatively, it could be very tightly
integrated with the code which handles packetisation.
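
A minimal sketch of the first option, assuming the ITFH block sits
between the host's existing stack and the tunnel and returns either
"encapsulate this" or a PTB-equivalent carrying the largest inner
packet size it can accept.  All names here are hypothetical, and the
20 byte overhead in the usage note is only an example.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Encapsulate:
    outer_length: int      # size of the packet after the tunnel header is added


@dataclass
class PacketTooBig:
    next_hop_mtu: int      # largest inner packet the stack should send


def itfh_handle(inner_length: int,
                tunnel_pmtu: int,
                encap_overhead: int) -> Union[Encapsulate, PacketTooBig]:
    """Decide what the in-host ITR function does with one outgoing packet."""
    limit = tunnel_pmtu - encap_overhead
    if inner_length <= limit:
        return Encapsulate(inner_length + encap_overhead)
    # Same information an external ITR would send back in a PTB message,
    # but handed to the local stack directly.
    return PacketTooBig(limit)
```

For example, with a 1500 byte tunnel PMTU and 20 bytes of overhead,
a 1400 byte packet is encapsulated to 1420 bytes, while a 1500 byte
packet produces `PacketTooBig(next_hop_mtu=1480)`.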

Previously, I recognised that an ITFH only makes sense in a given
location if it can quickly and reliably communicate with a nearby
Query Server.  However, this PMTU stuff involves the ITFH sending
hefty probe packets, which means intermittent, but perhaps
frequent, bursts of significant outgoing traffic.  This would
typically be more outgoing traffic than the queries.  Since DSL and
cable modem links have lousy upstream paths, this is a further
reason not to put an ITFH at the user end of such a link.  (I
include in "ITFH" the idea that the modem's NAT system performs the
ITR functions on its outgoing packets.)

I think an ITFH still makes a lot of sense for web servers in a
hosting farm, assuming they are not running flat out, to save the
hosting company's ITRs from handling most of the load.


>> While we don't run 1500 on links where MPLS is used, most Cisco
>> gear in use cannot go over 1520...1530 on FastEthernet ports.
>> This limits the GigE machines to 1530 as well, as things really
>> break (today) if you share a layer2 segment between machines
>> with different MTUs.
> 
> Yeah, someone should do something about that. Oh wait:
> 
> http://www.ietf.org/internet-drafts/draft-van-beijnum-multi-mtu-01.txt

Sorry, I didn't look in detail at this, because it seems to be only
for IPv6.


>> At the DECIX, currently the 3rd biggest exchange in Europe, 
>> about one third of the members (all sharing a common L2 
>> network!) are still connected with 100 Mbit/s.
> 
>> Which means "1500 for all of them".
> 
> Yes, this is an issue.

OK.

>> All exchange points that we're connected to run the fabric at 
>> 1500 byte MTU, because they have members that have equipment 
>> that cannot handle more.  There are *some* IXPs that have two 
>> different LANs, one with 1500 and one with "Jumbo", but that's 
>> not very widespread yet.
> 
> That's because there is little value in configuring a larger MTU
> today because in 99% of all paths through the internet there is
> at least one 1500-byte ethernet hop, so you're pretty much never
> going to see actual data packets flowing between end-users that
> are bigger than 1500 bytes.

This sounds familiar . . .  everyone (or almost everyone) has to do
an expensive upgrade before anyone derives any benefit.

We really need a Fairy Godmother to cast a spell which brings
forward some things in time (the benefits) so that the mechanisms
which provide the benefits can be created in the first place!


> I'm pretty sure if LISP or something like it is deployed by the
> big guys in the US (which do all their interconnecting through
> private links AFAIK) the Europeans who use internet exchanges
> will spit and curse but make the necessary changes to the
> exchange setups after that. The alternative is endless
> handholding of customers with PMTUD problems rather than a
> one-time infrastructure change.

I think we need to design a one-time infrastructure change resulting
in some new, flexible, address space which has certain limitations
which end-users can live with - but which does not suck, either
continually or in a flaky PMTUitis kind of way.

If some administrative arrangements could be used to cause all
relevant gear to be upgraded to gigabit Ethernet, I would be
delighted to forget about IPTM.  However, I think that would
further delay deploying Ivip - or whatever ITR-ETR scheme is
chosen - and/or seriously restrict where ITRs and ETRs could be
located.

 - Robin

