[RRG] Re: LISP PMTU & fragmentation problems

To: Routing Research Group <rrg@psg.com>
Subject: [RRG] Re: LISP PMTU & fragmentation problems
From: Robin Whittle <rw@firstpr.com.au>
Date: Mon, 10 Mar 2008 14:30:41 +1100
Cc: Dino Farinacci <dino@cisco.com>, "Templin, Fred L" <Fred.L.Templin@boeing.com>
In-reply-to: <7EC200C2-CCE0-4E12-8E1F-76CD52BA6240@cisco.com>
Organization: First Principles
References: <47D16205.1080406@firstpr.com.au> <7EC200C2-CCE0-4E12-8E1F-76CD52BA6240@cisco.com>
User-agent: Thunderbird 2.0.0.12 (Windows/20080213)

Hi Dino,

You wrote:

>> Short version:    lisp-06's new material on Path MTU limits includes
>>             some text on how to resolve the problems if they
>>             are deemed to be bad enough to need a solution
>>             within LISP.  I can't understand this text in any
>>             way which makes practical sense.
> 
> Well, I'm sorry about that. The text is written in such simple and
> precise terms, I am surprised you wouldn't understand it.

I think I understood it OK, though it would be better if you put the
1500 byte recommendation at the beginning, and broke it into two
sections: IPv4 DF=0 first and IPv6 or IPv4 DF=1 second.

What I understood from the ID seemed so at odds with a desirable
outcome that I assumed you were mistaken in some way, writing text
which conveyed something different from what you really intended.


>>             Also, it would be good to publish the research
>>             which indicates that 1500 byte MTU limits are
>>             relatively rare.
> 
> There wasn't research. It was a survey. I asked 10 people, and all 10
> made this statement. So there isn't much to publish.

Since the PMTU/fragmentation problem needs to be tackled by any
mature map-encap scheme, and since your intention is to ignore the
problem if you think some high proportion of links in the DFZ
("transit paths" was the term you used in section 5.) support
jumboframes, I think it would help folks evaluating LISP if you had
more substantial research.

For instance, somehow get a bunch of border routers in a bunch of
ISPs and AS end-user networks and find out what the PMTU is from
each such router to all the rest.  Then, somehow characterise how
representative the sample is, so that the results can be
extrapolated in some meaningful way to all existing ISP and AS
end-user networks.

Your plan to wait until you have a LISP pilot running doesn't seem
as good as doing proper research now.  Waiting for a pilot
deployment delays the gathering of this important information, and
the results would cover only a small number of participating networks.

Some other things I am keen to know are:

1 - How far do "jumboframe" links go into these ISP and AS
    end-user networks?  (Therefore how deep in these networks -
    meaning close to the ISPs' customers' networks - can you place
    ITRs and ETRs?  This strongly affects how you spread the load,
    the capacity these devices must have etc.)

     (Later in your response, you surprised me by stating
      you recommend ITR and ETR functions be in CE routers.
      I had assumed you meant ISP border routers.)


2 - If hosts are expecting some longer MTU, 4470 or whatever, and
    the LISP ITR adds a header which makes the packet exceed a
    4470 limit en-route to the ETR, without any additional
    mechanisms (your preferred solution), then how would the sending
    host know the packet had been dropped?  The PTB will go to the
    ITR (in Ivip, it goes to the sending host, but contains
    information different to the packet sent by the host) - and it
    is not practical to have the ITR figuring out which sending host
    to generate a corresponding PTB for.  See:

      http://www.ietf.org/mail-archive/web/ram/current/msg01766.html

3 - How stable would the PMTU be between any two routers?  Just
    because right now the packets go through gigabit Ethernet links
    which support some longish MTU, how do we know that when there
    is some outage en-route, BGP won't cause the packets to go
    via some other links, one or more of which has a 1500 byte MTU?

      (I recall discussion of how 100Mbps Ethernet is still widely
       used at peering exchanges, and that you can't mix gigabit and
       100Mbps Ethernet without having them all run with the 1500
       byte MTU which is typical of 100Mbps Ethernet.)

4 - How could anyone adopt LISP with the "do nothing" approach
    to PMTU and fragmentation, when even a handful of links in the
    DFZ are not capable of handling jumboframes?  They couldn't
    ensure their packets are not sent on those links.

5 - How reluctant will networks be to adopt LISP if they can only
    put their ITRs and ETRs in locations which will always have
    jumboframe-compatible links to their border routers?

In version 06 of your ID, you wrote:

   Based on informal surveys of large ISP traffic patterns, it
   appears that most transit paths can accommodate a path MTU of
   at least 4470 bytes.  The exceptions, in terms of data rate,
   number of hosts affected, or any other metric are expected to be
   vanishingly small.

By "transit paths" I assume you mean paths from any one border router
of any one ISP (or AS end-user network) to border routers of all other
such networks.

Therefore, I assumed you were intending to have the ITR and ETR
functions at these border routers.  But in your response, you wrote
something very different: CE routers.


>>             If the problems of PMTU are deemed not worth
>>             solving within LISP, then LISP would be deployed
>>             on the assumption that all transit links would
>>             be capable of some much higher than 1500 byte PMTU.
> 
> That is correct.

OK.

>>             This would tend to constrain the locations of ITR
>>             and ETR functions to be at or near border routers,
>>             in order that they have unfettered access to jumboframe
>>             capable links to the core of the Internet.
> 
> Right, or you do fragmentation. We did have an out.

This isn't clear, nor do I recall anything like this in the ID.

What does the fragmentation?

The ITR could fragment 4.5k packets into four or so ~1.4k packets,
but this involves four packets, which is really pushing your luck in
terms of efficiency and robustness.  I would contemplate doing that
for a few initial packets while the ITR was probing PMTU to the ETR,
which should take a few seconds - but LISP has no such robust PMTUD
mechanism, so you would have to keep fragmenting all such traffic
packets into four.  I don't think anyone would regard that as
sustainable.
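
As a rough check on the "four or so" figure, here is a minimal sketch of
the arithmetic.  It assumes the ITR fragments the inner IPv4 packet first
and then encapsulates each fragment, that the inner header is 20 bytes
with no options, and that the encapsulation overhead is the 36 bytes (H)
quoted later in this message.  The function name and parameters are
hypothetical, for illustration only.

```python
import math

def encapsulated_fragment_count(inner_len, path_mtu=1500,
                                encap_overhead=36, inner_hdr=20):
    """Number of encapsulated packets needed to carry one inner IPv4 packet.

    Assumes inner-packet fragmentation followed by encapsulation of each
    fragment, and a 20-byte inner IPv4 header without options.
    """
    max_frag = path_mtu - encap_overhead      # largest inner fragment (S = L - H)
    if inner_len <= max_frag:
        return 1                              # fits without fragmentation
    payload = inner_len - inner_hdr           # bytes that must be carried
    # Every fragment except the last carries a multiple of 8 payload bytes.
    per_frag = (max_frag - inner_hdr) // 8 * 8
    return math.ceil(payload / per_frag)

print(encapsulated_fragment_count(4470))
```

Under these assumptions a 4470 byte packet needs four encapsulated
packets on a 1500 byte path, and the loss of any one of them loses the
whole original packet.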

Should ITRs which are configured in some way to "know" that one or more
of their upstream links are not jumboframe compatible drop packets
longer than ~1500 bytes and send a PTB?  That's fine, but then you have
to have a configuration item in the ITR, and it could get complex if
the ITR has one jumboframe compatible upstream link you want to use
fully, and one which is not.


There is no PMTU Discovery process in your proposal, as far as I recall.

So in the "do nothing" option, you have most ITRs knowing they can
send jumboframes, and a few as noted above knowing they can't.

Now what does a network do regarding placement of ETRs?  You can't
place an ETR on a non-jumboframe link, because the majority of LISP
ITRs don't have a way of knowing that ETR can't receive jumboframes.

What would happen is that the ITR would send a jumboframe and get
back a PTB.  Then it would have to either:

1 - Forget about it.  Unacceptable packet loss.

2 - Keep a copy of the packet and fragment it into some number of
    pieces, sending them to the ETR, and hopefully having them all
    arrive, since you don't have a way of checking for the
    successful arrival of all packets.   Inefficient with
    significantly higher packet loss rates.  Also, the ITR has
    to keep a lot of state, including whole traffic packets, and
    hope that it does get a PTB message, meaning the PTB packet
    is firstly generated, secondly not filtered out and thirdly not
    dropped.

3 - Somehow recognise which of the recently tunnelled packets the
    PTB is for, look up the sending host and, with a copy of
    the first part of the original packet, send a PTB to the
    sending host so it tries again with a smaller packet.  But
    see my above-linked RAM list message from July last year
    about why this is extremely onerous to do in a secure way.



>>             This would seem to be a major restriction on the
>>             ability of operators to place ITRs and ETRs wherever
>>             they like.
> 
> For Loc/ID purposes (there are several other reasons to use LISP than
> what is intended by this venue), we want to strongly suggest that xTRs
> be placed on CE (CPE) routers. We think that is the best balance of
> tradeoffs.

What other reasons would LISP be implemented for other than solving
the routing scaling problem?

I don't recall any LISP draft suggesting that ITRs and ETRs be
located at Customer Edge routers, but I haven't necessarily kept up
with every change.

My understanding of CE routers is shown by CEx in the figure
below, connecting to Provider Edge routers.

   ----------------        -----------------------    -->  Other
  |  End-user net- |      |       ISP-A           |  /     border
  |  work 1        |      |                       | /      and
  |                |     PE1                     BR1---->  transit
  |                |      |                       | \      routers
  |               CE1--- PE2                      |  \     in the
  |                |      |                       |  |     DFZ.
  |                |       -----------------------   |
  |                |                                 |
  |                |                                 |
  |                |       -----------------------   |
  |                |      |       ISP-B           |  /
  |               CE2----PE3                      | /
  |                |      |                      BR2----> As above.
  |                |      |                       |
  |                |     PE4                      |
   ----------------       |                      BR3---->
                          |                       | \
                           -----------------------   \
                                                      -->

Alternatively, there could be just a single Customer Edge router CE1
in this example, with its second link going to PE3.

I think this means that all ISPs which adopt LISP would need to be
entirely jumboframe compatible, assuming your preferred "do nothing"
scenario is the case.

If my understanding is correct, locating the ITR and ETR functions
in the CE routers also requires all links to the PE routers be
jumboframe compatible.

This may make sense at the "top-end of town", but I think the future
map-encap system needs to be extremely broadly applicable, to
end-users of all types and sizes, including especially businesses
with a DSL link and a cable modem link, for instance.  These common
types of data link will not be jumboframe compatible in the
foreseeable future.


>>             Likewise, it would seem to reduce the number of
>>             devices which could do ITR or ETR functions and
>>             thereby lead to bottlenecks and to these devices
>>             needing to be large and expensive.
> 
> No, not bottlenecks, easier deployability.

My comment was based on the notion that the ITR and ETR functions
would be located at the Border Routers of ISPs, because I figured
that was the most jumboframe compatible location in many ISPs.

If you can place the ITRs and ETRs within or near the CE or PE
routers, then this greatly reduces bottlenecks and makes the system
more deployable.  But then you need to extend gigabit Ethernet to
all those places.


> You make it sound that by adding LISP to a router, it will get
> overloaded. You capacity design a router to deal with the input rate and
> density of the box and the amount of work you have to do. ITRs won't
> attract more traffic when they are at the CE. They get traffic based on
> who wants to send data to external destinations.

Yes, there is no more traffic.  However, the ITR functionality in
LISP is complex and so the router has more work to do than before.
Likewise ETR functionality is an additional burden.

I am planning Ivip so that the whole thing could be deployed without
changing a single router.  All ITR and ETR functions could be done
in ordinary servers - and likewise the fast-push system.  In many
cases it would be better if the ITR and ETR functions were
integrated into routers, but it would not be necessary to deploy the
system successfully.

LISP ITR and ETR functionality is a lot more complex than with Ivip,
and if you are assuming that the LISP ITR and ETR work will always
be done by upgraded routers, I think this would be a major barrier
to many networks adopting your system.



>> L = 1500
>> H =   36
>> S = 1464
>>
>>> 1.  Define an architectural constant S for the maximum size of a
>>>    packet, in bytes, an ITR would receive from a source inside of
>>>    its site.
>>
>> S is 1464 bytes.  But an ITR could receive a packet of any length from a
>> source inside its site.  So this sentence makes no proper sense to me.
> 
> It means that if the packet from the source is >= 1464, the packet will
> be fragmented.

This is what I understood from later in your text, but this does not
match the sentence:

  1.  Define an architectural constant S for the maximum size of a
      packet, in bytes, an ITR would receive from a source inside of
      its site.

since an ITR could receive packets from inside the site of any
length which fits within the MTU of the link it arrives on.  If you
added something like "and which would be sent without
fragmentation." then this would make more sense to me.


>>> When an ITR receives a packet of size greater than L on a site-facing
>>> interface and that packet needs to be encapsulated, it resolves the
>>> MTU issue by first splitting the original packet into 2 equal-sized
>>> fragments.  A LISP header is then pre-pended to each fragment.
>>
>> I guess you meant 'S' (1464) rather than 'L' (1500).
> 
> The total packet size after the ITR is finished with it is L.

Yes, but your text refers to "S" as the length of the packet as it
is received, before it is encapsulated.  I think your text could be
modified to "greater than S" to make it more in accordance with your
intended meaning.


>> However, all this assumes that the ITR has a 1500 byte PMTU to the ETR.
>> In many cases, the PMTU will be a lot higher.  So the above algorithm
>> does not allow the ITR to send longer packets without fragmentation.
> 
> That is correct. But maybe a source site that talks to a destination
> site of the same ISP that advertises support for larger MTUs, then the
> ITR be configured with an L value of 4470 or 9182 perhaps.

Working backwards:

> the ITR be configured with an L value of 4470 or 9182 perhaps.

This wasn't mentioned in the ID - only "1500 bytes".

How does anything advertise MTUs, or larger MTUs?  There is nothing
in LISP headers or map reply messages concerning MTUs.  BGP doesn't
carry MTU information.  LISP has no reliable Path MTU Discovery
mechanism.  The closest you get is the ITR receiving a PTB message
if the encapsulated packet is too big for some link en-route to the
ETR.  But that is not robust.  Fred Templin and I believe that the
only way to do robust PMTUD is with explicit probe packets from the
ITR to the ETR, with nonce-protected acknowledgements of proper
reception.
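
To make the idea of explicit probing concrete, here is a minimal
simulation.  It is not part of LISP, nor a description of Ivip or of
Fred's proposal; it only illustrates nonce-protected probing in general.
The class, the function and the candidate sizes (taken from the figures
mentioned in this thread) are all assumptions, and the "path" is a
stand-in for real probe and acknowledgement packets.

```python
import secrets

# Hypothetical candidate sizes, largest first, taken from this thread.
CANDIDATE_SIZES = [9182, 4470, 1500]

class SimulatedPath:
    """Stand-in for the ITR-to-ETR path; silently drops oversized probes."""

    def __init__(self, path_mtu):
        self.path_mtu = path_mtu

    def send_probe(self, size, nonce):
        # A real ETR would acknowledge by echoing the nonce; here the echo
        # is returned directly, or None when the probe was too big.
        return nonce if size <= self.path_mtu else None

def probe_pmtu(path, candidates=CANDIDATE_SIZES):
    """Return the largest candidate size whose probe was acknowledged."""
    for size in candidates:
        nonce = secrets.token_bytes(8)        # fresh nonce for each probe
        echoed = path.send_probe(size, nonce)
        # Accept only an acknowledgement carrying the nonce we sent, so an
        # off-path attacker cannot forge a successful result.
        if echoed is not None and secrets.compare_digest(echoed, nonce):
            return size
    return None                               # no candidate size got through

print(probe_pmtu(SimulatedPath(4470)))
```

The point of the design is that the ITR learns the PMTU from positive
confirmation by the ETR, rather than depending on PTB messages which may
never be generated or may be filtered or dropped.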

I am confused about the "same ISP" above.  Perhaps you mean the
destination end-user network has its ETR connected to an ISP which
provides it with 4470 or 9182 byte MTUs to the DFZ.

How could the ITR in the first site know this?  How could it build
up a set of information about PMTUs to various ETRs, so it could
reject packets which were too long (after encapsulation) for each
PMTU, when the sending host sends that packet, depending on which
link the encapsulated packet was to be sent out on, without any
robust mechanism for PMTU discovery?


>> IPv6 or IPv4 with DF = 1
>> ------------------------
>>
>>> ...  the ITR will drop the packet when the size is greater than L, and
>>> sends an ICMP Too Big message to the source with a value of S, where S
>>> is (L - H).
>>
>> This makes no sense to me.  It would make more sense if this occurred
>> when the packet length was greater than S (1464 bytes for IPv4 or 1444
>> for IPv6).
> 
> You want the ITR to tell the host to send a size so when the outer IP
> header, UDP header, and  LISP header are prepended, the size of the
> packet is L.

Yes, that is what I meant.  Your text would make more sense to me if
it was something like:

  the ITR will drop the packet when the size is greater than S ...
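
A minimal sketch of the ITR decision under this "greater than S"
reading, using L = 1500 and H = 36 for IPv4 as above.  The IPv6 overhead
of 56 bytes is inferred from the S = 1444 figure.  The function name,
return values and the 20 byte inner IPv4 header are assumptions for
illustration, not text from the ID.

```python
L = 1500        # maximum size of the encapsulated packet
H_IPV4 = 36     # encapsulation overhead for IPv4, so S = 1464
H_IPV6 = 56     # inferred from S = 1444 for IPv6

def itr_action(inner_len, can_fragment, overhead=H_IPV4, link_limit=L):
    """Decide what the ITR does with a packet of inner_len bytes.

    can_fragment is True only for IPv4 packets with DF=0.
    """
    s = link_limit - overhead                 # S = L - H
    if inner_len <= s:
        return ("encapsulate", [inner_len + overhead])
    if not can_fragment:
        # IPv6, or IPv4 with DF=1: drop and report S in the PTB.
        return ("drop_and_send_ptb", s)
    # IPv4 with DF=0: split into two roughly equal fragments, then
    # prepend the encapsulation headers to each.  This only keeps both
    # results within L if the inner packet is not much longer than L.
    hdr = 20                                  # inner IPv4 header, no options
    payload = inner_len - hdr
    first = -(-payload // 2)                  # half the payload, rounded up
    first = -(-first // 8) * 8                # first fragment: multiple of 8
    frags = [hdr + first, hdr + payload - first]
    return ("fragment_then_encapsulate", [f + overhead for f in frags])

print(itr_action(1500, can_fragment=True))
print(itr_action(1500, can_fragment=False))
```

With these values a 1464 byte packet is encapsulated to exactly 1500
bytes, while a 1465 byte packet is either split in two or answered with
a PTB carrying 1464.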


 - Robin              http://www.firstpr.com.au/ip/ivip/



