
RE: [RRG] Path MTU Discovery: a new approach



Robin,

Your proposal needs to talk about the setting of DF in
the outer IPv4 header after encapsulation. Based on my
5+ years of studying this, if it sets DF=1, it's busted.

IMHO, SEAL is well beyond the research phase now and
pretty deep into engineering solution space. It is
written in the form of a functional specification from
which a programmer can actually produce running code.
Therefore, I think it is ready for experimentation on
a wider scale.

Thanks - Fred
fred.l.templin@boeing.com   

>-----Original Message-----
>From: Robin Whittle [mailto:rw@firstpr.com.au] 
>Sent: Monday, April 21, 2008 1:14 AM
>To: Routing Research Group
>Cc: William Herrin
>Subject: Re: [RRG] Path MTU Discovery: a new approach
>
>Hi Bill,
>
>Thanks for your summary, which is correct in many respects - pretty
>good going if you only read the list message, rather than the page
>itself:
>
>  http://www.firstpr.com.au/ip/ivip/pmtud-frag/
>
>> 1. Have the ITR maintain an "uncertainty zone" for sizes of packets
>> that can be sent to a given ETR. The uncertainty zone is bounded by a
>> size previously determined to be smaller than or equal to the actual
>> PMTU (LPME) and a size previously determined to be larger than the
>> actual PMTU (UPME).
>
>Yes.
>
>
>> 2. The ITR encapsulates and transmits packets smaller than LPME
>> normally. 
>
>Yes, except the ITR should probably send a few such packets with
>RPD2 (BTW, if anyone can think of a better acronym  ...) to explore
>the possibility that the Real PMTU is now lower than LPME or higher
>than UPME.  These need to be rate limited.  Most of the time, there
>will be no such change from month-to-month, but sometimes 
>there will be.
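A minimal sketch of the rate limiting mentioned above, assuming a simple per-ETR minimum interval between exploratory RPD2 probes. The class name, the interval and the injectable clock are hypothetical illustration choices, not part of the proposal.

```python
import time
from typing import Callable, Dict, Hashable


class ProbeRateLimiter:
    """Hypothetical per-ETR limiter for exploratory RPD2 probes.

    Allows at most one exploratory probe per ETR every min_interval_s seconds.
    """

    def __init__(self, min_interval_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.min_interval_s = min_interval_s
        self.clock = clock  # injectable so the behaviour can be tested
        self._last_probe: Dict[Hashable, float] = {}

    def allow(self, etr: Hashable) -> bool:
        """Return True (and record the time) if a probe to this ETR may be sent now."""
        now = self.clock()
        last = self._last_probe.get(etr)
        if last is None or now - last >= self.min_interval_s:
            self._last_probe[etr] = now
            return True
        return False
```

In this sketch, `allow(etr_addr)` would gate whether the next packet just below LPME or just above UPME is sent as an RPD2 probe; otherwise it is handled as usual.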
>
>
>> It rejects packets larger than UPME immediately with a too-big
>> message.
>
>Yes, except occasionally, when it uses one as an exploratory
>probe to detect whether the Real PMTU has risen above UPME.  If the
>packet is not delivered, then it sends a PTB to the SH as you
>describe, with MTU value equal to UPME.  (If it gets a PTB from the
>tunnel, the MTU in that PTB is used to set an upper limit on UPME.)
>
>The only packets which are always rejected with a PTB are those
>which, once encapsulated, would exceed the MTU of the interface the
>ITR uses to send packets to this ETR.
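The per-packet decision discussed above can be sketched as follows, assuming LPME and UPME are tracked as outer (encapsulated) packet lengths and a fixed 20-byte IPv4-in-IPv4 overhead. The names are hypothetical, and the occasional exploratory probes are left out.

```python
from dataclasses import dataclass
from enum import Enum

IPIP_OVERHEAD = 20  # assumption: plain IPv4-in-IPv4 outer header without options


class Action(Enum):
    NORMAL = "encapsulate normally"
    RPD2 = "send with RPD2 (Packet B plus short Packet As)"
    PTB = "drop and send Packet Too Big to the sending host"


@dataclass
class EtrPmtuState:
    lpme: int  # largest outer length known to fit (<= Real PMTU)
    upme: int  # smallest outer length known not to fit (> Real PMTU)


def classify(inner_len: int, st: EtrPmtuState, iface_mtu: int) -> Action:
    """Decide how an ITR handles one traffic packet bound for a given ETR."""
    outer_len = inner_len + IPIP_OVERHEAD
    if outer_len > iface_mtu:
        return Action.PTB     # can never leave the ITR's own interface
    if outer_len <= st.lpme:
        return Action.NORMAL  # known to fit
    if outer_len >= st.upme:
        return Action.PTB     # known not to fit
    return Action.RPD2        # inside the zone of uncertainty
```

Starting LPME at a figure such as 1200, which the message assumes any ITR can send to any ETR, means short packets never touch RPD2.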
>
>
>> 3. If the packet size is in the uncertainty zone, encapsulate it with
>> RPD2 instead of the normal encapsulation and hold the original packet
>> until the ETR responds. This encapsulation consists of two packets:
>> one in the uncertainty zone and one smaller than LPME. 
>
>Actually, the small one will not only be smaller than LPME, it will
>be way smaller than some figure like 1200 bytes, which we assume can
>be sent from any ITR to any ETR without PMTU problems.
>
>> If successfully transmitted, the ETR will reassemble the two packets
>> into one before passing them on.
>
>Yes - if the ETR receives the big Packet B and at least one small
>Packet A.
>
>This is true except for the just-mentioned occasional exploratory
>probe packets longer than UPME or shorter than LPME.
>
>
>> 4. The ETR is required to respond to the ITR with information about
>> all communications associated with RPD2, in addition to delivering
>> the packets. By comparing the ETR's response to the RPD2 messages
>> with the RPD2 messages it sent, the ITR can narrow the uncertainty
>> zone until LPME and UPME meet.
>> 
>> Please correct any part of that I misunderstood.
>
>There are a few other points.
>
>1 - Packet B, the large one, is sent with its outer header's source
>    address set to the ITR's address.  This is true in all instances
>    of RPD2, including Ivip.  In Ivip, the Packet As are sent with
>    their outer source address being that of the SH.
>
>2 - Therefore if Packet B gets to a router in the ITR --> ETR tunnel
>    with an outgoing MTU which is too small for it, the ITR will
>    receive a Packet Too Big message.  (Except if the Packet B or
>    the PTB packet are dropped for some random reason, or if the PTB
>    is blocked by a filter.  A BCP will say: Don't put your ITRs and
>    ETRs behind such filters.)
>
>3 - When the ITR gets a PTB from the tunnel, or is told by the ETR
>    that Packet B didn't arrive within a reasonable but short
>    time-frame (maybe after trying twice), it sends a PTB back to the
>    Sending Host (SH), so the SH will try again with a smaller
>    packet, and no data should be lost to the application.
>
>4 - If the ITR simply gets nothing back from the ETR, it might try again.
>    I am not sure what the ITR would do then, but I don't think it
>    should be adjusting down its UPME variable, or sending PTBs to
>    the SH, just because it can't get a report of any kind from the
>    ETR.  This is probably a temporary glitch.  If it is permanent,
>    then there's no point in sending a PTB anyway, since the data
>    will never get to this ETR, at least via this ITR.
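One way the ITR might update its bounds from the outcomes in points 1 to 4 is sketched below, under stated assumptions: a validated tunnel PTB may lower both bounds, a confirmed arrival raises LPME, a missing Packet B only lowers UPME after a second failed attempt, and no report at all changes nothing. The function names and the two-attempt threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EtrPmtuState:
    lpme: int  # largest outer length known to fit (<= Real PMTU)
    upme: int  # smallest outer length known not to fit (> Real PMTU)

    def converged(self) -> bool:
        # LPME and UPME "meet" once no untested length remains between them.
        return self.upme - self.lpme <= 1


def on_packet_b_arrived(st: EtrPmtuState, outer_len: int, iface_mtu: int) -> None:
    """ETR reports Packet B arrived, so Real PMTU >= outer_len."""
    st.lpme = max(st.lpme, outer_len)
    if st.upme <= outer_len:
        # An exploratory probe above UPME got through: the old bound is stale.
        st.upme = iface_mtu + 1


def on_tunnel_ptb(st: EtrPmtuState, reported_mtu: int) -> None:
    """Validated PTB from inside the tunnel, so Real PMTU <= reported_mtu."""
    st.upme = min(st.upme, reported_mtu + 1)
    st.lpme = min(st.lpme, reported_mtu)


def on_packet_b_missing(st: EtrPmtuState, outer_len: int, attempts: int,
                        min_attempts: int = 2) -> None:
    """ETR got the Packet As but not Packet B.

    Only UPME is narrowed, and only inside the zone, so that random loss
    never lowers LPME.
    """
    if attempts >= min_attempts and st.lpme < outer_len < st.upme:
        st.upme = outer_len

# No report at all from the ETR: deliberately no state change (point 4).
```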
>
>Also, the ITR always* learns something truthful when it uses RPD2 to
>send a packet with a length within the Zone of Uncertainty.
>
>*  This is not counting extreme cases where two attempts at sending
>   the sets of packets do not result in the ITR receiving a report
>   from the ETR - but that would be a case of at least temporarily
>   very poor reachability between the two, so we can't expect
>   anything better.
>
>
>> Two questions, one note:
>> 
>> Question #1: How does the ITR determine that its old PMTU estimate
>> has been invalidated, either because of a route change or because
>> individual packets are being transmitted along multiple channels each
>> with a different PMTU?
>
>There needs to be some low rate of exploratory probing using RPD2
>sending of some packets shorter than LPME and longer than UPME.
>
>
>> If I understand you, packets are not transmitted with RPD2 unless the
>> ITR believes the size falls in the uncertainty zone, 
>
>Yes, except for the occasional exploratory shorter and longer packets.
>
>> and not transmitted with the ITR's source IP address regardless,
>
>The long Packet B of RPD2 is always sent with the outer header's
>source address being that of the ITR.
>
>> so the ITR has no real hope of seeing normal too-big complaints.
>> So how does it ever decide that its estimated PMTU is no longer
>> valid?
>
>Ivip's ordinary encapsulation of traffic packets (IP-in-IP) has the
>outer header set to the SH's address.  So the ITR gets no PTB from
>them, and a properly implemented RFC 1191 SH would not recognise the
>PTB either.
>
>A SH which was looking out for this kind of PTB could detect it, but
>I haven't explored this and am determined not to make any part of
>Ivip dependent on host changes - other than perhaps a souped-up
>traceroute program.
>
>Occasional shorter and longer exploratory probe packets, with direct
>reports from the ETR will detect changes in the Real PMTU outside
>LPME to UPME - but not as fast as if the normally encapsulated
>traffic packets had the ITR's address as their source *and* the ITR
>could store enough state to securely validate PTB messages they cause.
>
>A non-Ivip ITR, or some other device using this IPTM - RPD2
>procedure probably could use the ordinary encapsulation to detect
>the Real PMTU getting shorter than it currently assumes.  The trick
>would be to only cache the information for a handful of the longest
>packets.  There's no point in caching stuff for the shorter ones
>while longer ones are being sent, close to or at the limit set by
>LPME.
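For the non-Ivip case just described, where ordinary encapsulated packets carry the ITR's own address as outer source, caching only a handful of the longest packets could look like this. The identifier used to match a PTB against a cached packet (for example destination plus IP ID taken from the header quoted in the ICMP message) is an assumption, as are the class name and the cache size.

```python
import heapq
from typing import Hashable, List, Tuple


class LongPacketCache:
    """Hypothetical cache of the K longest recently sent encapsulated packets.

    A PTB is only accepted as genuine if the packet header it quotes matches
    one of these entries; shorter packets are never recorded.
    """

    def __init__(self, k: int = 8) -> None:
        self.k = k
        # Min-heap of (outer_len, seq, ident); seq breaks ties so that idents
        # are never compared and, among equal lengths, the oldest goes first.
        self._heap: List[Tuple[int, int, Hashable]] = []
        self._seq = 0

    def record(self, outer_len: int, ident: Hashable) -> None:
        entry = (outer_len, self._seq, ident)
        self._seq += 1
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, entry)
        elif outer_len >= self._heap[0][0]:
            # Evict the shortest (and, on ties, oldest) cached packet.
            heapq.heapreplace(self._heap, entry)

    def validate_ptb(self, quoted_ident: Hashable) -> bool:
        return any(ident == quoted_ident for _, _, ident in self._heap)
```

Keeping K small bounds the state per ETR while still covering the packets most likely to trigger a PTB.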
>
>Relying on securely checked PTBs is a pretty good way of finding out
>that the Real PMTU has got shorter than LPME.  Using one or more
>non-arrivals of the long probe packet at the ETR is not quite as
>reliable, since this could occasionally occur due to bad luck with
>packet loss.  It would be bad to lower LPME in a spurious way, due
>just to non-arrival of the probe packet (rather than the gutsier way
>of getting a real PTB).  This would result in the ITR sending a PTB
>to the SH with a lower than needed MTU value.  The SH would then be
>bound to use that value to limit its packet size for the next ten
>minutes.  This is bad, but not disastrous - it is just a loss of
>efficiency, rather than a loss of data or of connectivity.
>
>Relying on a report from the ETR that a long packet did arrive OK is
>the best way of detecting that the Real PMTU is higher than UPME.
>The mere absence of PTBs is not as reliable, since they could be
>dropped randomly (or the probe packet dropped randomly before it hit
>the PMTU limiting router) - or perhaps the PTBs could be blocked by
>ICMP filters which violate the BCP recommendation.
>
>IPTM - RPD2 can do its job reliably without PTBs from the tunnel,
>but if they are there, that is better.  The ITR has to be able to
>get the PTBs it generates to SH, but if it can't do that, then we
>are sunk anyway.
>
>The sections:
>
>  Discovering changes in Real PMTU
>
>  An alternative to the RPD2 approach of splitting the traffic
>  packet
>
>discuss the various approaches, with and without Ivip's "outer
>source = SH" approach, including some promising possibilities of
>ITRs only caching some packets, and alternatives to RPD2's approach
>of splitting the traffic packet.
>
>
>> Question #2: nearly every ITR->ETR map will trigger the use of RPD2
>> as two associated end sites begin transmitting data.
>
>This is quite different from the debate about "pure pull" (LISP-ALT
>and TRRP, though I now think neither is quite so pure) ITRs
>frequently delaying initial packets.
>
>Firstly, RPD2 is only used for packets longer than 1200 bytes.  This
>means that almost all session establishments will not be encumbered
>by RPD2, since I figure very few protocols start up with such long
>initial packets.  Many kinds of traffic will never require packets
>longer than 1200 or whatever bytes, including DNS and almost all
>HTTP traffic in the client -> server direction.  I figure SMTP and
>many other protocols only have big packets going in one direction
>for each session.
>
>Secondly, the burden of RPD2 is primarily due to involving the ITR's
>and the ETR's central CPUs.  There is also the burden of sending
>extra packets, but the probe Packet B is the same length as an
>ordinarily encapsulated packet, and the 2 or maybe 3 short Packet
>A's are likely to be 100 bytes or less each.
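To put rough numbers on the extra bytes, here is a back-of-envelope helper using the figures above: Packet B is as long as the ordinary encapsulated packet, plus two or three Packet As of about 100 bytes each. The helper's name and defaults are illustrative assumptions, and the ETR's report packets are ignored.

```python
def rpd2_extra_fraction(outer_len: int, n_packet_a: int = 3,
                        packet_a_len: int = 100) -> float:
    """Extra bytes on the wire for one RPD2-sent packet, relative to outer_len.

    Packet B costs the same as ordinary encapsulation, so only the short
    Packet As count as overhead.
    """
    return n_packet_a * packet_a_len / outer_len
```

So three 100-byte Packet As add about 20% to a 1500-byte packet, but only for packets inside the zone of uncertainty, not for all traffic.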
>
>There is no significant extra delay.  Assuming the Packet B and at
>least one of the first two Packet A's get to the ETR, the traffic
>packet is delivered.  This need not take more than a fraction of a
>millisecond longer on high-speed links, unless the central CPU does
>not have the capacity to attend to this promptly.  These delays
>would be far shorter than the delay of looking up mapping in the ALT
>or TRRP global query server system, or using their initial packet
>delivery systems to get the packet to the ETR before the ITR has the
>mapping.
>
>Also, these RPD2 packets do not involve data loss to the
>application.  Sometimes, they require a resend with a smaller packet
>- but that is when the only way of delivering the original packet
>would be via some fragmentation or other splitting mechanism, since
>the packet, once encapsulated, was in fact too big for the tunnel PMTU.
>
>
>> Given the complexity, you're looking at a general-purpose CPU on 
>> both ends to handle this. What sort of impact does that have
>> on the system capacity?
>
>I can't say for sure.  I can't think of a simpler approach, and this
>PMTUD stuff really does need to be solved.  There may well be some
>gotchas, but the way it looks now is far better and cleaner than I
>thought would be possible a few days ago.  Since October I have
>assumed we would need synthetic probe packets and that it would be
>necessary to break up some packets into smaller chunks to deliver
>them in spite of PMTU limitations.
>
>In this scheme, no traffic-carrying probe packet goes to waste.  It
>is either delivered and the ITR learns about the Real PMTU, or it is
>not delivered, and the ITR also learns - with no application data
>loss.  Then the RFC 1191 SH automatically cooks up a shorter packet,
>which is just what is needed for the ITR to find out more about the
>Real PMTU.
>
>
>> Note #1: in your document, you describe the ETR returning multiple
>> packets to the ITR for each received RPD2 packet, until the ITR
>> acknowledges receipt. This potentially resurrects our old friend, the
>> smurf amplifier.
>
>This is definitely a gotcha.  This IPTM - RPD2 stuff didn't exist two
>days ago, so it is amenable to change.  Maybe limit the retries to a
>single retry, or at most to two.  That only gives an amplification
>factor of two or three.
>
>The report packets would be pretty short, and if generated by an ETR
>in response to bogus Packet As would be ignored by most devices,
>including any ITR.
>
>Perhaps a way to discourage attackers from using this aspect of the
>ETR's functionality would be to ensure that the Packet As needed to
>be as long as the total length of the two or three ETR -> ITR report
>packets.  But that just adds overhead to the entire protocol.
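The amplification argument above can be checked with a little arithmetic, assuming one initial report plus a fixed number of retries and report packets of equal size. The function names and the 60-byte report length in the example are hypothetical.

```python
def reports_per_probe(retries: int) -> int:
    """ETR -> ITR report packets sent in response to one (possibly forged) Packet A."""
    return 1 + retries


def byte_amplification(retries: int, report_len: int, packet_a_len: int) -> float:
    """Bytes reflected towards the victim per byte of forged Packet A."""
    return reports_per_probe(retries) * report_len / packet_a_len


def min_packet_a_len(retries: int, report_len: int) -> int:
    """Shortest Packet A that keeps byte amplification at or below 1."""
    return reports_per_probe(retries) * report_len
```

With one or two retries, `reports_per_probe` gives the factor of two or three mentioned above; with 60-byte reports and two retries, Packet As would need to be at least 180 bytes to remove the byte amplification, which is the protocol-wide overhead noted above.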
>
>  Cheers
>
>    - Robin
>
>
>--
>to unsubscribe send a message to rrg-request@psg.com with the
>word 'unsubscribe' in a single line as the message text body.
>archive: <http://psg.com/lists/rrg/> & ftp://psg.com/pub/lists/rrg
>
