
Re: [RRG] PMTUD, Sprite & IPTM; Outer src-addr = sending host's addr



On 24 nov 2007, at 3:56, Robin Whittle wrote:

> An ITR can't set its inbound MTU to the lowest value of any ETR it
> might need to send packets to.

Hm, you're right: in a point-to-point tunnel this is trivial, but in a point-to-multipoint tunnel the MTU will be different for different destinations. This complicates matters, but I would still like to stick to the basic principle that the MTU the tunnel presents to encapsulated packets is a simple reflection of the path MTU discovered through the sending of tunneled packets.
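To make the point-to-multipoint case concrete, here is a minimal sketch of a per-ETR table in which the MTU the tunnel presents to inner packets is derived from the path MTU learned toward each ETR. The class name `TunnelMtuCache`, the 1500-byte starting estimate and the 20-byte plain IPv4-in-IPv4 overhead are all hypothetical choices for illustration, not part of any proposal discussed here.

```python
from dataclasses import dataclass, field

OUTER_HEADER_BYTES = 20  # assumption: plain IPv4-in-IPv4 encapsulation, no IP options


@dataclass
class TunnelMtuCache:
    """Hypothetical per-ETR path MTU table for a point-to-multipoint tunnel."""

    default_path_mtu: int = 1500  # assumed estimate before anything is learned
    path_mtu: dict = field(default_factory=dict)  # ETR address -> learned outer PMTU

    def record_path_mtu(self, etr: str, mtu: int) -> None:
        # Only ever lower the estimate, as classic PMTUD does on a "too big".
        current = self.path_mtu.get(etr, self.default_path_mtu)
        self.path_mtu[etr] = min(current, mtu)

    def inner_mtu(self, etr: str) -> int:
        # The MTU the tunnel presents to packets headed for this particular ETR.
        return self.path_mtu.get(etr, self.default_path_mtu) - OUTER_HEADER_BYTES
```

For example, after `record_path_mtu("192.0.2.1", 1400)`, packets for that ETR see an inner MTU of 1380, while an ETR nothing has been learned about still sees 1480.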

> is different from what you wrote.  So is Fred's, I think.  My page
> above contains a complete description of what I think is the
> problem, and the best solution.  I would really appreciate you or
> anyone else writing a detailed critique of this.

I'm sorry, but I'm not prepared to do that at this time. I've been discussing tunnel MTU issues for some time recently, and my conclusion is that you either need to have a rather involved scheme that is supported on both ends (hard for existing tunneling mechanisms but doable for something new) or it's necessary to reject a good number of seemingly reasonable use cases to keep things workable. There doesn't seem to be much useful middle ground here.

If you like, I can send you copies of the ~200-message exchange between a number of people that led up to Fred's Sprite MTU proposal.

>>> So I suggest sending it if it is shorter
>>> than some assumed limit (e.g. 1280) and fragmenting it if it is
>>> longer, irrespective of whether its Don't Fragment (DF) bit is set.

>> That is a really bad solution, because this guarantees a good
>> amount of fragmenting.

> As I point out in my proposal, fragmentation is only performed for
> those initial long packets in a potential stream to an ETR which the
> ITR hasn't sent packets to recently.

Obviously this would be a fraction of all packets under normal circumstances, but it would still mean that routine packets (i.e., 1500-byte ones) would be fragmented, which I don't like.
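The rule Robin describes (fragment long packets only while the ITR has not recently sent to that ETR, regardless of DF) can be sketched as a small decision function. This is an illustrative sketch under stated assumptions, not IPTM's actual logic: the class `InitialPacketPolicy`, the 60-second "recently" window and the returned action strings are hypothetical, and what happens for recently used ETRs is deliberately left out.

```python
import time

ASSUMED_SAFE_LIMIT = 1280  # the "assumed limit" from the text (e.g. 1280)
RECENT_WINDOW_S = 60.0     # hypothetical: how long an ETR counts as "recently" used


class InitialPacketPolicy:
    """Hypothetical sketch of the first-packets rule for ETRs not used recently."""

    def __init__(self, now=time.monotonic):
        self._now = now
        self._last_sent = {}  # ETR address -> time the last packet was sent to it

    def decide(self, etr: str, packet_len: int) -> str:
        t = self._now()
        last = self._last_sent.get(etr)
        self._last_sent[etr] = t
        if last is not None and t - last < RECENT_WINDOW_S:
            # Recently used ETR: rely on whatever PMTU has been learned
            # (that handling is out of scope for this sketch).
            return "use-learned-pmtu"
        # ETR not used recently: short packets go out whole, long ones are
        # fragmented regardless of the packet's DF bit.
        return "send" if packet_len <= ASSUMED_SAFE_LIMIT else "fragment"
```

Iljitsch's objection is visible directly: the very first routine 1500-byte packet to a fresh ETR comes back as `"fragment"`.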

>> With IPv4, this is rather problematic because of the small
>> ID space.

> This is only likely to be for a handful of packets, though I suppose
> one could construct a worst-case scenario of a sudden burst of long
> packets which would need to be fragmented until the ITR could figure
> out the true MTU.

Right. So you'd probably have to be prepared to deal with this even though it wouldn't be an issue in practice most of the time.

> I haven't looked closely at the fragmentation reassembly problems of
> IPv4.  Can you point to some references?

I don't have any references, but in short, the issue is that you have a 16-bit ID space with a reassembly timeout of something like a few minutes. This means you can only send 65536 fragmented packets per source/destination pair during that "few minute" window; send more, and if you lose a fragment you may incorrectly reassemble fragments from different packets that share the same ID. This is especially problematic if the fragmented packets belong to a tunnel, because in that case the IP source/dest addresses are always the same.
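The arithmetic behind the ID-space limit is short enough to write down. This sketch assumes a 120-second reassembly timeout as one reading of "a few minutes" (real stacks use various values), and the function names are purely illustrative. Because every packet in a tunnel shares the same outer source and destination, the limit applies to the tunnel's fragmented traffic as a whole.

```python
IPV4_ID_SPACE = 2 ** 16  # the IPv4 Identification field is 16 bits


def max_safe_packet_rate(reassembly_timeout_s: float) -> float:
    """Fragmented packets/s per (src, dst, protocol) before an ID is reused within the timeout."""
    return IPV4_ID_SPACE / reassembly_timeout_s


def max_safe_bitrate(reassembly_timeout_s: float, packet_bytes: int) -> float:
    """The same limit in bits per second for fragmented packets of a given size."""
    return IPV4_ID_SPACE * packet_bytes * 8 / reassembly_timeout_s
```

With the assumed 120 s timeout, that is about 546 packets per second, or 6.5536 Mbit/s of 1500-byte packets, for everything between one ITR and one ETR.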

>> It also costs you lots of CPU and could even allow for CPU
>> exhaustion attacks.

> Yes, but I think it is better than dropping longish packets just
> because we assume some too low PMTU of 1280 or whatever, when in
> fact, within a second or two, the ITR will probably be able to
> establish that the real PMTU is 1500 or somewhat less.

Who said anything about preemptively dropping packets? Just send 1500-byte packets plus an outer header with DF set and you'll get a "too big". After that, you know the path MTU and you can in turn send "too big" messages to the source of the original packets.

Yes, this allows for PMTUD black holes, but those are subject to the "so don't do that and the problem goes away" doctrine. ISPs generally get this, unlike enterprise people and ignorant consumers who can't live without their firewalls.
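The relay step Iljitsch describes amounts to subtracting the encapsulation overhead from the MTU reported in the "too big" and passing the result back to the original sender. A minimal sketch, assuming plain IPv4-in-IPv4 (20 bytes of outer header); the function name `relay_too_big` is hypothetical, and building the actual ICMP message is left out.

```python
OUTER_HEADER_BYTES = 20  # assumption: IPv4-in-IPv4 with no IP options


def relay_too_big(original_packet_len: int, reported_outer_mtu: int,
                  overhead: int = OUTER_HEADER_BYTES):
    """Return the MTU to report to the original sender, or None if the packet fits.

    `reported_outer_mtu` is the next-hop MTU from the "too big" the ITR received
    for its encapsulated packet.
    """
    inner_mtu = reported_outer_mtu - overhead
    if original_packet_len <= inner_mtu:
        return None  # the inner packet would fit; nothing to tell the sender
    return inner_mtu
```

A 1500-byte packet encapsulated to 1520 bytes on a 1500-byte path leads the ITR to tell the sender 1480.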

> Still, we can predict that there will be such large packets early on
> in many communications.  Simply dropping them doesn't seem right to
> me.  Dropping them with a too-low PMTU value being sent to the
> sending host would screw up that host's later packets, making them
> shorter than they really need to be.  I think fragmenting them at
> first is the best approach.

If we mandate that *TRs support 1500-byte user traffic without fragmentation, this wouldn't be an issue in practice.
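Such a mandate means the path between ITR and ETR must carry 1500 bytes plus the encapsulation overhead. The overheads below are illustrative assumptions (outer headers without options or extension headers, and no particular shim header, since the thread doesn't fix an encapsulation format); `required_path_mtu` is a hypothetical helper.

```python
USER_MTU = 1500  # the 1500-byte user packets the *TRs would be required to carry

# Outer-header overhead in bytes for a few example encapsulations (assumed,
# no IP options or IPv6 extension headers).
OVERHEAD = {
    "ipv4-in-ipv4": 20,          # outer IPv4 header
    "ipv4-in-ipv6": 40,          # outer IPv6 header
    "ipv4-in-udp-ipv4": 20 + 8,  # outer IPv4 + UDP, before any shim header
}


def required_path_mtu(encapsulation: str, user_mtu: int = USER_MTU) -> int:
    """Smallest ITR-to-ETR path MTU that carries user_mtu-byte packets unfragmented."""
    return user_mtu + OVERHEAD[encapsulation]
```

So even the simplest encapsulation needs at least a 1520-byte path, which is why the mandate falls on the links between *TRs rather than on end hosts.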

>>> Later, if more such packets need to be sent, the ITR and ETR can
>>> work on determining the real PMTU.  I do this with probe packets,
>>> rather than traffic packets.

>> Even more overhead...

> Yes.  However I can't see a way of probing the PMTU in any other
> way.  ICMP can't be relied upon, and if I tried to use only traffic
> packets, I would have to risk those packets not arriving.  Instead,
> IPTM fragments the traffic packets and sends its own probe packets.
> This means there is no fancy overhead in traffic packets - they are
> not intended to be used for PMTUD at all.

I REALLY don't like this: generating signalling traffic when there is no data traffic is a very bad precedent. However, we probably need to probe for reachability in some way or another; if we can do the MTU discovery along with that, it may be tolerable.
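One generic way to find the PMTU with probes, and the sort of thing that could ride along with reachability probing, is a binary search over probe sizes. This is not IPTM's specified algorithm; it is a swapped-in illustration in the spirit of packetization-layer PMTUD. The `probe` callback is hypothetical, and the 1280/1500 bounds are assumptions taken from the numbers in this thread.

```python
from typing import Callable


def probe_path_mtu(probe: Callable[[int], bool], low: int = 1280, high: int = 1500) -> int:
    """Binary-search the largest probe size that reaches the ETR.

    `probe(size)` is a hypothetical callback that sends a DF-set probe of
    `size` bytes and returns True if the ETR acknowledges it. `low` is
    assumed to always get through.
    """
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            low = mid       # mid got through: the PMTU is at least mid
        else:
            high = mid - 1  # mid was lost: the PMTU is below mid
    return low
```

This needs about eight probes for the 1280 to 1500 range. Treating any lost probe as "too big" is conservative; a real scheme would retry before concluding, since probes can be lost for reasons other than size.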

Iljitsch
