
Re: [RRG] Tunnel fragmentation/reassembly for RRG map-and-encaps architectures

To: "Templin, Fred L" <Fred.L.Templin@boeing.com>
Subject: Re: [RRG] Tunnel fragmentation/reassembly for RRG map-and-encaps architectures
From: Brian E Carpenter <brian.e.carpenter@gmail.com>
Date: Sat, 22 Dec 2007 13:47:31 +1300
Cc: Routing Research Group list <rrg@psg.com>
In-reply-to: <39C363776A4E8C4A94691D2BD9D1C9A1029EDD30@XCH-NW-7V2.nw.nos.boeing.com>
Organization: University of Auckland
References: <EAB3BF96-D438-459E-A753-F9D72B1FE5B6@muada.com> <EC1BB972-6F21-4CBC-B827-BB1840C25AE8@cisco.com> <99E69FED-D637-4F47-985E-6AB0DCB0B8E0@cisco.com> <54ADF2DD-8FFC-45BD-9C56-BD548A91528B@cisco.com> <39C363776A4E8C4A94691D2BD9D1C9A1029EDD10@XCH-NW-7V2.nw.nos.boeing.com> <F004A384-2347-4CCF-8D5E-481EDD0C62D4@muada.com> <39C363776A4E8C4A94691D2BD9D1C9A1029EDD15@XCH-NW-7V2.nw.nos.boeing.com> <39C363776A4E8C4A94691D2BD9D1C9A1029EDD1C@XCH-NW-7V2.nw.nos.boeing.com> <39C363776A4E8C4A94691D2BD9D1C9A1029EDD30@XCH-NW-7V2.nw.nos.boeing.com>
User-agent: Thunderbird 2.0.0.6 (Windows/20070728)

I'm still not sure that I get the point of any of this.

The Internet architecture says that end-systems are
responsible for e2e issues, and PMTU sure seems like
an e2e issue. It's a blot on the architecture that
IPv4 permits fragmentation en route; imagine a world
in which DF is set in every IPv4 packet - hosts would
end up discovering (or configuring) a suitable MTU.
That's also how IPv6 works.

I think it's another blot on the architecture for
a map-and-encap solution to even touch this problem -
except for a SHOULD statement about the MTU to be provided
by every tunnel, which should obviously be a bit bigger
than the required IPv6 Internet MTU.

   Brian

On 2007-12-21 07:52, Templin, Fred L wrote:

Just to put the current understanding of this on record
on the list, the full proposal is given below. LISP is
used as the example RRG map-and-encaps architecture, but
the same approach can be applied to any such scheme:

1) Add two flag bits plus an ID field to the LISP
header. The "Frag" flag is 0/1 if the packet is
unfragmented/fragmented, and the "Seg" flag is 0/1
if this packet is the A or B Segment of the original
packet. The ID field can be a 16-bit extension to
the IPv4 id, or a 32-bit ID that is independent of
the IPv4 id (FFS). To save encapsulation overhead,
the two flag bits could share the same word/longword
as the LISP ID field to give a 14+2 [ID/flags] field
or a 30+2 [ID/flags] field and still leave enough ID
bits to avoid reassembly misassociations.

2) Before the ITR knows whether the ITR->ETR tunnel
path MTU can handle original packets up to 1500 bytes,
it sends original packets of 1500 bytes or less into
the tunnel as either 1-fragment or 2-fragment packets
of no more than (750+ENCAPS) bytes each. The assumption
is that packets of this size will NEVER be dropped due
to an MTU restriction in the ITR->ETR tunnel, which may
have implications for the placement of the ITR and
ETR in the network. (To the best of my knowledge,
the 750-byte fragment suggestion is attributable to
Iljitsch van Beijnum; however, RFC 1812 does suggest
that sending equal-sized fragments is a legitimate
fragmentation strategy.)

3) The ITR creates 2-fragment packets by splitting
the original packet into A and B segments. Both
segments set the "Frag" flag. The A segment sets
"Seg=0" and the B segment sets "Seg=1". Both segments
set corresponding values in the LISP ID field. (The
A and B segments should set different values in the
IPv4 ID, since the two packets might otherwise appear
as duplicates.)

4) While sending 2-fragment packets, the ITR sends
1500 byte sprite-mtu probes to the ETR. If it gets
probe replies back, it can stop sending 2-fragment
packets and begin sending 1-fragment packets. If the
ITR subsequently gets a packet-too-big, it can resume
sending 2-fragment packets and try probing again later.

5) The ITR admits packets of 1501 bytes or more into the
tunnel without fragmenting them. FFS, the ITR either admits
the packets (if they are no larger than the MTU of the
outgoing IPv4 interface) and sends NO PTB feedback back to
the original source (stateless), or admits the packet and
also sends back a PTB. In the latter case, the ITR also
sends sprite-mtu
probes to determine the maximum packet size that the
ITR->ETR path can accommodate. When a maximum size is
determined, the ITR then has an accurate path MTU value
that it can use to determine when to send a PTB back to
the original source. But, this requires extra state in
the ITR. In both cases, the assumption is that original
sources that send 1501+ packets are also doing something
like RFC4821. This should appear in a BCP document.

6) The ETR must reassemble any 2-fragment packets. It
does so through simple concatenation of the A and B
parts of the 2-fragment packet. The ETR can greatly
reduce the memory required for reassembly buffers by
actively discarding any reassemblies that appear to
have no chance of completion. This assumes that any
packet reordering on the ITR->ETR path will be on the
order of a small number of positions (~100 or less),
and that any gross reordering will be short-lived
in nature. (In fact, if the ETR does this it might
be possible to drop back to using plain-old IPv4
fragmentation and reassembly instead of adding new
encapsulation overhead to LISP. But then, there would
still only be a 16-bit IPv4 ip_id which some have
argued is a problem. This is FFS.) In both cases,
the ETR must configure a reassembly buffer size of
at least 2KB to accommodate reassembly of original
packets of 1500 bytes or less plus any outer layers of
encapsulation, including the LISP/UDP/IPv4 encapsulation
and any additional encapsulations such as IPsec/ESP,
L2TP, etc. To the best of my knowledge, the 2KB
reassembly requirement is attributable to a list
posting from Dan Romascanu (although I had heard it
mentioned at least once previously in private
conversation) and should also appear in a BCP document.

7) The idea of a 2-fragment maximum for fragmentation
is something that many of us have talked about over
the years in on- and off-list discussions and has also
appeared in some drafts. I believe Robin Whittle has
more recently been talking about this idea with specific
reference to the RRG problem space, but the idea itself
is not new.

8) Again, the above can apply equally to *any*
map-and-encaps tunnel-oriented proposal for RRG,
and not just LISP.
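
To make steps 1, 3 and 6 concrete, here is a minimal sketch of the 30+2 [ID/flags] word, the A/B split at the ITR, and reassembly by concatenation at the ETR. None of this comes from a LISP specification: the bit positions of the two flags, the function and class names, and the "keep at most 100 pending reassemblies, discard the oldest" policy are all assumptions chosen for illustration. The eviction rule is only one possible reading of "discarding any reassemblies that appear to have no chance of completion".

```python
import struct
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: 30-bit ID in the high bits, then Frag, then Seg.
FRAG = 0x2  # 1 = this packet is one half of a 2-fragment packet
SEG = 0x1   # 0 = A segment, 1 = B segment
ID_MASK = (1 << 30) - 1


def pack_shim(frag_id: int, frag: bool, seg_b: bool) -> bytes:
    """Pack a 30-bit ID and the two flag bits into one 32-bit word."""
    word = ((frag_id & ID_MASK) << 2) | (FRAG if frag else 0) | (SEG if seg_b else 0)
    return struct.pack("!I", word)


def unpack_shim(data: bytes) -> Tuple[int, bool, bool]:
    """Return (id, frag, seg_b) from the first four bytes of a datagram."""
    (word,) = struct.unpack("!I", data[:4])
    return word >> 2, bool(word & FRAG), bool(word & SEG)


def split(packet: bytes, frag_id: int) -> List[bytes]:
    """ITR side: split an original packet of up to 1500 bytes into A and B."""
    if len(packet) > 1500:
        raise ValueError("packets over 1500 bytes are not fragmented")
    half = (len(packet) + 1) // 2  # two roughly equal parts, each <= 750 bytes
    return [
        pack_shim(frag_id, True, False) + packet[:half],  # A segment, Seg=0
        pack_shim(frag_id, True, True) + packet[half:],   # B segment, Seg=1
    ]


class Reassembler:
    """ETR side: pair up A and B segments that carry the same ID."""

    def __init__(self, max_pending: int = 100):
        self.max_pending = max_pending
        # id -> (is_b_segment, payload); dicts keep insertion order
        self.pending: Dict[int, Tuple[bool, bytes]] = {}

    def receive(self, datagram: bytes) -> Optional[bytes]:
        frag_id, frag, seg_b = unpack_shim(datagram)
        payload = datagram[4:]
        if not frag:
            return payload  # 1-fragment packet, nothing to reassemble
        other = self.pending.get(frag_id)
        if other is not None and other[0] != seg_b:
            del self.pending[frag_id]
            a, b = (other[1], payload) if seg_b else (payload, other[1])
            return a + b  # simple concatenation of the A and B parts
        self.pending[frag_id] = (seg_b, payload)
        while len(self.pending) > self.max_pending:
            # Assumed policy: the oldest half has fallen too far behind.
            del self.pending[next(iter(self.pending))]
        return None
```

With this layout a 1500-byte original packet becomes two 754-byte shim payloads (750 bytes of data plus the 4-byte word), before the outer UDP/IPv4 headers are added. Segments may arrive in either order; the ETR returns the original packet as soon as the second half shows up.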

Fred
fred.l.templin@boeing.com

PS: If I have mis-attributed any of the ideas, or failed
    to attribute others, please let me know. I recognize
    that it can be very difficult to keep track of who
    thought of what first...

-----Original Message-----

From: Templin, Fred L
Sent: Tuesday, December 18, 2007 2:25 PM

To: Routing Research Group list
Subject: [RRG] LISP Fragmentation and Reassembly

"sprite-mtu" identifies conditions under which the ITR
must fragment outer packets which the ETR must reassemble.
(And, from the list discussions, we seem to be reaching
consensus that any reassembly at the ETR must be limited
to 1500 byte or smaller original packets.) But, sprite-mtu
further defines a "dance" that needs to be orchestrated
between the ITR and ETR when IPv4 fragmentation and
reassembly are occurring, which may be too onerous for
some deployments.

The reason for the dance is that the 16-bit ip_id needs
to be carefully monitored when IPv4 fragmentation is
occurring such that any reassembly misassociations are
detected. But, if we disable IPv4 fragmentation and instead
define a LISP-specific fragmentation and reassembly at
the LISP shim layer immediately above UDP, then we can
have the shim layer insert a 32-bit id which would avoid
the RFC4963 issues and eliminate the need for close
coordination with the ETR.
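
For a rough sense of why the ID width matters, the arithmetic below estimates how long it takes an ID counter to wrap when every packet consumes one ID. It is a back-of-the-envelope sketch only: the function name, the 1500-byte packet size and the 1 Gb/s rate are illustrative assumptions, and real traffic mixes will differ.

```python
def id_wrap_seconds(id_bits: int, packet_bytes: int, link_bps: float) -> float:
    """Seconds until an id_bits-wide counter wraps at one ID per packet."""
    packets_per_second = link_bps / (packet_bytes * 8)
    return (2 ** id_bits) / packets_per_second


if __name__ == "__main__":
    for bits in (16, 32):
        seconds = id_wrap_seconds(bits, 1500, 1e9)
        print(f"{bits}-bit id wraps after about {seconds:.1f} s")
```

Under these assumptions a 16-bit id wraps in under a second, which is shorter than typical reassembly timeouts, while a 32-bit id takes roughly 14 hours to wrap.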

I have already specified such a mechanism for DHCP:

http://www.ietf.org/internet-drafts/draft-templin-dhcpmtu-00.txt

and a similar specification for LISP would be nearly identical.
The penalty is extra LISP encapsulation overhead, but the
benefit of avoiding the need for synchronization between the
ITR and ETR is substantial. (In fact, it may be essential to
the successful deployment of LISP.) Also, when the path MTU is
large enough to accommodate all 1500 byte and smaller packets
without fragmentation, the extra encapsulation overhead can
be eliminated.

So, I am inclined to write this up as a draft but probably
won't be able to get to it until after the 1st of the year.
It could either go as part of the sprite-mtu draft, or as
an independent draft like the DHCP one. Better still
might be to put it directly into the LISP specification
itself. What does everyone think?

Thanks - Fred

fred.l.templin@boeing.com

--
to unsubscribe send a message to rrg-request@psg.com with the
word 'unsubscribe' in a single line as the message text body.
archive: <http://psg.com/lists/rrg/> & ftp://psg.com/pub/lists/rrg


Follow-Ups:
- Re: [RRG] Tunnel fragmentation/reassembly for RRG map-and-encaps architectures
  - From: Tony Li <tli@cisco.com>

References:
- [RRG] LISP-NERD reachability and MTU detection
  - From: Iljitsch van Beijnum <iljitsch@muada.com>
- Re: [RRG] LISP-NERD reachability and MTU detection
  - From: Dino Farinacci <dino@cisco.com>
- Re: [RRG] LISP-NERD reachability and MTU detection
  - From: Tony Li <tli@cisco.com>
- Re: [RRG] LISP-NERD reachability and MTU detection
  - From: Dino Farinacci <dino@cisco.com>
- RE: [RRG] LISP-NERD reachability and MTU detection
  - From: "Templin, Fred L" <Fred.L.Templin@boeing.com>
- MTU stuff, was Re: [RRG] LISP-NERD reachability and MTU detection
  - From: Iljitsch van Beijnum <iljitsch@muada.com>
- RE: MTU stuff, was Re: [RRG] LISP-NERD reachability and MTU detection
  - From: "Templin, Fred L" <Fred.L.Templin@boeing.com>
- [RRG] LISP Fragmentation and Reassembly
  - From: "Templin, Fred L" <Fred.L.Templin@boeing.com>
- [RRG] Tunnel fragmentation/reassembly for RRG map-and-encaps architectures
  - From: "Templin, Fred L" <Fred.L.Templin@boeing.com>
