[RRG] FLOWv6: IPv6 Flow Label to control DFZ forwarding
- To: Routing Research Group <rrg@psg.com>
- Subject: [RRG] FLOWv6: IPv6 Flow Label to control DFZ forwarding
- From: Robin Whittle <rw@firstpr.com.au>
- Date: Thu, 31 Jul 2008 22:39:17 +1000
- Organization: First Principles
- User-agent: Thunderbird 2.0.0.16 (Windows/20080708)
I woke up this morning thinking:
The IPv6 Flow Label isn't being used.
How many bits is it??
20 . . . . !!!!
This is an attempt to achieve what we are trying to do with
map-encap schemes and with Six/One Router - but without using
tunneling or translation.
Instead, an IFR (Ingress Flow Router - like an ITR) uses mapping
info to set the packet's Flow Label, which is then used by somewhat
modified BGP routers in the DFZ to forward the packet across the DFZ
to the EFR (Egress Flow Router - like an ETR).
This proposal is about 14 hours old, and I am keen to work with
other people on it. Please let me know any concerns and suggestions
for improvements. The text below is a rough first draft and is
written as from "we", since I hope to have some collaborators on
this project.
The advantages seem to include:
1 - The packets do not get any longer.
2 - Therefore, no extra length problems adding to PMTUD
(Path MTU Discovery) difficulties.
3 - Therefore, no extra packet-length overheads, which are
particularly pernicious for IPv6 map-encap of VoIP packets.
See examples at the start of:
http://psg.com/lists/rrg/2008/msg02034.html
4 - The packet and its addresses are not changed at all.
5 - This means that Traceroute and ordinary PMTUD should work
without any special measures, all the way through the
"Flow Label forwarded" part of the path.
6 - There is no problem with header checksums, crypto protocols
etc. since AFAIK the Flow Label is not used in any header
checksums.
7 - The system relies on significant, but probably not overly
complex, modifications to the DFZ routers' FIB and RIB.
The process of using the 20 bit Flow Label to directly look
up the FEC (Forwarding Equivalence Class) is much less
expensive than using Tree-Bitmap to chomp through up to
48 bits of address.
8 - The system should support the TTR approach to mobility just
like Ivip, with TTR to Mobile Node tunneling remaining, and
the ITR to TTR tunnels replaced by Flow Label forwarding.
9 - No host changes or changes to the BGP protocol.
10 - IFR and EFR functionality is much simpler than that of Ivip ITRs
and ETRs - which is simpler than that of other map-encap
schemes.
The main disadvantages seem to be:
1 - Can't work for IPv4.
2 - Requires significant, but probably not too complex, changes
to the FIB and RIB of most DFZ routers. Hopefully this can
be done with state-of-the-art routers simply with firmware
changes, since the FIBs of recent Cisco and Juniper routers
are based on CPUs.
3 - Requires a change to the formal semantics of what is currently
called the "Flow Label" - which should perhaps be renamed,
since it is being used for purposes other than "Flow".
I suppose this is also a "Locator / Identifier Separation"
solution, but I never liked that term - and it most certainly does
not involve any new locator or identifier namespace, at least for IP
addresses. The 20 bit Flow Label does involve a new
namespace.
FWIW, I still think IPv6 addresses are about 64 bits too long. This
bloats all the headers and forces even 64 bit CPUs to shovel two
long words around - just for an IP address!
However, if an approach such as FLOWv6 is successfully implemented,
this would overcome my biggest objection to IPv6: that every DFZ
router needs to chew its way laboriously through up to 48 bits of
destination address for every packet it handles.
- Robin
FLOWv6
======
A scalable routing and addressing proposal for IPv6, using
the 20 bit Flow label and modified BGP routers to forward
packets across the DFZ to the EFR (ETR) - rather than
using encapsulation or translation.
Robin Whittle and <Your Name too??>
Introduction
------------
Most potentially practical solutions to the routing scaling problem
involve map-encap (LISP, APT, Ivip and TRRP), which adds to the
length of packets as they are tunnelled from an ITR to an ETR. This
presents efficiency problems and requires some complex handling of
problems arising from disruption of Path MTU Discovery (PMTUD)
mechanisms.
An alternative to encapsulation is translation: rewriting the source
and destination addresses of packets as they enter and leave the
end-user networks which use the new kind of scalable, multihomable,
portable, address space. Six/One Router is the only current
proposal which uses translation. Translation avoids the
inefficiencies of the extra headers used by map-encap. Translation
does not extend the packet lengths - which should result in fewer
PMTUD problems than with map-encap.
However, rewriting the source and destination addresses can
cause problems for encrypted protocols, and seems likely to disrupt
conventional PMTUD mechanisms. For the latest version of Six/One
Router, and a tentative critical review, see:
http://psg.com/lists/rrg/2008/msg02034.html
FLOWv6 is neither a translation nor a map-encap scheme. It does not
directly resemble MPLS, and it does not concern the conventional
concept of a "Flow".
FLOWv6 aims to achieve the goals of a map-encap scheme such as Ivip,
but without encapsulation, and ideally without disrupting PMTUD. In
the following description, FLOWv6 replicates some characteristics of
Ivip which distinguish it from the other map-encap schemes,
including fast push of mapping information for real-time control by
end-users. This makes the functions of multihoming monitoring and
decision making for service restoration completely separate from
FLOWv6 - as they are from Ivip. These functions are monolithically
integrated into LISP, APT, TRRP and Six/One Router.
If this proposal goes ahead, it might be best to rename the 20 bits
in the IPv6 header to something other than "Flow Label" - so this
proposal and some of its terms would need to be renamed too.
Encapsulation is an even more undesirable operation for IPv6 than
for IPv4, since it adds at least 40 bytes to each packet. This is
for simple IP-in-IP encapsulation, as used by Ivip - while for IPv4
the overhead is 20 bytes. For IPv6 map-encap with LISP (IP, UDP and
LISP headers) the overhead is 56 bytes. With traffic volumes
multiplying rapidly, and potentially hundreds of millions of people
sending 50 VoIP packets a second, each with typically 20 bytes of
payload, it is highly desirable to avoid encapsulation in the
scalable routing solution for IPv6.
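To make the overhead concrete, here is a rough Python sketch. The 40 and 56 byte encapsulation overheads are from the figures above; the 80 byte VoIP packet (40 IPv6 + 8 UDP + 12 RTP + 20 payload) is our assumption:

```python
# Illustrative overhead arithmetic: map-encap inflates every packet,
# FLOWv6 (which only rewrites the Flow Label) adds nothing.
# Assumed VoIP packet: 40 (IPv6) + 8 (UDP) + 12 (RTP) + 20 (payload).
VOIP_PACKET = 40 + 8 + 12 + 20

def overhead_fraction(extra_bytes, packet_bytes=VOIP_PACKET):
    """Fraction by which encapsulation inflates the packet."""
    return extra_bytes / packet_bytes

ip_in_ip = overhead_fraction(40)   # IPv6-in-IPv6, as used by Ivip
lisp     = overhead_fraction(56)   # IPv6 + UDP + LISP headers
flowv6   = overhead_fraction(0)    # FLOWv6: no extra bytes at all

print(f"IP-in-IP: +{ip_in_ip:.0%}, LISP: +{lisp:.0%}, FLOWv6: +{flowv6:.0%}")
```

For these assumed VoIP packets, plain IP-in-IP costs 50% extra bytes and LISP-style map-encap 70% - which is why avoiding encapsulation matters so much here.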
The IPv6 header has a 20 bit "Flow Label" which is currently not
used for any substantial purpose. Rather than encapsulate a packet
in order to tunnel it to an ETR address, FLOWv6 involves BGP routers
in the core forwarding each packet according to a route associated
with the value of the packet's Flow Label. As far as we know, the
Flow Label is not used in any checksums or cryptographic protocols,
so there should be no unwanted effects when its value is changed.
If we are prepared to contemplate the following, then FLOWv6 or some
similar approach should be considered as a solution for the IPv6
routing scaling problem:
1 - We would replace the current semantics of the Flow Label:
http://tools.ietf.org/html/rfc2460#appendix-A
with some new specification, applicable always in the
"core" - the BGP routers of IPv6's inter-domain routing
system - and probably generally applicable inside
conventional and new networks as well.
2 - The functions of the control plane and forwarding plane of
core routers would need to be modified - in relatively
minor ways. The control plane changes would surely be
feasible via firmware upgrades.
The forwarding plane changes could be quite minimal.
If we assume that all (or a sufficient proportion of)
core routers at the time of FLOWv6 introduction will use
CPU-based FIBs which can be enhanced with firmware
updates, then these changes are probably feasible.
The changes are to the way the router's RIB controls
the FIB, and how the FIB works. There is no change to
the BGP protocol, so the upgraded routers will work
fine alongside non-upgraded routers. It should be
possible to run FLOWv6 globally even when most, but not
necessarily all, IPv6 BGP routers are upgraded.
3 - We would need to be happy with an approximately 1 million
limit on the number of separate, BGP-managed, prefixes
in which the new kind of end-user networks could be
connected. This is not as restrictive as having no more
than a million ETRs in a map-encap system. It is a limit
on the number of separate provider network sites where one or
more EFRs (like ETRs) could be located.
Since the installed base of IPv6 core routers is relatively low and
since the Flow Label is not widely used, the above conditions seem
reasonable to us.
Changing BGP implementations and FIB functions is a non-trivial
step; however, the result will be all core routers using a highly
regularised 20 bit field for forwarding, which is generally much
more attractive than the current situation, in which an algorithm
such as Tree-Bitmap is laboriously executed some number of bits at a
time on the destination address of each packet. This typically
involves parallel CPUs and expensive DRAM lookups in order to step
through up to 48 bits of address, before the FEC (Forwarding
Equivalence Class) of the packet can be determined.
There may well be hash and cache approaches to avoid this effort on
every packet, but the use of the Flow Label will be simpler than
those too.
Because of this, we believe that FLOWv6 has the potential to greatly
ease the IPv6 FIB workload in the DFZ.
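The contrast can be sketched in Python. This is a toy illustration only - real FIB hardware is nothing like this, and the table contents and interface names are invented:

```python
# Toy contrast: longest-prefix match on a 128 bit destination address
# versus a single direct array read indexed by the 20 bit Flow Label.

# Conventional FIB model: entries keyed by (prefix bits, prefix length).
LPM_TABLE = {
    (0x2001 << 112, 16): "if0",        # an ordinary /16 route
    (0xE0000001 << 96, 32): "if3",     # an EBP-style /32 route
}

def lpm_lookup(addr128):
    """Linear-scan longest-prefix match (real routers use tries etc.)."""
    best_len, best_fec = -1, None
    for (bits, length), fec in LPM_TABLE.items():
        mask = ((1 << length) - 1) << (128 - length)
        if addr128 & mask == bits and length > best_len:
            best_len, best_fec = length, fec
    return best_fec

# FLOWv6 FIB model: the Flow Label indexes FINDEX[] directly.
FINDEX = ["if0"] * (1 << 20)
FINDEX[1] = "if3"          # EBP-0001 currently best reached via interface 3

def flow_lookup(flow_label):
    return FINDEX[flow_label]   # one array read, no per-bit trie walk
```

Both lookups return the same FEC for a packet headed to EBP-0001, but the Flow Label path is a single indexed read.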
As a routing scaling solution, like map-encap schemes or Six/One
Router, the aim is to keep the BGP system handling primarily
prefixes for providers - with end-users having address space which
provides their needs in a manner which does not add to this number
of prefixes, or otherwise burden the control plane of the BGP
system. These needs include portability between ISPs, multihoming
and traffic engineering.
Detailed Description
====================
The following is a preliminary attempt at describing a proposal we
have just started to develop. We expect that there will be many
suggestions which will improve it. Please let us know your concerns
and suggestions. There may well be some gotchas or showstoppers we
haven't considered.
Terminology
-----------
Conventional networks
Provider and end-user networks whose address prefixes
are managed exactly as they are today - by advertising
each one in the global BGP system.
Conventional networks which have no IFRs (Ingress
Flow Routers) are known as "non-upgraded" networks.
SEN - Scalable End-user Network
A network for an end-user using the new SPI (defined
below) form of address space. In order to solve the
routing scaling problem, we need to have most or all
new end-user networks, and many, most or all existing
(conventional) end-user networks adopt SPI space.
Therefore we need to make this new form of address space
and type of network highly attractive to the great
majority of end-users, of all sizes - including for
instance corporations, universities, schools and large
hosting companies.
All SENs either have their own IFRs (defined below)
or are connected to the Net via one or more
conventional networks which provide IFRs to
handle their outgoing packets. So the term "non-
upgraded network" only applies to a conventional
network without IFRs - never to a SEN.
SPI - Scalable PI - address space
A new form of address space, intended solely for end-
user networks (all networks other than those of
Internet Service Providers) which is Provider
Independent, but in a manner which supports scalable
routing.
Conventional PI prefixes are each globally advertised
in the BGP system. The large number of these
prefixes, and their rate of change, is the cause of
the routing scaling problem.
SPI address space remains stable for each end-user
network, no matter which one or more ISPs they use
to connect to the Net. SPI space is therefore
entirely portable and can be used for multihoming.
Ideally SPI space can also be used for inbound
Traffic Engineering too. In the current Ivip-like
description of FLOWv6, inbound TE is achieved
indirectly within certain limits, rather than with
the explicit load balancing arrangements of the
other map-encap schemes. Nonetheless, both Ivip and
FLOWv6 enable very fine-grained control of mapping
with real-time user control - and this may result in
inbound TE which is superior to that possible with
the other non-real-time control map-encap schemes.
For IPv4, translation schemes are not suitable and
there is no 20 bit flow label in the IPv4 header -
so SPI IPv4 space is likely to be provided with a
map-encap scheme.
In the context of what follows, "SPI space" refers to
the new kind of address space provided by FLOWv6.
Micronet
A contiguous sequence of IP addresses - of the new
SPI type of address space - which are mapped to a
single "locator" address. In most map-encap schemes,
the micronet concept is implemented as an EID
(Endpoint IDentifier) prefix.
In Ivip and FLOWv6, a micronet is not necessarily a
binary-boundary prefix. In FLOWv6 it is an integer
number of contiguous /64 prefixes. The granularity
of the mapping system in FLOWv6 is one /64.
In the mapping system, a micronet is specified by
a starting address (64 bits) and a length, in
/64 steps. In principle, the length may be up
to 64 bits, however we may limit it to 32 bits.
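As a sketch, a micronet record of this kind might look like the following (the class and field names are our invention; the /64 granularity and 128 bit EFR address are as specified above):

```python
from dataclasses import dataclass

@dataclass
class Micronet:
    """A micronet: an integer number of contiguous /64s, mapped to one EFR."""
    start: int    # starting /64, i.e. the top 64 bits of the base address
    length: int   # number of contiguous /64s
    efr: int      # 128 bit address of the EFR this micronet is mapped to

    def covers(self, addr128: int) -> bool:
        """True if a 128 bit address falls within this micronet."""
        slot = addr128 >> 64          # which /64 the address sits in
        return self.start <= slot < self.start + self.length
```

For instance, Net-Y's two-/64 micronet from the example below would be `Micronet(start=0x4000005099996666, length=2, efr=...)`.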
UAB - User Address Block
This is a contiguous range of addresses which are
controlled by one end-user. An end-user may be
as large as a corporate or university network, or
simply an individual who has a mobile device,
such as a cellphone.
UABs are integer numbers of /64s, just like
micronets. They are specified by a 64 bit
starting address and a 64 bit length. IFRs,
EFRs (described below) and the mapping
distribution system do not use UABs. A UAB
is an administrative construct.
End users can divide their UAB into as many
micronets as they like, and each micronet
can be mapped to any 128 bit IP address -
the address of an EFR (Egress Flow Router),
which forwards the packet to the destination
SEN network. So a single UAB could be used to
create multiple micronets, and each micronet
could be mapped to a different EFR, in any ISP
in any country.
MAB - Mapped Address Block
A BGP advertised prefix in which the enclosed address
space is managed by the scalable routing system: for
instance map-encap or in this case FLOWv6.
While technically a single MAB could provide space
for just one SEN, this would help little - or not at
all - with the routing scaling problem.
Generally, each MAB should be relatively large compared
to the size of micronets. (That said, some SENs may
need only a single micronet of /64, and others may require
many more, much larger, micronets - so there isn't a
typical size of micronet.)
Generally, each MAB should include a large number of
micronets, such as hundreds or millions of them. This
will enable the micronets serving the needs of very large
numbers of SEN end-user networks to be handled from an
area of the address space which requires a single BGP
advertisement.
IFR - Ingress Flow Router.
Similar in function to a map-encap ITR (Ingress
Tunnel Router).
The most obvious location for the IFR function to be
implemented is at the border routers - BGP routers
at the borders of all conventional networks which
SENs use to connect to the Net.
The IFR function processes all packets whose
destination address falls within a micronet, to
set the Flow Label bits to a value which uniquely
identifies the BGP advertised prefix towards which
this packet should be forwarded by all BGP routers
in the inter-domain core. (Full explanation below.)
IFRs can be servers or dedicated routers. They
can be located inside conventional networks - they
need not be a BGP router at the border of a
conventional network.
IFRs can also be located inside SEN networks and
are likely to be found at the border of an SEN
network and the one or more conventional provider
networks which the SEN network uses to connect to the Net.
The third place an IFR function can be found is,
in effect, in the DFZ - where it is known as an OIFRD.
IFRC - IFR with Cache
IFRs are typically caching IFRs: IFRCs. They cache
the mapping information they currently require,
and do not attempt to store a copy of the entire
mapping database.
IFRH - IFR function in sending Host
A caching IFR function can also be built into a
sending host. This could be a zero cost approach
reducing or eliminating the need for separate IFRs.
All caching IFRs need an address which can be reached
from anywhere, so they can receive mapping updates
from query servers. Like IFRCs, IFRHs can be on
conventional addresses or SPI addresses - but not
behind NAT.
OIFRD - Open Ingress Flow Router in the DFZ.
These are the FLOWv6 equivalent of Ivip's OITRDs
- which do much the same job as LISP's PTRs (Proxy
Tunnel Routers).
OIFRDs are distributed around the Net, conceptually
"in the DFZ" to attract and process packets sent to
micronet addresses by hosts in "non-upgraded"
conventional networks: those which have no IFRs of
their own.
In fact, OIFRDs are within or at the border of some
conventional AS network. Likely locations are
Internet exchanges, peering points etc.
They are ideally close to non-upgraded networks, so the
total path travelled by the packet from its source,
through the OIFRD and to the EFR, is not much longer
than the most direct path from the sending host to the
EFR which serves the destination host.
Ideally, in the future, all conventional IPv6
networks will have their own IFRs and OIFRDs will not
be needed.
An OIFRD advertises one or more MABs and so attracts
packets sent from nearby non-upgraded networks.
It then does what all IFRs do: use mapping
information to set the Flow Label of the packet so
that (upgraded) BGP routers will forward the packet
towards the EFR to which this micronet is currently mapped.
The business case for OIFRDs is identical to that for
Ivip OITRDs:
http://psg.com/lists/rrg/2008/msg02021.html
EFR - Egress Flow Router
This is analogous to the Egress Tunnel Router (ETR)
in a map-encap scheme.
FLOWv6 uses the Flow Label so the core BGP routers -
and perhaps internal routers between the BGP border
router and the EFR - will forward the packet to this
EFR. This is somewhat more elaborate and flexible
than in the map-encap system and is described more
fully below.
There is no encapsulating header to remove (map-
encap), and no addresses to rewrite (Six/One Router).
The EFR will probably zero the Flow Label. The most
important function it performs is that it recognises
from the destination address which SEN network the
packet should be forwarded to. (The destination
address has been ignored by the BGP routers, since they
used the Flow Label to decide forwarding.)
When the IFR uses the packet's destination address
to look up the mapping information for the micronet
which covers that address, the return value is the
exact 128 bit address of the EFR. Below we explain how
this enables the IFR to set the Flow Label so that all
core BGP routers will forward the packet towards the
one or more BGP border routers which advertise the
prefix in which the EFR is located.
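Given the EBP numbering used in the example below (/32 EBPs inside E00::/12), the IFR could derive the 20 bit Flow Label from the mapped EFR address with simple bit arithmetic. A Python sketch under that assumed layout:

```python
def flow_label_for_efr(efr_addr: int) -> int:
    """Return the 20 bit EBP index for a 128 bit EFR address inside E00::/12
    (assuming the example layout: /32 EBPs, 20 index bits below the /12)."""
    if (efr_addr >> 116) != 0xE00:
        raise ValueError("EFR address is not inside the EBP block E00::/12")
    # The EBP /32 prefix is the top 32 bits; its low 20 bits are the index,
    # which is exactly the value the IFR writes into the Flow Label.
    return (efr_addr >> 96) & 0xFFFFF
```

So a micronet mapped to an EFR at E000:0003::xyz gets its packets labelled 3, and every upgraded core router forwards them towards whoever advertises EBP-0003.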
FPER - Flow Path Exit Router
This is fully described below. It is a BGP router
at the border of a network which houses one or more
EFRs. So this FPER router advertises one or more EBP
prefixes (next item).
This router performs a mapping lookup on the
destination address and ensures the packet is
forwarded to the correct EFR. The Flow Label caused
the packet to be forwarded to this router, across the
core, but the Flow Label alone cannot determine which
of two or more EFRs in this network the packet
should be forwarded to.
EBP - EFR Block Prefix
As described more thoroughly below, the IPv6 address
space is administered to create a regular series of
prefixes, each of which can be advertised in BGP.
There are 2^20 such prefixes: 1,048,576. Each has
the same length, say /32. /48 would probably be fine
too.
A conventional provider network which has one or more
EFRs and a single "site" (such as a network in a city,
or a data centre) needs one EBP. If it has multiple
such sites and does not want to ferry EFR-addressed
traffic between them, then it needs a separate
EBP for each such site.
All EFRs are located on addresses within one of these
EBPs. Below we discuss administrative arrangements
for this limited resource of about a million EBPs.
It may not be necessary to use the full number at any
time in the future. Perhaps a few tens of thousands
will be all that is required for the foreseeable
future, assuming IPv6 is widely adopted.
Mapping system
This is a system by which end-users can issue
commands which change their micronets' starting
points and lengths, and by which they can change
the 128 bit EFR address to which each micronet is
mapped.
The primary task of the mapping system is to enable
the IFRs to rapidly and reliably find out what EFR
address any incoming packet should be forwarded to.
An "incoming packet" in this context means a packet
the IFR has received and identified as having a
destination address which may be within a micronet -
by virtue of the fact that the address is within
one of the MABs. (In the example below, this is
easy to determine, since all the MABs - and no
other types of BGP advertised prefix - are in
4::/3.)
Various mapping systems could be used, such as the
pure pull systems of LISP-ALT and TRRP, the pure
push system of LISP-NERD, or the hybrid push-pull
systems of APT (slow) or Ivip (fast).
The following discussion assumes the use of Ivip's
mapping system, as described for that proposal.
This is a fast-push global system of Replicators
which conveys end-user mapping commands to full-
database Query Servers located in conventional
networks which have ITRs (IFRs for FLOWv6).
These local query servers quickly and reliably
provide responses to mapping queries from IFRs
in that network, or from nearby networks, including
from SEN networks which use this conventional network
for access to the Net. This avoids the delay and
reliability problems of the global query server
approaches (LISP-ALT and TRRP), while not requiring
every ITR/IFR to carry a copy of the full mapping
database (LISP-NERD).
The full database query server issues map reply
messages securely to the querying IFR, with a
caching time, such as 10 minutes. During that
time, the IFR is assumed to be Flow Labelling
(encapsulating and tunnelling, for Ivip) packets
which are addressed to this micronet.
If the query server is told by the mapping system
of changed mapping for this micronet (or that the
micronet has been deleted) then it needs to send a
Cache Update command (AKA "Notify" command) to that
IFR.
This hybrid push-pull system ensures all IFRs which
need the mapping information get it within a few
seconds of the end-user issuing the mapping change
command.
Please refer to the overall Ivip Summary and Analysis
documentation and the Ivip Fast Push Internet Draft
for further details:
http://www.firstpr.com.au/ip/ivip/
QSD - Query Server with full Database
QSDs get the full continual feed of mapping updates
from the fast push mapping system.
They handle queries from nearby IFRs - IFRCs and
IFRHs. (An IFRD is really an IFRC with an
integrated QSD, or an IFRC using a QSD in the same
rack, connected directly by Ethernet.)
QSC - Query Server with Cache
These can optionally be deployed, so there may be one
or more layers of QSCs between IFRCs/IFRHs and the
nearest one or several QSDs.
When a QSC has no cached information which answers
a query, it passes the query upwards to (or towards,
via one or more QSCs) the nearest local QSD.
When the QSC receives the response, it caches it and
sends the response downwards to (or towards, via one
or more QSCs) the IFRC/IFRH which made the request.
Likewise, when a QSC gets a Cache Update message from
a QSD above it (perhaps via one or more QSCs), it
passes it downwards to whatever IFRCs, IFRHs or QSCs
below it which, in the last 10 minutes (for instance)
queried the mapping for this micronet.
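The QSC behaviour just described might be modelled like this. A toy Python sketch: the 10 minute (600 second) caching time is from the text, while the class shape and callback structure are our assumptions:

```python
import time

class QSC:
    """Toy caching query server: a miss goes up to the parent, the answer is
    cached for ttl seconds, and Cache Updates are pushed down to whoever
    queried this micronet recently."""
    def __init__(self, parent_query, ttl=600):     # 600 s = 10 minutes
        self.parent_query, self.ttl = parent_query, ttl
        self.cache = {}          # micronet -> (efr_addr, expiry_time)
        self.interested = {}     # micronet -> {callback, ...} for push-down

    def query(self, micronet, notify_cb):
        """Answer a query from below; remember who asked."""
        self.interested.setdefault(micronet, set()).add(notify_cb)
        hit = self.cache.get(micronet)
        if hit and hit[1] > time.time():
            return hit[0]                          # answered from cache
        efr = self.parent_query(micronet)          # miss: ask upwards
        self.cache[micronet] = (efr, time.time() + self.ttl)
        return efr

    def cache_update(self, micronet, new_efr):
        """Called from above when the mapping changes; propagate downwards."""
        self.cache[micronet] = (new_efr, time.time() + self.ttl)
        for cb in self.interested.get(micronet, ()):
            cb(micronet, new_efr)
```

A real QSC would also expire the "interested" sets and authenticate updates; the sketch only shows the query-up / push-down flow.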
Tutorial by way of example
==========================
For simplicity, we assume that all core IPv6 routers have been
upgraded for FLOWv6. In a section below we discuss transition
arrangements while not all routers have FLOWv6 upgrades.
We will also ignore OIFRDs in this explanation - the IFRs which
collect and Flow Label packets sent from non-upgraded networks and
which are addressed to micronet addresses.
In this example, EBPs are /32s and a prefix E00::/12 has been
reserved for them. Consequently, the first few EBPs are:
EBP-0 E000:0000::/32
EBP-1 E000:0001::/32
EBP-2 E000:0002::/32
EBP-3 E000:0003::/32
and the highest is:
EBP-1048575 E00F:FFFF::/32
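Converting between an EBP index and its /32 prefix under this numbering is simple bit arithmetic. A Python sketch of the example layout (function names are ours):

```python
EBP_BASE = 0xE000 << 112       # E000::, the bottom of the E00::/12 reservation

def ebp_prefix(index: int) -> int:
    """EBP-<index> as a 128 bit address whose top 32 bits are the /32 prefix."""
    assert 0 <= index < (1 << 20)
    return EBP_BASE | (index << 96)

def ebp_index(prefix128: int) -> int:
    """Recover the 20 bit index from an EBP /32 prefix."""
    return (prefix128 >> 96) & 0xFFFFF

def fmt(addr128: int) -> str:
    """Render the top 32 bits in the E000:0001::/32 style used above."""
    top = addr128 >> 96
    return f"{top >> 16:04X}:{top & 0xFFFF:04X}::/32"
```

This reproduces the series above: EBP-1 is E000:0001::/32 and EBP-1048575 is E00F:FFFF::/32.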
So far, 8191 have been allocated, in principle only to operators of
provider networks. EBPs are only needed by a network which hosts
EFRs, and any such network is inherently providing Internet access.
Technically, the EBPs could be allocated to operators of
conventional end-user networks, but then those would not truly be
end-user networks any more.
EBP-0 is reserved. (The final design may reserve more low numbered
or high-numbered EBPs for other purposes.) In our example, the
allocated EBPs include:
EBP-0001 E000:0001::/32 ISP-A (has only one "site")
EBP-0002 E000:0002::/32 } ISP-B (has 30 "sites")
EBP-0003 E000:0003::/32 }
... }
EBP-0031 E000:001F::/32 }
EBP-0032 E000:0020::/32 } ISP-C (has two "sites")
EBP-0033 E000:0021::/32 }
It is not desirable to have a million EBPs, since each is advertised
in BGP and so places a burden on the entire core routing system.
EBPs are only allocated to organisations which need them, and pay
for them. (At a later date we will develop plans for administering
these EBPs and for the commercial and regulatory aspects of FLOWv6.)
The ISPs generally have other "conventional" prefixes, outside this
special EBP set - as they do today. The ISPs use these
"conventional" prefixes for their own internal purposes, and for
some of their customers. Those customers use the space in today's
"PA" manner. Whether they get a single IP address or a prefix, and
whether they get it for a short dial-up or mobile session, or for
some period of years, the space they get is only available as long
as they use this ISP. It is "PA" - Provider Assigned - space and
therefore not portable to other ISPs.
These conventional prefixes and their PA usage have nothing to do
with the SPI space provided by FLOWv6.
We will consider two end-user networks with SPI space: Net-X and
Net-Y. For simplicity of explanation, their micronets are from the
same MAB.
In our example, the prefix 4::/3 has been reserved for MAB prefixes.
It is not absolutely necessary for all MABs to be in any reserved
prefix such as this, but it would simplify the functionality of IFRs
and EFRs.
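With all MABs inside 4::/3, an IFR's test for "could this destination be SPI space?" reduces to comparing the top three address bits. A Python sketch of this example layout:

```python
def in_mab_space(addr128: int) -> bool:
    """True if a 128 bit address falls inside 4::/3, the example's reserved
    MAB block (covering 4000:: through 5FFF:FFFF:FFFF:FFFF:FFFF:FFFF:FFFF:FFFF).
    4::/3 means the top 3 bits are 010."""
    return (addr128 >> 125) == 0b010
```

Only addresses passing this test need a mapping lookup at all; everything else is forwarded conventionally.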
In IPv4, for a map-encap system, there is no chance of making all
the MABs appear in some clearly defined subset of the whole address
space - since, over the next five to ten years, there needs to be
progressive conversion of a great deal of the whole address space
into MABs.
In IPv6, by administrative fiat, it would be easy for the IANA to
carve out two special prefixes which would make the FLOWv6 system
simpler to implement. In addition to the above-mentioned E00::/12
reservation for 2^20 EBP prefixes, in our example, the IANA reserves
1/8 of the entire IPv6 address space for MABs: 4::/3 .
(See http://www.iana.org/assignments/ipv6-address-space
for current assignments.)
Some company D - probably, but not necessarily an ISP or an RIR -
has been assigned the MAB:
4000:0050::/24
There could be 2 million MABs of this size in the 4::/3 reservation.
MABs don't necessarily need to be of the same size, or to be
contiguous. Still, it probably makes sense to standardise all MABs
to the same size - a simplifying convenience which can't be
achieved in the crowded IPv4 space.
We don't want tens of millions of MABs. Ideally, we probably want a
few dozen or at most a few hundred. Each MAB will have its own
stream of mapping updates. Each OIFRD will advertise one or more -
or potentially all - active MABs.
D rents some of this MAB's space to Net-X and Net-Y. This rental is
effectively permanent. Unless D goes broke (in which case the space
would be taken over by another company such as D and probably
administered to preserve the previous assignments), X and Y can have
their space for as long as they like.
Both Net-X and Net-Y pay D for their space, such as a certain fee
per year for each /64. They also pay D for the mapping changes they
make. This would probably be a charge per update, or some flat fee
for a certain number of updates per month.
In this fast-push mapping distribution system, it is important that
end-users pay for the updates they send on the system. The fee may
be as low as a few cents per update. These fees help pay for most
of the fast-push system, especially the Launch servers and
Replicators. This occurs through company D and others like it, who
directly or indirectly pay for the operation of the fast-push system.
The fee per update also discourages "excessive" use - such as
changing the mapping every few seconds for months on end - to
implement fancy TE, or just to create annoyance. Each mapping
change involves a small amount of computation, storage and
communications bandwidth in the entire fast-push system and in all
recipient QSDs.
The cost per update will be very low - low enough that end-users
with busy networks will find it attractive to use frequent mapping
changes to fine-tune the inbound TE of their multiple links.
The space of a network would be split into separate micronets, each
with some recipient hosts. By dynamically changing the EFR each
micronet is mapped to, the incoming traffic volume can be managed in
real-time and directed as desired to each of the two or more EFRs
and so via each of the two or more links from the two or more ISPs.
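The kind of inbound TE described here amounts to periodically re-mapping micronets across the available EFRs. As a toy Python sketch - the greedy least-loaded policy is purely our invention; real end-users would apply their own policies:

```python
def balance(micronet_rates, efrs):
    """Toy inbound-TE policy: assign each micronet (with a measured inbound
    traffic rate, e.g. Mbit/s) to the currently least-loaded EFR, heaviest
    micronets first. Returns {micronet: efr} - the mapping changes the
    end-user would then push into the mapping system."""
    load = {efr: 0.0 for efr in efrs}
    mapping = {}
    for mnet, rate in sorted(micronet_rates.items(), key=lambda kv: -kv[1]):
        efr = min(load, key=load.get)     # least-loaded EFR so far
        mapping[mnet] = efr
        load[efr] += rate
    return mapping
```

Each run of such a policy yields a batch of mapping-change commands, which is where the per-update fees discussed above come in.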
Net-X and Net-Y also pay D for D's operation of a global network of
OIFRDs which handle packets addressed to the above-mentioned MAB,
sent by hosts in non-upgraded networks. This means that Net-X and
Net-Y will probably pay according to traffic flowing through the
OIFRDs which was addressed to each end-user's micronets.
This is because one SPI end-user network might have only a small
amount of space, perhaps just a single micronet of /64, but could
run a very popular web site on it, and so generate far more OIFRD
traffic than another end-user network, which has much more space.
D would have a sampling system to estimate OIFRD traffic; it would
not make sense to count every byte.
In the following examples, ordinary IPv6 prefix notation will be
used to show the base address and length of each micronet, but in
practice the micronets can start and end at any /64 boundary.
Net-X has the micronet:
4000:0050:7000::/48
This is 65,536 contiguous /64s:
4000:0050:7000::
to 4000:0050:7000:FFFF:FFFF:FFFF:FFFF:FFFF.
This sounds like quite a large micronet, but it is technically valid
and perhaps there will be call for such micronets.
Net-Y's micronet has just two /64s:
4000:0050:9999:6666::/63
Micronets and UABs can range from a single /64, in principle, to as
many /64s as fit in the MAB. In this case, the /24 MAB covers 2^40 -
about 1.1 trillion - /64s.
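The /64 counts used in these examples are simple powers of two (a Python sketch):

```python
def slash64_count(prefix_len: int) -> int:
    """Number of /64s inside a prefix of the given length (length <= 64)."""
    assert 0 <= prefix_len <= 64
    return 1 << (64 - prefix_len)

print(slash64_count(48))   # Net-X's micronet: 65,536 /64s
print(slash64_count(63))   # Net-Y's micronet: 2 /64s
print(slash64_count(24))   # the whole /24 MAB: 2**40, about 1.1 trillion /64s
```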
Before depicting the passage of a packet through the FLOWv6 system,
we will describe the function of the EBP prefixes.
While an ISP could use space within an EBP prefix for any purpose,
here we assume that all ISPs use these prefixes solely for EFRs.
Our example involves two EBP prefixes:
EBP-0001 E000:0001::/32 ISP-A
EBP-0003 E000:0003::/32 ISP-B's "Site-2".
ISP-A advertises its EBP-0001 from a single border router.
ISP-B advertises its EBP-0003 from two border routers at its second
site.
The BGP system treats these EBP prefixes exactly the same as
ordinary BGP prefixes. All BGP routers therefore develop and
maintain best paths for both these prefixes, and likewise for all
the other EBP prefixes.
The enhanced BGP RIB functionality specifically recognises this set
of EBP prefixes (8191, or however many are defined), because they
fall within the IANA-defined prefix E000::/12.
The new RIB function is programmed to detect each such /32 EBP
prefix, and to copy its FEC value (the internal value by which the
router's FIB knows which interface to forward the packet from) to a
special array in the FIB. This is the FINDEX[] array.
FINDEX[] is indexed 0 to (2^20 - 1).
Each element in FINDEX[] stores a FEC value, copied straight from
the FEC of the corresponding EBP in the RIB.
So in a given core router, if the BGP RIB has decided that the best
path towards ISP-A's EBP-0001 is "Interface 3", then the FEC value
which represents "Interface 3" is copied to location 1 in FINDEX[].
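As a rough sketch of this RIB-to-FIB copying step (in Python, with
illustrative data structures - the FEC here is just an interface name,
and the prefix parsing is deliberately simplified):

```python
FLOW_LABEL_BITS = 20

# FINDEX[] maps a Flow Label value directly to a FEC.  Index 0 is
# effectively unused, since Flow Label 0 means "not set by any IFR".
FINDEX = [None] * (2 ** FLOW_LABEL_BITS)

def install_ebps(rib_best_paths):
    """Detect each /32 EBP prefix (within E000::/12) among the RIB's
    best paths and copy its FEC into FINDEX[], indexed by the low
    20 bits of the prefix's top 32 bits."""
    for prefix, fec in rib_best_paths.items():
        addr, plen = prefix.split("/")
        if plen != "32":
            continue
        # Top 32 bits of the prefix, e.g. "E000:0001" -> 0xE0000001
        top32 = int("".join(addr.split(":")[:2]), 16)
        if (top32 >> 20) != 0xE00:       # must lie within E000::/12
            continue
        FINDEX[top32 & 0xFFFFF] = fec    # e.g. EBP-0001 -> index 1

# ISP-A's EBP-0001 has its best path via "Interface 3":
install_ebps({"E000:0001::/32": "Interface 3"})
```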
With FLOWv6, it is required that all packets being handled in the
BGP core have their Flow Label set according to the following rules:
Set to 0 if the packet has not had its Flow Label set to a
particular value by any IFR.
Any non-zero value is assumed by all core routers (we
assume in this example they are all upgraded to FLOWv6
functionality) to represent the fact that this packet's
destination address is for a micronet which is currently
mapped to some EFR whose address is within a particular
EBP - where this EBP is directly specified by the value
of the Flow Label.
Having set the stage, we now provide an example packet flow, a
packet sent by a host HA to another host HB.
HA is on a conventional address in some ISP's BGP advertised prefix,
or in a conventional PI space end-user network.
HB is in Net-X's /48 micronet mentioned above:
4000:0050:7000::/48
HB's address is 4000:0050:7000:1234::33.
Net-X is currently using ISP-B's second site for Internet access,
and the address of the EFR incoming packets should be forwarded to
(via FLOWv6's Flow Label direct forwarding system, described below) is:
E000:0003:0000:0055::7
The packet is sent by HA and forwarded by its network's internal
routing system towards a border router, which also has IFR
functions. The IFR function recognises it as being addressed to
somewhere in the SPI (Scalable PI) address space, since all such
space is defined to be within a micronet - and since all micronets
are within MABs and all MABs within the prefix 4::/3, as is this
packet's destination address.
In our example, the IFR has no cached mapping information for this
address. A subsequent packet from HA to HB will undergo a less
complex process, due to the presence of cached mapping data in the
IFR's FIB.
When the packet is analysed by the FIB, the result is of the form:
This packet is addressed to a section of the address space
which is known to be covered by the FLOWv6 scheme, but the
FIB currently has no mapping information for this particular
address.
Therefore, hold the packet and query the routing processor
to ask for the mapping information. Later, when this
arrives, the packet will have its Flow Label set and then
will be forwarded to a BGP router in the core.
Subsequent packets matching the micronet which was
specified in the mapping reply will be handled by a
faster, FIB-only, process which sets the Flow Label
to the same value, and again forwards the packet to the
core.
This is one of four initial responses the FIB could produce. The
other three are listed here:
http://psg.com/lists/rrg/2008/msg02029.html
Briefly, they are:
Send the packet conventionally, with a normal FIB lookup
of its destination address
Use cached mapping information for this packet's
destination prefix to set the Flow Label, as above,
before forwarding the packet to the core.
Drop the packet, or process it via some slower
and more arduous mechanism - which is not needed for
FLOWv6.
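A minimal sketch of this four-way decision in the IFR's FIB (Python;
the cache format and function names are assumptions for illustration):

```python
def in_spi_space(top64):
    # All MABs lie within 4::/3, so the top 3 address bits are 010.
    return (top64 >> 61) == 0b010

def ifr_fib_action(top64, cache):
    """Decide the IFR FIB's initial response for a packet whose
    destination's top 64 bits are top64.  cache is a list of
    (micronet_base_top64, prefix_len, flow_label) entries."""
    if not in_spi_space(top64):
        return ("forward_conventionally", None)
    for base, plen, label in cache:
        if (top64 >> (64 - plen)) == (base >> (64 - plen)):
            return ("set_flow_label_and_forward", label)
    # No cached mapping: hold the packet and query the routing
    # processor.  (The fourth response - drop or slow-path the
    # packet - is not needed for FLOWv6.)
    return ("hold_and_query_mapping", None)

# HB's address 4000:0050:7000:1234::33, with Net-X's micronet cached:
cached = [(0x4000005070000000, 48, 0x00003)]
print(ifr_fib_action(0x4000005070001234, cached))
```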
Once in the core the packet is handled by one or more upgraded BGP
routers.
In our example, the IFR requests mapping information for the
packet's destination address:
4000:0050:7000:1234::33
Actually, since the mapping system's granularity is /64, the map
request is for the 64-bit value, in hex:
4000 0050 7000 1234
Within a few tens of milliseconds, the response from the local QSD
(full database query server) comes back to the effect:
[ The queried address is within the micronet:
[
[ 4000:0050:7000::/48
[
[ which is currently mapped to the EFR at:
[
[ E000:0003:0000:0055::7
[
[ Cache this response for 600 seconds.
The FLOWv6 section of the IFR's RIB caches this information, and
processes it into a form to be sent to the FIB:
{ Any incoming packet matching:
{
{ 4000:0050:7000::/48
{
{ should have its Flow Label set to:
{
{ (hex) 0 0003
{
{ and should then be handled by the
{ usual forwarding mechanism.
The RIB sends this to the FIB, and by one means or another the FIB
matches the stored packet to this new rule. (600 seconds later, the
RIB will tell the FIB to delete the above rule.)
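The rule and its 600-second lifetime could be sketched like this
(illustrative only - a real FIB would use hardware timer machinery,
not a Python dictionary):

```python
import time

class FlowLabelCache:
    """Micronet -> Flow Label rules pushed from the RIB to the FIB,
    each deleted again once its lifetime expires."""
    def __init__(self):
        self.rules = {}   # (base_top64, prefix_len) -> (label, expiry)

    def install(self, base, plen, label, ttl=600, now=None):
        now = time.time() if now is None else now
        self.rules[(base, plen)] = (label, now + ttl)

    def lookup(self, top64, now=None):
        now = time.time() if now is None else now
        for (base, plen), (label, expiry) in list(self.rules.items()):
            if expiry <= now:
                del self.rules[(base, plen)]   # rule has timed out
                continue
            if (top64 >> (64 - plen)) == (base >> (64 - plen)):
                return label
        return None

# Net-X's /48 micronet, mapped to Flow Label (hex) 0 0003:
cache = FlowLabelCache()
cache.install(0x4000005070000000, 48, 0x00003, now=0)
```

Looking up HB's address before the 600 seconds elapse returns 0 0003;
afterwards the rule has been deleted and the lookup falls back to a
fresh mapping query.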
Now the packet has its Flow Label set to (hex) 0 0003, and the FIB's
forwarding mechanism (enhanced to perform this additional FLOWv6
function) looks at its Flow Label, discovers it is 3, and uses this
to index into the array FINDEX[].
This produces the correct FEC value for this packet - the number
which will cause it to be sent out the interface which leads to the
BGP router which is the best path towards the prefix in which the
EFR is located.
Once it reaches that router, the same process happens:
Is the Flow Label != 0?
Yes: use it to index into FINDEX[] to retrieve FEC.
Forward according to this FEC value.
This process is repeated at each DFZ router the packet traverses,
until it reaches a BGP router at the border of the provider network
in which the EFR is located.
This will be very much faster and simpler than the usual process of
analysing up to 48 bits in the destination address with the
Tree-Bitmap algorithm.
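The contrast can be sketched as follows (Python; `normal_fib_lookup`
stands in for the router's conventional longest-prefix-match
machinery):

```python
def forward(packet, findex, normal_fib_lookup):
    """Per-hop forwarding in an upgraded core router: a non-zero
    Flow Label is a direct index into FINDEX[], bypassing the
    longest-prefix-match (e.g. Tree-Bitmap) lookup entirely."""
    if packet["flow_label"] != 0:
        return findex[packet["flow_label"]]   # single O(1) array read
    return normal_fib_lookup(packet["dst"])   # conventional LPM

findex = [None] * (2 ** 20)
findex[3] = "Interface 3"   # best path towards EBP-0003 (ISP-B, Site-2)
fec = forward({"flow_label": 0x00003, "dst": None}, findex,
              lambda dst: "conventional lookup")
```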
In this way, as long as the packet is handled by an upgraded BGP
router, it will be forwarded towards one of the border routers of
ISP-B's Site-2.
Note that the packet does *not* contain any address which refers to
the prefix advertised by ISP-B's Site-2:
EBP-0003 E000:0003::/32
The Flow Label was set just once by the IFR in the source site.
Once set, the packet is easily handled by (upgraded) BGP routers.
When the packet reaches the border router for ISP-B's Site-2, that
router performs a somewhat different operation, because the FEC
value in its FINDEX[3] selects an interface which does not point to
any BGP router in the core. This FEC value leads to some internal
router.
Because of three things:
1 - The next hop for this packet is internal, rather than
to a core BGP router
2 - This border router is a FPER (Flow Path Exit Router).
3 - The packet has a non-zero Flow Label.
this FPER router now performs a special operation. There are two
forms, depending on the local conditions.
The first form applies if Site-2 has more than one EFR, and the
packet must be forwarded to the correct EFR:
The FPER router sets the Flow Label to 0. (Or perhaps to
some other value which has a useful meaning inside its
internal routing system - more on this below.)
The FPER router performs a mapping lookup on the destination
address, just like the IFR did. The FIB needs to do this
itself, not involving the RIB, unless the mapping has not
already been cached in the FIB.
(In most cases, this FIB will already have the mapping
information cached, since the router will have been
continually receiving packets for micronets which
have been mapped to EFRs at this site. So this will
typically not involve any delay, communication
activity or RIB, router processor etc. activity.)
This mapping lookup produces the address of the EFR to
which the micronet is mapped. (The micronet which encloses
the packet's destination address.)
Now the FPER router needs to forward the packet to that
EFR. Perhaps the EFR is in fact this FPER router.
The second form is if there is only one EFR at this site, or if
there are multiple EFRs and all of them can handle the packets for
all the micronets which are mapped to any EFR address in this site.
In this case, there is no need for a mapping lookup or
cached mapping information. The packet is forwarded to
the one EFR - or to one of the many EFRs.
It is a private matter in the provider network how the FPER gets the
packet to the appropriate EFR. One approach might be to reserve a
number of the 2^20 possible Flow Label values to have significance
only outside the core: inside provider networks. Then, a system
similar to that just described can be used by internal routers to
change the Flow Label to some value which identifies a particular
EFR in the site. This way, the packet could be transported from HA
to HB, entirely via the use of the Flow Label, without rewriting any
other part of the packet, and without tunneling, encapsulation etc.
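The FPER's two forms might be sketched as follows (Python; the
function and parameter names are assumptions):

```python
def fper_handle(packet, site_efrs, mapping_lookup):
    """Handle a packet whose non-zero Flow Label brought it to this
    FPER, and whose next hop is internal.  site_efrs lists the EFR
    addresses at this site; mapping_lookup returns the EFR to which
    an address's micronet is currently mapped (normally answered
    from the FIB's own cache)."""
    packet["flow_label"] = 0   # or an internally significant value
    if len(site_efrs) == 1:
        # Second form: one EFR (or any EFR will do), so no mapping
        # lookup is needed.
        return site_efrs[0]
    # First form: several EFRs, so find which one this packet's
    # micronet is mapped to.
    return mapping_lookup(packet["dst"])
```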
Transition: non-upgraded networks
---------------------------------
The task of this transition arrangement is to ensure that packets
sent by hosts in networks without IFRs are all forwarded to an
OIFRD, where they can have their mapping looked up and their Flow
Label set appropriately. The same principles which apply to Ivip
OITRDs apply also to FLOWv6 OIFRDs:
They should be distributed widely around the Net.
They should be able to handle peak packet rates without
unreasonable losses.
Their locations should minimise the extent to which packets
take a longer overall path than they would without FLOWv6.
They will be paid for by the organisations which rent micronet
space to end-users.
Transition: non-upgraded core routers
-------------------------------------
FLOWv6 is only going to be useful once a substantial number,
probably a majority, of DFZ routers have the FLOWv6 upgrades. This
is a significant hurdle for deployment, although perhaps tunneling
could be used initially when only a few DFZ routers are upgraded.
There needs to be a way the system can work reliably even when some
percentage of routers are not upgraded - such as 20% or less.
The most important thing to ensure is that each upgraded BGP router,
including the border routers, never forwards to any non-upgraded
router a packet which has its Flow Label set. The non-upgraded
router is likely to ignore the Flow Label, and do a standard BGP FIB
operation on the destination address.
This undesirable situation would result in the packet being
forwarded towards the nearest OIFRD which is advertising the MAB
which encloses the destination address. There, according to the
above algorithms, the packet will have its Flow Label set again to
the same value it already has, and that Flow Label will be used to
forward it to a router which should take it towards the network
which has the EFR.
The packet could easily get into a loop and so be dropped once its
Hop Limit reaches zero.
Folks with BGP expertise will probably be able to suggest
better arrangements than this, but some possible techniques to
protect against this include:
Manually configure every upgraded BGP router not to accept
routes matching E000::/12 from neighbours which are not
upgraded.
and/or:
Manually configure every non-upgraded BGP router not to
accept (and therefore not to offer) any routes to any
neighbours if they match E000::/12.
There may still be problems with not enough upgraded routers in a
particular part of the core to handle the Flow Label forwarding of
packets.
Perhaps manually configured tunnels to some other nearby routers
would be a solution, but this raises various problems, including
ones to do with packet length.
PMTUD
-----
This proposal is at a very early stage of development, but it is
possible that there are no PMTUD problems with this approach.
The fact that packets do not get any longer is a major benefit
compared with map-encap systems. Solving those problems, including
making the best use of jumboframe paths in the DFZ, is quite
challenging:
http://www.firstpr.com.au/ip/ivip/pmtud-frag/
Assuming the Hop Limit is still decremented every time the packet is
handled by a router, Traceroute should still work fine through the
entire path, including the section where forwarding is controlled by
the Flow Label.
At any router in this part of the path, if the packet is too long
for the next-hop MTU, the router should be able to send a Packet Too
Big (PTB) message to the sending host. This is a major advantage
over map-encap schemes, where the source address may be that of the
ITR (not with Ivip, which uses the sending host's address) and where
the too-big packet is longer than and different from the packet sent
by the sending host - resulting in any PTB message not being
recognised by the sending host.
Any translation scheme (Six/One Router is the only one so far) would
have serious difficulties with PMTUD in the translated part of the
path, since the packet has different addresses from those it had when
it left the sending host. So even if the PTB were somehow sent back
to that host, a properly implemented PMTUD system on that host would
fail to recognise the PTB as relating to any packet this host sent.
TTR Mobility
------------
Any map-encap scheme, and Ivip in particular, can be adapted to
support a global mobility scheme with highly attractive
characteristics. A paper on this will appear soon. For now, the
descriptive material at:
http://www.firstpr.com.au/ip/ivip/#mobile
describes the Translating Tunnel Router approach to extending a
map-encap scheme for mobility.
It is not necessary to change the mapping every time the mobile node
gets a new care-of address. Typically a mapping change, to select a
new TTR, is only required when the care-of-address moves more than
about 1000km or so from wherever the current TTR is.
The TTR principles should apply in general to a system such as
FLOWv6. Instead of tunneling packets across the DFZ to the ETR-like
TTR, they would be forwarded according to the Flow Label.
However, the Flow Label approach won't work taking packets to and
from the mobile node and the TTR. So tunneling should be used for
this, as described in the above-mentioned material.
This raises some PMTUD problems. Fortunately, the TTR <--> MN
tunnel technology is not related at all to the map-encap scheme or
to the FLOWv6 system, and can be negotiated at set-up time between
the TTR and MN. This means that there does not need to be a single
fixed technology for this tunneling, enabling a variety of
techniques, innovation, and more localised potential solutions to
PMTUD.
Typically, those tunnels will be two-way and use the same techniques
as encrypted VPNs. PMTUD is much easier to handle over such two-way
tunnels than in a map-encap system, where an ITR has to get packets
to an ETR with which it has had no prior contact, and with which it
cannot reasonably engage in extensive communications.