Robin,
Be assured that TARA can be done such that each single next hop, no matter
whether intra- or inter-domain, requires just one single table lookup.
75,600 table entries are sufficient to cope with a density where two
routers are within half a yard of each other (IMO 72,000 should be
sufficient too).
I have never heard a convincing argument why intra- and inter-domain
routing have to be as orthogonal as they are today.
TARA's routing table could be derived from viewing a topology in which each
router is immediately surrounded by a strict network, followed by loose and
looser networks according to geographical remoteness, PLUS (!!!) an
extension of the strict network, which is the entire intra-domain network
without geographical limitations.
I would be grateful if you could provide an estimate of how much faster a
packet could be forwarded along an average (intra- and inter-domain) path
by means of a single table offset rather than a binary search across a
classical FIB at each router.
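As a very rough illustration of the question (the structures below are
invented for the sketch - neither TARA's table nor a real FIB is organised
this way), a table-offset lookup is a single constant-time index, while a
classical FIB lookup behaves more like a binary search over sorted prefix
boundaries:

```python
import bisect

# Hypothetical flat table: one entry per destination index (O(1) offset).
FLAT_TABLE = [f"hop-{i % 16}" for i in range(75600)]

def offset_lookup(dest_index):
    # One offset into a contiguous table, regardless of table size.
    return FLAT_TABLE[dest_index]

# Hypothetical classical FIB: sorted prefix start points, binary-searched.
PREFIX_STARTS = list(range(0, 75600, 100))   # 756 invented "prefixes"
NEXT_HOPS = [f"hop-{i % 16}" for i in range(len(PREFIX_STARTS))]

def fib_lookup(dest_index):
    # O(log n) search for the covering prefix, then one more read.
    pos = bisect.bisect_right(PREFIX_STARTS, dest_index) - 1
    return NEXT_HOPS[pos]
```

Any real estimate would depend on memory hierarchy far more than on
instruction counts, which is why a measurement such as Heiner requests
would be worth more than this sketch.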
Heiner
In an email of 30.09.2008 09:47:15 Western European Standard Time,
rw@firstpr.com.au writes:
Short version: To what extent do currently proposed solutions for the DFZ
routing scaling problem help with scaling problems in large internal
networks? Not at all, as far as I can tell.
I guess the scaling problems of internal networks are of quite a different
nature to the problems faced by BGP routers in the DFZ.
I explore how Ivip's encapsulation and two Forwarding approaches could be
used to help with internal routing systems, in addition to or independently
of their use in the DFZ.
The Forwarding approaches have no encapsulation overhead - overhead which
is a significant problem for IPv6 map-encap, especially for VoIP packets.
They are 100% efficient and involve no PMTUD problems.
In "Re: 2 billion IP cellphones in 2103 & mass adoption of IPv6 by current
IPv4 users" http://psg.com/lists/rrg/2008/msg02594.html Wesley George wrote
about the scaling problem in internal networks, wondering whether the
solution we are seeking for the DFZ scaling problem will help with this.
Referring initially to the scaling problem in the IPv6 DFZ developing later
than the scaling problem of internal networks of 3G cellphone operators, he
wrote:
> Admittedly the problem may be further off in the DFZ. However, I
> don't know why we would design something that only applies to the
> DFZ, since the route scale problem has potential to be much worse
> within a given network than outside it.
I wonder how many ISP and end-user networks have more internal routes than
the DFZ's 260k?
I understand that internal routing systems use OSPF or IS-IS, which operate
on completely different principles to BGP, which is entirely decentralised.
The scaling properties of these internal systems are presumably very
different from BGP's. I think it is true to say that BGP suits the
interdomain core better than OSPF or IS-IS, because BGP can work fine in a
system with no central coordination, even when the size of the network is
not known, while the other two assume a centrally administered and
carefully managed network.
I understand that in terms of the forwarding (FIB) part of the router,
every extra route is another load on the system like any other. I
understand that the FIB doesn't distinguish between routes which come from
BGP and those from the IGP (OSPF or IS-IS). Despite the heroics necessary
to classify and forward packets at 1 and 10 Gbps for millions of prefixes,
I understand this is not the primary aspect of the routing scaling problem.
The RIB control plane scaling problems affecting any one BGP router
include:
1 - Amount of RAM and CPU power required to handle a given number of
prefixes. This scales directly with the number of prefixes multiplied by
the number of neighbour routers, since the router needs to conduct a
separate conversation with each neighbour about each prefix.
2 - Traffic requirements for the routing protocol. I figure this is a
relatively minor concern.
3 - Difficulties with the CPU and RAM coping with large floods of changed
routes, such as when one nearby or distant link or router goes down, or
comes up, affecting hundreds of thousands of prefixes.
4 - Point 3 manifests in the router as excessive delays in adjusting its
best paths to the changed conditions.
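Point 1 can be put in back-of-envelope form (the 100 bytes per stored path
is an invented illustrative figure; real per-path memory cost varies widely
between implementations):

```python
def rib_memory_bytes(prefixes, neighbours, bytes_per_path=100):
    # The RIB holds roughly one path per prefix per neighbour, so
    # memory grows with the product of the two.
    return prefixes * neighbours * bytes_per_path

# 260k DFZ prefixes and 30 BGP neighbours at ~100 bytes per path:
print(rib_memory_bytes(260_000, 30))   # 780000000, i.e. ~0.78 GB
```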
As the number of prefixes rises, and as the rate of updates rises, in order
to keep to certain standards of correct and rapid operation, each router
needs to be upgraded at great expense, either with more RAM or perhaps with
a complete replacement. Alternatively, any given router may need to be
restricted to operating with fewer neighbours than it would be if the
number of prefixes had not grown so much.
So these points add up to a major financial burden for any network
operating DFZ routers.
Also, the scaling problems for the whole BGP network include:
5 - Slower propagation of updates - slower response to outages and
therefore more packets going to black holes during a major change to the
topology of the network.
6 - Greater concern about the stability and robustness of the whole
network, considering that no-one really understands it. No-one even knows
for sure the structure of the network or how many DFZ and single-homed BGP
routers there are. How is the overall behaviour affected by some routers
passing on changes much slower than others? It is hard to estimate, but in
general it can't be good.
I understand that some networks have millions of internal routes. I assume
this is sustainable - so OSPF or IS-IS presumably scales somewhat better
than BGP. Part of this ability to handle larger numbers of prefixes is
probably due to the internal routing system being more controlled than the
DFZ. In the DFZ, there is no control or influence on the rate of updates
arriving from neighbouring networks. Other than crude filtering to the
point of ignoring some of them, and potentially upsetting connectivity to
some parts of the Net, a DFZ router needs to respond to them all.
In these very large internal networks, what are all these routes for? Is
it all internal stuff for the ISP/carrier? Does it carry a large number of
routes for PA customer networks? This is probably too big a question to
answer in the RRG. Can anyone point me to resources concerning this?
I am not convinced that the internal network's scaling problems are
identical or even close to those of the DFZ. Even if they were identical,
I would argue that they are not as much of a concern to us as the DFZ's:
Firstly, it is a conscious decision by administrators to make a network so
big that it has a million or more internal routes. No-one is forcing them
to do this, or saying it is a good idea.
Only a small proportion of ISP and end-user networks have such numbers of
internal routes, and it is probably a fairly low priority for the IETF to
reduce the costs of such large organisations.
Secondly, since these internal networks are fully managed by the
organisation, including presumably the ability to reduce the updates sent
by any one router, the scaling problems and stability difficulties can be
controlled in this way, rather than by changing protocols or spending more
money on routers.
Since any substantial ISP must deploy routers in the DFZ, the costs of
doing so, and the instability and long convergence time problems resulting
from the growing size of the DFZ routing table, are major barriers to any
ISP operating. Therefore, the DFZ scaling problem has a pervasive impact
on the cost and quality of all Internet communications. The same goes for
any end-user network which wants or needs portability and multihoming.
This is a very high priority for the IETF.
Even if we accept that the scaling problems of internal networks are very
different from those of BGP in the DFZ, and even if we decide it is not our
concern if internal routing systems have scaling problems, we might still
want to consider how our proposed routing scaling solution would help or
otherwise with the internal routing scaling problem. Since we need to
convince ISPs, both large and small, to invest in changes to routing and
addressing, this looks like an important question:
> What incentive do I have as an operator to deploy some fantastic
> new thing for the DFZ if I still have to have routers that cost
> millions to deal with my internal network routing table?
In the APT business model, as described in a recent message from Michael
Meisel:
http://psg.com/lists/rrg/2008/msg02589.html
the decision to adopt APT is taken by the ISP, for the ISP's own immediate
and lasting benefit: improved efficiency in some way. (I don't know how
this is achieved, and I don't understand how an ISP could do this without
checking with the end-user network whose space is being converted to an APT
EID and so will be withdrawn sooner or later from the DFZ.)
In the Ivip model:
Re: Comparing APT & Ivip - new business models
http://psg.com/lists/rrg/2008/msg02593.html
ISPs are not necessarily the primary driving force behind Ivip adoption.
They will be the most direct beneficiary of Ivip's effect of reducing or
eliminating the routing scaling problem - and their lower costs will be
passed on to all Internet users.
However, on a network-by-network basis, the impetus for adopting
Ivip-managed SPI address space, or for converting an existing PI prefix to
SPI space, will mainly come from the end-user networks whose space this is.
Existing PI end-user networks could either gain better flexibility (many
more micronets, including down to single IPv4 addresses, with potentially
fast and frequent mapping changes to implement real-time load sharing) by
converting their prefix to SPI space; or, by relinquishing their PI space
and BGP expertise and renting a probably smaller, and therefore probably
cheaper, amount of SPI space from a MAB operating company, they could
achieve improved flexibility and reduced costs.
End-user networks with PA space will be motivated to adopt Ivip by the
desire for portable, multihomeable space - making them independent of any
one particular ISP. Their current ISP is unlikely to push them to adopt
Ivip, except in the hope of keeping them as customers, rather than them
doing so on their own, or at the urging of a competing ISP.
The direct benefit of Ivip to ISPs comes slowly, as the DFZ routing table
either drops in size, or at least doesn't grow as fast as it otherwise
would have. ISPs may in general want every end-user network to adopt Ivip,
for this reason. However, any PA end-user customer of theirs which adopts
Ivip will be less tied to this ISP than before, because they can now
multihome their new SPI space with another ISP, or leave the current ISP
entirely.
So ISPs might in the short term be unmotivated to deploy Ivip themselves,
except as required to meet the needs of their customers who want to use it.
An ISP's interests might be served well by letting all other ISPs and
end-user networks adopt Ivip, while itself doing nothing and keeping its
current PA customers. However, competition ensures that such a complacent
approach would lead to loss of customers.
> Assuming that always-on IP-enabled applications continue taking
> off, I have ~55M handsets to address. Accepting in the short term
> (5 yrs or so) there will be some significant amount of IPv4-only
> devices, as those age out, the IPv6 table continues to grow in my
> network. The DFZ may not have to see much of that except in some
> mobility cases (depends on implementation), but you can't argue
> with the idea that even with well-built address hierarchies, some
> routers in the network are going to have to deal with orders of
> magnitude more routes than they do today. What better place to
> test out a new scalable routing infrastructure than in a
> controllable network before it has to be implemented by the DFZ
> across multiple networks?
The RRG's charter is purely to deal with the scaling problem in the DFZ.
As currently presented, the core-edge separation schemes - LISP, APT, Ivip,
TRRP and Six/One Router - are all intended to relieve pressure on the DFZ
core by enabling end-user networks to have a new kind of PI space, which I
call Scalable PI space, without each SPI prefix appearing in the DFZ.
In a recent message:
Re: Comparing APT & Ivip
http://psg.com/lists/rrg/2008/msg02589.html
Michael Meisel wrote of APT:
> Below, you describe your doubts about how a single ISP could
> deploy APT unilaterally, without the involvement of their
> customers. Allowing for this, and giving ISPs an incentive to do
> so, is perhaps *the* primary goal of our incremental deployment
> scheme. As I mentioned before, we should have a new document
> describing the updated details sometime in the next few months.
> But, to summarize, you can think of a single ISP deploying APT as
> similar (in concept) to a single ISP deploying MPLS, or some other
> internal efficiency improvement. The difference is, APT allows for
> a potential increase in benefits with every other ISP that deploys
> it.
As I wrote in that thread, I don't understand how APT could be deployed by
an ISP without coordinating with the end-user network. However, Michael
indicates that APT could improve efficiency within the ISP network. If so,
then perhaps it helps in some way with the scaling problem of the internal
network, perhaps in terms of the number of BGP routes its internal BGP
system carries.
Ivip, as currently described, does not aim to help with the scaling
problems of large internal networks. Here I will explore how Ivip might be
used to do this.
While Ivip began as a map-encap scheme - like LISP, APT and TRRP - the long
term goal for Ivip is to use Forwarding, instead of encapsulation:
ETR Address Forwarding (EAF) - for IPv4
http://tools.ietf.org/html/draft-whittle-ivip4-etr-addr-forw-01
Prefix Label Forwarding (PLF) - for IPv6
http://www.firstpr.com.au/ip/ivip/ivip6/
These both have two major advantages - no encapsulation overhead, and no
need for greater ITR and ETR complexity with extra protocols and probing
etc. to solve the Path MTU Discovery (PMTUD) problems inherent in
map-encap:
http://www.firstpr.com.au/ip/ivip/pmtud-frag/
These two Forwarding schemes operate in different ways. Both make use of
currently unused, or little-used, bits in the existing IPv4 and IPv6
headers. They both require upgraded FIB functions in DFZ routers, and to
some extent in internal routers. The PLF approach for IPv6 also requires a
small change to the RIB. Neither involves new routing protocols or any
change to the BGP (or internal routing protocol) implementation.
Perhaps it will be easiest to implement these changes in routers and then
deploy Ivip purely on a forwarding basis. Otherwise, we need to devise and
introduce the more complex map-encap approach - and upgrade progressively
to Forwarding in the longer-term future.
Here is how Ivip might be used to help with the scaling problems of
internal networks.
Firstly, consider IPv4 and IPv6 done purely with encapsulation: this
requires no changes to DFZ or internal routers, so it can be deployed by
adding ITRs, ETRs and a mapping system.
I will consider the potential for helping with the scaling problems of both
large ISPs / telco-mobile carriers and big end-user organisations, such as
large universities, governments and corporations. I will refer to these as
Big networks.
In the standard Ivip arrangement, the Big network has multiple full
database query servers (QSDs). These all receive a full, real-time flow of
mapping updates from the global fast-push mapping system:
http://tools.ietf.org/html/draft-whittle-ivip-db-fast-push
ITRs in the Big network send queries to these QSDs and get mapping replies
very quickly and reliably (like APT's ITRs and Default Mappers, and much
faster and more reliably than with LISP-ALT's or TRRP's global distribution
of potentially millions of query servers).
ETRs can be located on any conventionally BGP-managed address - not on SPI
addresses. This means that for a Big end-user network to have its own
internal ETRs, it must have some part of its network running with either
its own conventional PI address space, or perhaps with some PA space it
gets from one or more ISPs. For instance, a multihomed end-user network
could have ETRs in its network, on two separate pieces of address space,
one from each of its two upstream ISPs.
ITRs can be on any public address - conventional BGP-managed or SPI. They
can't ordinarily be behind NAT, since they need to be able to receive
mapping updates from a nearby QSD for some prefix they recently requested
mapping for.
Here is how such a Big network could use Ivip to reduce the number of
prefixes in its internal routing table. The Big network would establish
its own internal mapping system, to generate mapping for internal
micronets, and to map them to any of its internal ETRs. Actually, each
internal micronet could be mapped to any ETR in the world - it's just that
the system would only catch packets sent to these micronets from within
this Big network, or from within any other network which also used this
second set of internal mapping information to drive its QSDs and ITRs.
The QSDs would all be sent this internal mapping information in addition to
the global feed, as sent to all QSDs everywhere.
Caching ITRs (and any full database ITRs, which are effectively caching
ITRs coupled directly to a QSD) would then be able to encapsulate packets
sent within the internal routing system and tunnel them to whatever ETR was
specified in the mapping.
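A minimal sketch of that lookup-and-encapsulate step (the micronets, ETR
addresses and function names here are invented for illustration; a real
ITR would cache mapping replies and handle cache misses asynchronously):

```python
import ipaddress

# Internal mapping, as a QSD might hold it: micronet -> ETR address.
# Both entries below are invented examples using private space.
INTERNAL_MAPPING = {
    ipaddress.ip_network("10.1.2.0/28"): "10.9.9.1",
    ipaddress.ip_network("10.1.2.16/29"): "10.9.9.2",
}

def map_lookup(dest):
    """Return the ETR for dest's micronet, or None if not mapped."""
    addr = ipaddress.ip_address(dest)
    for micronet, etr in INTERNAL_MAPPING.items():
        if addr in micronet:
            return etr
    return None

def itr_handle(dest):
    etr = map_lookup(dest)
    if etr is None:
        return f"forward {dest} normally"
    # Map-encap: tunnel the original packet to the ETR.
    return f"encapsulate {dest} -> tunnel to ETR {etr}"
```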
With an internal fast-push mapping system - without the need for multiple
RUASes or the Launch server system - this mapping could probably be pushed
to all internal QSDs in a fraction of a second. I don't know how
responsive OSPF or IS-IS is in large networks, but perhaps this internal
Ivip mapping system would be more responsive than these internal routing
systems. ITR behaviour could probably be changed in less than a second.
Assuming that most routers remain as they are now, and that ITRs are either
hardware-based routers (Cisco and Juniper style) or specially programmed
COTS hosts, there needs to be a way to attract raw packets to these ITRs
when their destination address matches the prefix of an Internal MAB
(Mapped Address Block) of Big network Ivip-managed space. (I probably need
another term for this other than SPI space.)
So we need the concept of an IMAB in addition to the global MABs which the
global Ivip system manages. These ITRs would be an internal equivalent of
the Open ITRs in the DFZ (OITRDs) - maybe call them Open Internal ITRs
(OIITRs) or Open ITRs in the Internal network (OITRIs). These are all
tongue-twisters . . .
For internal purposes only, ITRs and ETRs could probably be on private
addresses too. Likewise, private address space could be managed by this
internal Ivip system.
In this way, the internal routing system can handle a much smaller number
of prefixes, since most of the prefixes the internal routing system
currently handles could be done with the internal mapping system and
internal ITRs and ETRs.
The destination ETR for an internal micronet doesn't have to be an internal
ETR. It could be any ETR in the world. Maybe this could help link the
network to other networks. This is getting complex, but these are optional
complexities, and this should be expected in any flexible, useful, TCP/IP
routing and addressing scheme.
The PMTUD problems inherent in map-encap may be significantly reduced in
this internal application of Ivip, because the network administrators may
be able to ensure that all devices between the ITRs and the ETR have MTUs
of 9000 bytes or so. That doesn't necessarily solve PMTU problems for an
application which thinks it can send packets of this length, when those
packets are encapsulated and become longer than 9000 bytes - and the
encapsulation makes a mess of a Packet Too Big message which is created by
a router in the ITR -> ETR path. However, there may be ways of handling
this in an internal network which are simpler than those required in
interdomain routing. For instance, every ITR could reject a packet of
length > (9000 minus encapsulation overhead), if it can be assured that the
MTU to the ETR, which is known to be in the Big network, is exactly 9000.
Upgrading everything to Gigabit Ethernet and 9000-byte MTU would be quite a
task, but over a few years, as part of the upgrade to an internal Ivip
system, the whole network could be significantly streamlined.
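The ITR-side length check suggested above might look like this (the
9000-byte internal MTU and the 20-byte IP-in-IP overhead are assumptions
made for the sketch, not figures from the Ivip drafts):

```python
INTERNAL_MTU = 9000   # assumed uniform MTU on every ITR -> ETR path
ENCAP_OVERHEAD = 20   # assumed outer IP-in-IP header size

def itr_accepts(packet_len):
    # Reject any packet that would exceed the internal MTU once the
    # encapsulation header is added, so no Packet Too Big message can
    # ever be generated inside the ITR -> ETR tunnel.
    return packet_len + ENCAP_OVERHEAD <= INTERNAL_MTU

print(itr_accepts(8980))   # True: fits exactly once encapsulated
print(itr_accepts(8981))   # False: one byte too long
```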
Now consider Ivip operating with ETR Address Forwarding (EAF) for IPv4.
The internal routing system would need to have most or all of its routers
upgraded to forward packets according to the 30 available bits in the IPv4
header when the header is in the new format, signified by the "Evil bit"
being set to 1.
Maybe some smaller, older routers near the periphery of the network are not
upgraded in this way. So within the network there is a "Forwarding
Upgraded Zone for IPv4" (FUZv4) where Forwarding-based ITRs and ETRs can
operate freely, without any PMTUD problems. This means they have 1500- or
9000-byte (or whatever) PMTUs and handle traffic packets of these lengths,
with no encapsulation overhead. PMTUD would operate normally with these
routers, including when they are between the ITR and the ETR - the sending
host's application adjusts the packet length to suit the total PMTU between
it and the destination host.
ITRs can have their packets forwarded to any ETR, and the ETRs must be
located on addresses such as x.x.x.0, x.x.x.4, x.x.x.8 etc., due to the use
of only 30 bits for a forwarding address, rather than the ideal of 32 bits.
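The alignment requirement follows directly from dropping the two low bits.
A sketch of the encoding (the helper names are mine; the idea of carrying
30 bits of ETR address in the header is per the EAF draft):

```python
import ipaddress

def etr_to_30bit(etr):
    """Encode an ETR's IPv4 address into a 30-bit header field.

    Only addresses divisible by 4 (x.x.x.0, .4, .8, ...) survive the
    round trip, because the two low bits are discarded.
    """
    n = int(ipaddress.IPv4Address(etr))
    if n % 4 != 0:
        raise ValueError("ETR address must be a multiple of 4")
    return n >> 2   # 30-bit value

def field_to_etr(field):
    # Restore the full 32-bit address by re-appending two zero bits.
    return str(ipaddress.IPv4Address(field << 2))
```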
This would probably work really well. There is no change to the RIB of the
internal routers. They still work with OSPF or IS-IS and their RIB
contains the same information. It's just that instead of forwarding the
packet based on its destination address, the new kind of packet (with the
Evil Bit set to 1) is forwarded according to the 30 bits in the header
which were previously used for the fragmentation offset and checksum.
There are some restrictions on sending fragmentable packets which are
longer than some limit - see:
http://tools.ietf.org/html/draft-whittle-ivip4-etr-addr-forw-01#section-5
Since the internal network could have different PMTU characteristics from
those assumed when setting the global MinCoreMTU value for the whole of the
Ivip system, perhaps internal micronets could be less restricted in terms
of fragmentable packets. However, overall, I think it is best to
discourage the use of fragmentable packets.
The IPv6 Forwarding approach operates on different principles from the IPv4
approach. With ETR Address Forwarding (EAF) for IPv4 there are 30 bits
available - enough to uniquely identify every ETR.
With Prefix Label Forwarding (PLF) for IPv6, there are only 20 bits
available. The current plan is to use half of these codepoints (524,288)
in the global IPv6 Ivip system to identify the BGP prefix by which the ETR
can be reached. On arrival at the border router which advertises that
prefix, a second operation is required if there is more than one ETR in
that prefix. The mapping needs to be looked up again, for the destination
address of the packet, and the result of the mapping lookup (the ETR's
exact address) may result in the packet being sent to the ETR by one of
several methods:
1 - Forwarded over a direct link to the ETR.
2 - Encapsulated to the ETR (but this raises the PMTUD problems again,
which Forwarding otherwise avoids).
3 - A similar approach to Forwarding across the DFZ, based on the 20 bits
in the header, but this time using the other 524,288 code points, which
internal routers in the ISP network map to prefixes handled by the internal
routing system.
This is more complex than the IPv4 approach, but it is the best we can do
with IPv6's horrendously long 128-bit addresses for ETRs, when we have only
20 bits to play with. While remaining compatible with the existing IPv6
header size, this looks like the only way of avoiding encapsulation and its
PMTUD problems.
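The second-stage choice at the border router might be sketched like this
(every table entry, address and code-point below is invented; the three
delivery methods are the ones listed above):

```python
# Stage 2 at the border router which advertises the matched prefix:
# look up the packet's destination again to find the exact ETR.
# All entries here are invented illustrations.
ETR_FOR_DEST = {
    "2001:db8::42": ("direct", "link-7"),          # method 1
    "2001:db8::43": ("encap", "2001:db8:e::1"),    # method 2
    "2001:db8::44": ("plf", 0x80001),              # method 3: internal label
}

def border_router_deliver(dest):
    method, target = ETR_FOR_DEST[dest]
    if method == "direct":
        return f"forward over {target} to the ETR"
    if method == "encap":
        return f"encapsulate to ETR {target} (PMTUD caveats return)"
    # Internal PLF: rewrite the 20-bit label to an internal code-point.
    return f"rewrite 20-bit label to {target:#x} for internal PLF"
```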
How could this system be used to help with the scaling of internal
networks?
As with the IPv4 approach, the network would have its own internal mapping
system, and its QSDs (and therefore its ITRs) would work from this mapping
information as well as from the global mapping feed.
The above system provides 524,288 micronets which could replace existing
internal prefixes, and be mapped to any internal ETR.
Perhaps these internal micronets could be remapped at lower cost and higher
speed than is possible when they are prefixes in the internal routing
system. If so, this would be a benefit in addition to removing this number
of prefixes from the internal routing system - though they do need to be
covered by some much smaller number of Internal Mapped Address Blocks.
If internal routing systems are currently coping with more than a million
internal routes, and this is somehow not regarded as causing a serious
scaling problem, then it looks like this IPv6 approach to forwarding in
Ivip is not going to make much of a difference, since it could at best
reduce the load by half a million prefixes. But if these prefixes were
frequently changing, and this was a serious burden for the current internal
routing system, perhaps it would be a worthwhile approach.
As with the IPv4 approach, all paths between ITRs and ETRs need to have
routers upgraded to handle the new header format. With a Big network, this
gives rise to a Forwarding Upgraded Zone (FUZv6) - the part or parts of the
network where all routers handle the new Forwarding format of the IPv6
header. These upgrades are more complex than the upgrade for IPv4
forwarding. Still, I guess they might be done with a software update for
existing routers. The IPv6 approach to Forwarding requires some further
management compared to the more direct IPv4 approach.
For the 524,288 interdomain code-points in Prefix Label Forwarding (PLF)
for IPv6, I propose that there be a direct, globally assumed, mapping
between each of these code-points and a particular "Core Egress Prefix",
all of which are in a contiguous block. For instance, they would match:
  CEP-0       E000:0000::/32
  CEP-1       E000:0001::/32
  CEP-2       E000:0002::/32
  . . .
  CEP-524287  E007:FFFF::/32
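The table above is just codepoint arithmetic on the first 32 bits of the
prefix (the E000:0000::/32 base is the example value proposed above; the
function name is mine):

```python
import ipaddress

BASE = 0xE0000000  # first 32 bits of CEP-0 (E000:0000::/32)

def cep_prefix(codepoint):
    """Map a PLF interdomain code-point to its Core Egress Prefix."""
    if not 0 <= codepoint <= 0x7FFFF:
        raise ValueError("interdomain code-points use 19 bits: 0..524287")
    # Add the code-point to the first 32 bits, then shift into place
    # as the high bits of a 128-bit IPv6 address.
    return ipaddress.IPv6Network(((BASE + codepoint) << 96, 32))
```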
For the internal 524,288 code-points, it is up to the network
administrators which code-point matches which prefix, so all the routers
would need to be configured in the same way. There could be some range of
codepoints assigned to one set of contiguous prefixes in one part of the
IPv6 address space, and other ranges of code-points assigned to other
parts.
So both the encapsulation and the two Forwarding approaches to Ivip could
be used to help internal networks scale.
These techniques do not absolutely depend on the development of a global
Ivip system. One or more big ISPs or end-user network operators could, in
principle, talk to their router vendor(s) and ask them to develop ITR, ETR
and mapping systems to work just within their networks. The Forwarding
approaches would require software updates to existing routers, but the ITR
and ETR functions would be simpler than for map-encap due to the lack of
PMTUD problems. Router vendors would tend to implement new features in
their existing devices, but it would also be possible to implement ITR, ETR
and mapping distribution functions in software on ordinary COTS hosts.
I suspect that the scaling problems of internal routing systems are not yet
so pressing as to prompt a development like this, but at least there is
some prospect of synergies between using these techniques internally as
well as for their original purpose - to help solve the interdomain routing
scaling problem.
- Robin
-- to unsubscribe send a message to rrg-request@psg.com with the word
'unsubscribe' in a single line as the message text body.
archive: <http://psg.com/lists/rrg/> & ftp://psg.com/pub/lists/rrg