Re: [RRG] Is 12 bytes really so scary?



Heiner, I am not responding to your suggestions because I am only
interested in proposals which are incrementally deployable.  You
discuss modifying TCP/IP, introducing a completely new routing
architecture etc.

I will be away from email for a few days - back on Wednesday.


Maybe the wariness about the practicality of making a real-time
controlled ITR-ETR scheme has been due to assuming that the problem is
inherently "hard" - or at least that the solution is likely to be
complex and involve hundreds of thousands of routers talking to each
other, organising themselves autonomously, like the BGP system we
are trying to save.

One way of conceptualising the BGP problem is "insufficient
aggregation".  Some ITR-ETR schemes which involve new structures
which optimise "aggregation" include LISP x.x, CONS, ALT and EMACS.

These all involve the "lots of routers all talking to each other in
a complex system, with messages or at least packets being passed
back and forth, peer to peer, with each one either deciding the next
hop for the packet, or doing something intelligent with the message
before deciding how to pass it on" - not unlike the BGP system.


LISP NERD and Ivip are ITR-ETR schemes where aggregation is not a
concern - they just get the mapping data out to the ITRs.  NERD and
Ivip are conceptually simple, deterministic systems with a
centralised control system, by which actual control of mapping is in
the hands of end-users.

If you took NERD, replaced the HTTP retrieval system with a fast
tree-structured multicast-like fan-out system, added Query Servers
(as in APT's Default Mappers) and then added caching Query Servers
and ITR functions in hosts, you would wind up with a fast push
system resembling Ivip.  Then there wouldn't be a need for complex
mapping data with multiple options for ETRs or for ITRs to test
connectivity to ETRs - the end-users can change their mapping
manually or with their chosen monitoring system.


No-one really knows how big the Net will grow.  So if you fear a
fast push system can't be done, it is easy to justify this by
picking a very large number out of the air, citing this as the
number of micronets multiplied by the rate of change that the system
must ultimately scale to - and multiplying this by the presumed size
of an update message to arrive at a scary number.
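
To show how such a number gets built, here is a minimal sketch of that
multiplication.  All three inputs are assumptions for illustration: the
micronet count is a thousand times today's 240k BGP prefixes, the change
rate of once per micronet per day is simply picked out of the air, and
the 12 byte update size is borrowed from the subject line of this
thread.  The function name is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def update_stream_bytes_per_second(micronets, changes_per_micronet_per_day,
                                   bytes_per_update):
    """Average mapping-update data rate: count x change rate x message size."""
    updates_per_day = micronets * changes_per_micronet_per_day
    return updates_per_day * bytes_per_update / SECONDS_PER_DAY


# Assumed inputs, not measurements.
micronets = 240_000 * 1_000          # 1000 x the current BGP prefix count
changes_per_day = 1.0                # hypothetical: one change per micronet per day
bytes_per_update = 12                # assumed from the thread's subject line

rate = update_stream_bytes_per_second(micronets, changes_per_day,
                                      bytes_per_update)
print(f"{rate:,.0f} bytes/s average ({rate * 8 / 1e6:.2f} Mbit/s)")
```

Changing any one of the three inputs by a factor of ten moves the
result by the same factor, which is why the choice of inputs matters
more than the arithmetic.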

I think we want to make something which happily handles micronets at
least a thousand times more numerous than the current 240k BGP
prefixes.  But that factor of a thousand growth will take time to
occur.  Even if the growth rate doubled (currently BGP prefixes
double about every 4 years), it would take 20 years for a factor of
1000 growth to occur.
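
The arithmetic behind that estimate can be checked with a short sketch.
It assumes steady exponential growth and reads "doubling the growth
rate" as halving the doubling period from 4 years to 2; the function
name is hypothetical.

```python
import math


def years_to_grow(factor, doubling_period_years):
    """Years for a count to grow by `factor` if it doubles every
    `doubling_period_years` (steady exponential growth assumed)."""
    return math.log2(factor) * doubling_period_years


at_current_rate = years_to_grow(1000, 4.0)   # doubling every 4 years
at_doubled_rate = years_to_grow(1000, 2.0)   # doubling every 2 years

print(f"current rate: {at_current_rate:.1f} years")
print(f"doubled rate: {at_doubled_rate:.1f} years")
```

A factor of 1000 is just under ten doublings, so the result is a
little under 40 years at the current rate and a little under 20 years
at twice that rate.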

I don't think we should adopt something unambitious like LISP (slow,
with complex ITR functions and no support for mobility) just because
of the assumption that a fast push system won't be workable 20
years in the future.

I think a good solution would be one we are confident will work for
15 years or so - till 2027 - as long as we feel confident it could
be extended then, or that its costs then would be less of a concern
with the technology of the day.


I could imagine perhaps a few tens of millions of businesses
(including folks like us at our home offices) wanting to multihome -
but that is an outside figure and maybe over 20 to 30 years the true
figure would be one or two million.

I think the only foreseeable cause of massive growth in micronets
would be if it became common for every cell-phone (as these things
are currently called - audio-video player, PC etc. in the future) to
have its own micronet.  That definitely requires a
pay-for-mapping-change arrangement, and the mapping would only change
when switching
between networks - such as to a WiFi connection - not between 3G
base-stations.

I don't think we should reject a technical solution because we can't
imagine how it would work at some distant time in the future when we
assume it will need to handle billions of micronets.  The growth may
never be that great, and by then, we will be able to adapt the
system and handle it with cheaper silicon and bandwidth.


We are not trying to make a new routing system which delivers
packets to ISPs and adapts to changes and outages in how ISPs
connect to each other.  The BGP system already does that fine.

We are creating a special class of address space and piping it
around the place with a global system of ITRs around the edge of the
BGP system - and also in the middle of it to catch packets sent from
networks without ITRs.


The BGP system is very "labour intensive" (RAM, CPU and memory):

  123,000 DFZ routers (probably more - RRG messages 255, 257 & 262).

  240,000 BGP advertised prefixes, doubling every 4 years (faster
              as fresh IPv4 space runs out).

  3 peers average? (My wild guess.)

= 88 billion, divide by 2

= 44 billion two-way conversations between DFZ routers, each
comparing notes with its peer about which has the shortest path for
each prefix.
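
For anyone who wants to vary the inputs, the same multiplication as a
short sketch.  The router and prefix counts are the figures above; the
average of 3 peers is the wild guess above, not a measurement.

```python
dfz_routers = 123_000        # probably more, per the RRG messages cited
bgp_prefixes = 240_000       # currently advertised prefixes
avg_peers = 3                # wild guess

# Each router discusses each prefix with each of its peers.
one_way = dfz_routers * bgp_prefixes * avg_peers

# Each conversation has a router at both ends, so it was counted twice.
two_way = one_way // 2

print(f"{one_way:,} one-way, {two_way:,} two-way conversations")
```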

No wonder no-one can simulate the BGP control plane - even if they
could simulate the traffic, outages etc.  I guess there is no other
IT system with this number of interdependent real-time
communications going on at once.  One crude way of thinking about
the global BGP control plane is 123k+ nodes simultaneously playing
240k games of Life (with the single-homed routers generating changes
as well) - with time delays due to long distance fibre delays and
CPU limits at times of high activity.

A fast (few seconds) ITR-ETR scheme will be a big global project, but
not messy like BGP, with high-powered devices fighting their way
through hundreds of thousands of decisions and communications which
necessarily provide a very blinkered view of the situation.

Ivip and any other fast push ITR-ETR scheme will not involve a
pulsating, potentially chaotic, interconnected network like BGP's
control plane.  It can start nice and small, and still provide
immediate benefits to users in the form of address space which they
can multihome and take to any ETR-equipped ISP (or run their own ETR
in their PA space).  It can also provide highly valuable, efficient,
mobility.

A fast push ITR-ETR system such as Ivip will be highly deterministic
and won't require complex modelling in order to predict its behavior.

The mapping data needs to come from various sources and be combined
into multiple identical streams of packets, with each packet signed.
I have some ideas about doing this and hope to complete them in a
few months.  I can see a relatively straightforward way (no PKI
etc.) that each "replicator" can check the validity of the packet it
gets from its upstream replicator.  Each replicator should get feeds
of packets from two upstream replicators, so it will only rarely not
get a copy of any particular packet in the mapping stream.  It fans
each packet in the stream out to some larger number of replicators
at the next level - say 10 or 20 of them, maybe more.
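
A rough sketch of the fan-out idea, under stated assumptions.  The
packet validity check is left out entirely, since that design is not
finished.  The per-packet sequence number used to discard the second
copy is an assumption, as are the class and function names, and the
123,000 receivers figure is only a stand-in for "a lot of ITRs and
query servers".

```python
def levels_needed(receivers, fanout):
    """Smallest number of replicator levels so that fanout**levels
    reaches at least `receivers` end points."""
    levels, reach = 0, 1
    while reach < receivers:
        reach *= fanout
        levels += 1
    return levels


class Replicator:
    """Forwards each mapping packet once, however many upstream
    feeds deliver a copy of it."""

    def __init__(self, downstream=()):
        self.downstream = list(downstream)
        self.seen = set()        # sequence numbers already handled
        self.delivered = []      # packets accepted, in arrival order

    def receive(self, seq, payload):
        if seq in self.seen:     # duplicate from the other upstream feed
            return False
        self.seen.add(seq)       # a real system would age these out
        self.delivered.append((seq, payload))
        for child in self.downstream:
            child.receive(seq, payload)
        return True


# One downstream replicator fed by two upstream replicators.
leaf = Replicator()
upstream_a = Replicator([leaf])
upstream_b = Replicator([leaf])

upstream_a.receive(1, b"map-1")
upstream_b.receive(1, b"map-1")   # second copy, dropped at the leaf
upstream_a.receive(2, b"map-2")   # upstream_b never got packet 2

print(leaf.delivered)
print(levels_needed(123_000, 10), levels_needed(123_000, 20))
```

The leaf still receives packet 2 even though one of its two feeds
missed it, which is the point of having two upstream replicators.
With a fan-out of 10 or 20, only a handful of levels are needed to
reach a six-figure number of receivers.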

This looks challenging, but practical.  The average data rates will
not be high by current standards, until perhaps some time in the
future - say 15 years - when they will be higher, but still not
excessive by the standards of the day.

There will be peaks of activity, for instance due to some outages,
but it is not like the whole pattern of "best path" shifting in the
DFZ due to outages, with changes rippling across the global BGP
control plane.  There would just be a bunch of micronets being
switched to another ETR.  This would only occur due to hard
outages in one or more ISP networks.  Currently, BGP rearranges its
best paths when there is a change in transit connectivity.


There is no reason to believe that a fast push ITR-ETR system would
generate its own noise, amplify perturbations, create "hunting"
patterns etc. as BGP's control plane sometimes does.  (IDR list
early June.)

There is some demanding FIB stuff to do in ITRs, and some fancy work
to enable QSD query servers to securely and robustly send updates to
their queriers about micronets the querier recently asked about and
for which new mapping has just been received.
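
The bookkeeping side of that query server behaviour might look
something like the sketch below.  It ignores the security and
robustness questions, which are the hard part.  The class name, the
method names and the 600 second notification window are all
hypothetical; time is passed in explicitly to keep the example
deterministic.

```python
class QueryServerSketch:
    """Remembers who recently asked about each micronet, so that a
    fresh mapping update can be passed on to those queriers."""

    def __init__(self, notify_window_s=600.0):
        self.mapping = {}        # micronet -> current ETR address
        self.recent = {}         # micronet -> {querier: time of last query}
        self.window = notify_window_s

    def query(self, querier, micronet, now):
        """Answer a query and note who asked, and when."""
        self.recent.setdefault(micronet, {})[querier] = now
        return self.mapping.get(micronet)

    def apply_update(self, micronet, etr, now):
        """Store new mapping; return the queriers that should be told."""
        self.mapping[micronet] = etr
        askers = self.recent.get(micronet, {})
        fresh = {q: t for q, t in askers.items() if now - t <= self.window}
        if fresh:
            self.recent[micronet] = fresh
        else:
            self.recent.pop(micronet, None)   # nobody asked recently
        return sorted(fresh)


qs = QueryServerSketch()
qs.mapping["micronet-1"] = "etr-a"

first_answer = qs.query("itr-1", "micronet-1", now=0.0)
qs.query("itr-2", "micronet-1", now=500.0)

# New mapping arrives at t=700: itr-1 asked too long ago, itr-2 did not.
to_notify = qs.apply_update("micronet-1", "etr-b", now=700.0)
print(first_answer, to_notify)
```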

A robust, secure, mapping data replication system is challenging,
but will probably require just some imagination and good
engineering, rather than heroics.  Replicators will most likely be
Linux/BSD boxes.  Likewise it will be possible to use cheap PC
hardware to build good query servers and many ITRDs and ITRCs -
sitting on a single Ethernet cable next to existing routers.  In the
longer term, I think Big Iron routers will do a lot of the ITR work,
but servers can do quite a lot of it in smaller ISPs who don't want
to upgrade their routers immediately.

ETRs in a fast push scheme can be pretty simple (assuming they don't
need to filter decapsulated packets according to source address),
except for whatever PMTUD work they need to do with the ITR.  Apart
from that, they don't need to communicate with the ITR.  It might be
best to have a standardised method of testing an ETR is reachable,
so any multihoming monitoring system can use that.  Reachability of
end-user networks can be tested through the ETRs, but without requiring
any new ETR functionality or without standardising how ETRs send
packets to end-user devices.


The PMTUD stuff will be tricky - but this has nothing to do with the
fast mapping replication.  Every ITR-ETR scheme needs to cope with
this in order that ITRs and ETRs can be placed in all the locations
we need them to ensure short paths and good load spreading.


Maybe people have been spooked by the tangled web of BGP.  While it
is beyond our power to seamlessly untangle that, our ITR-ETR system
throws a ring around the DFZ and handles vastly more end-user
networks in a clean fashion, without further burdening the DFZ
control plane.

Just because this is a router-based fix for a messy router problem
doesn't mean that the solution needs to be as tricky and have such a
resemblance to cellular automata as the BGP system we are trying to
keep ticking.


  - Robin

--
to unsubscribe send a message to rrg-request@psg.com with the
word 'unsubscribe' in a single line as the message text body.
archive: <http://psg.com/lists/rrg/> & ftp://psg.com/pub/lists/rrg