[RRG] Ivip business models: fast push & OITRDs

To: Routing Research Group <rrg@psg.com>
Subject: [RRG] Ivip business models: fast push & OITRDs
From: Robin Whittle <rw@firstpr.com.au>
Date: Fri, 18 Apr 2008 15:42:05 +1000
Cc: Michael R Meisel <meisel@cs.ucla.edu>
Organization: First Principles
User-agent: Thunderbird 2.0.0.12 (Windows/20080213)

In the thread "Not moving the problem to the global mapping system",
Michael Meisel wrote, in part:

> Ivip is unique among the proposals, as you mention, in the
> following way: all other proposals intend to guarantee
> either fewer pushed mapping updates compared to BGP, or
> pull-only updates /by design/.

I discuss this in a separate message in the above thread.

> It /might/ be true that the update cost in Ivip is manageable
> due to human or economic factors, but that is only speculation at
> this point, and not enforced by the design.

I think it is impossible to create any design which provably,
forever, sets limits on something getting out of hand in terms of
costs imposed on any particular participant - other than by making
the thing impossible.  Pure pull (LISP-ALT and TRRP) does that, but
with considerable costs:

  1 - Delayed or dropped initial packets when the ITR has to
      wait for mapping.

  2 - Impossibility of getting fresh mapping data out to ITRs
      unless there is an extremely costly and short caching time
      for the previous mapping replies.

Consequently, pure pull systems have additional complexity to try to
get the packet to the ETR quickly when the ITR has no mapping (over
the ALT network, or with TRRP Waypoint Routers).  Pure pull systems
require complex (compared to Ivip) mapping information carrying all
alternative ETRs and priorities for multihoming service restoration,
and likewise require extra functionality in ITRs and ETRs to support
this.  Also, TE needs to be done via more complex mapping
information and ITR functionality than with Ivip's (admittedly
limited, but real-time changeable) TE capability.
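
To make the caching trade-off in point 2 concrete, here is a minimal
sketch of a pull-only ITR cache.  It is not taken from LISP-ALT or
TRRP: the class name, the lookup callable and the 600 second TTL are
all hypothetical.  It only shows that a changed mapping stays
invisible to the ITR until the cached reply expires, and that each
expiry costs a fresh query.

```python
class CachingITR:
    """Toy pull-only ITR cache: a mapping reply is reused until its TTL expires."""

    def __init__(self, lookup, ttl_seconds):
        self.lookup = lookup      # hypothetical callable: EID prefix -> ETR
        self.ttl = ttl_seconds
        self.cache = {}           # EID prefix -> (ETR, expiry time)
        self.queries = 0          # number of queries sent to the mapping system

    def etr_for(self, eid_prefix, now):
        entry = self.cache.get(eid_prefix)
        if entry is not None and now < entry[1]:
            return entry[0]       # cached reply, possibly stale
        # Cache miss: the initial packet is delayed or dropped while we query.
        self.queries += 1
        etr = self.lookup(eid_prefix)
        self.cache[eid_prefix] = (etr, now + self.ttl)
        return etr


# Hypothetical authoritative mapping, changed by the end-user after t=0.
authoritative = {"192.0.2.0/24": "ETR-A"}
itr = CachingITR(authoritative.get, ttl_seconds=600)

first = itr.etr_for("192.0.2.0/24", now=0)      # miss: query sent
authoritative["192.0.2.0/24"] = "ETR-B"         # mapping changes
stale = itr.etr_for("192.0.2.0/24", now=300)    # within TTL: old ETR still used
fresh = itr.etr_for("192.0.2.0/24", now=600)    # TTL expired: re-query
```

Shortening the TTL narrows the stale window but raises the query
rate in proportion, which is the cost referred to above.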

The two "pure pull" systems (LISP-ALT and TRRP) also contemplate a
form of directed push  - ways of getting recently changed mapping
information to the ITRs which are tunneling traffic for the EID
prefix in question.  (See my previous message for details of this
suggestion for LISP-ALT.)  So I don't think that practical
realisations of these proposals would really be "pure pull".

Apart from the LISP-ALT prototype, all these map-encap systems are
vapourware at present.  Arguments and assurances about future
business models are going to be pretty hard to distinguish from
"speculation".


> So all I am saying, and I believe most people on this list would
> agree, is that we need some objective, scientific metric by which
> to judge whether a design is likely to solve the problems in
> question without causing new ones.

That would be good, but it is early days yet - so all I can provide
is plans and arguments.  That stuff is amenable to discussion in a
robust, scientific, manner - but it can't be tested in a scientific
experiment.


> Something where we can do a quantitative evaluation.

I think this is overly ambitious while the map-encap proposals are
at such an early stage, and while no-one can predict
within a few orders of magnitude what the ultimate adoption rate
would be, in terms of numbers of end-user networks, or in terms of
the number of individually mapped EID prefixes ("micronet" is the
Ivip equivalent).


> If, as you suggest, Ivip does not cause problems due to large
> update volume, such an evaluation could prove you right!

I can't scientifically prove in 2008 that an Ivip system, as
currently defined, would do or not do anything in the long-term future.

However, if you are prepared to relax a little about how provably
correct these arguments should be, here is my attempt at explaining
the Ivip business model regarding the push of mapping information.

Please cast your eyes over - or better still read - pages 33 to 48 of:

  http://tools.ietf.org/html/draft-whittle-ivip-db-fast-push-00

I don't want to bloat RRG messages by repeating that material here.


The diagram at:

http://tools.ietf.org/html/draft-whittle-ivip-db-fast-push-00#page-40

shows one of multiple RUASes, with its connections directly or
indirectly to end-users who provide mapping updates.  These
end-users lease their mapped address space (a User Address Block -
UAB - which they split into as many micronets as they like) either
directly from the RUAS, or from some organisation which pays the
RUAS to push mapping updates into the global fast push system.

I guess there would be a few dozen RUASes at most.

Ivip's UABs and therefore micronets are contained within one of many
(potentially hundreds of thousands, but ideally fifty or a hundred
thousand or so in a well developed system) Mapped Address Blocks (MABs).

Each MAB is advertised as a single prefix to BGP by the OITRDs (Open
ITRs in the DFZ - formerly "anycast ITRs in the core/DFZ", see RRG
message 937).  There are several ways this could be done, including:

  1 - A single global set of OITRDs, all advertising the full
      set of MABs.  (Or, for load sharing, a single global
      network, but in busy locations have multiple OITRDs with
      each only advertising a subset of MABs.)

  2 - Multiple independent networks of OITRDs which rent their
      services to RUASes or whoever it is which "owns" a MAB and
      so needs to support optimal paths for traffic from networks
      with no ITRs.

  3 - Each RUAS having its own network of OITRDs, which advertise
      the MABs this RUAS is responsible for.  In this case, the
      UASes or other organisations who "own" their own MABs would
      pay their RUAS to run the OITRDs.

  4 - A mixed arrangement where some RUASes run their own OITRDs
      and some lease the capacity of other organisations' OITRDs -
      including perhaps those of other RUASes.  Also, whoever "owns"
      a MAB could run their own OITRDs or pay someone else to do it
      - so they don't necessarily require their RUAS to do this.

In all cases, the actual traffic load on these OITRDs will vary
widely with the various end-users.  Some may have a large amount of
address space but either little traffic in total, or little traffic
which flows through OITRDs.  Others may have a tiny amount of
address space, including just the smallest possible micronet (a
single IPv4 address or a /64 for IPv6) and may have massive traffic,
including massive traffic for one or many of the OITRDs which
advertise the MAB their micronet is in.

In all cases, it is technically possible and typically will be
economically desirable to sample the traffic flowing on OITRDs and
then charge the end-user for this.  This may be straightforward if
the OITRD is operated by an RUAS, and the RUAS has the end-user as
their direct customer.  However, it could involve some elaborate
business arrangements.  A complex example would be some specialist
OITRD company renting capacity of its OITRDs to an RUAS, who is
providing OITRD services for a UAS, who is providing this for some
separate organisation who "owns" the MAB and who leases address
space to actual end-users.  (The cell-phone industry has all sorts
of complex business models such as these.)

The point is that there exists within Ivip a perfectly good
technical path and series of potential business models by which all
OITRD burdens are paid for by the end-users who benefit - those who
lease a part of a MAB to be their UAB, and therefore their one or
more micronets of Ivip-mapped address space.

I could go into more detail here, but I think it can be seen that
RUASes (or whoever "owns" a MAB and leases its space to end-users)
have a strong incentive to provide a good global set of OITRDs so
that traffic for these micronets follows generally optimal paths, no
matter where the location of the sending host and of the ETR that
micronet is mapped to.

Furthermore, the RUAS (or MAB "owner") can gain revenue for these
OITRD activities from end-users in proportion to their costs,
including according to:

  1 - The amount of address space each end-user has in their UAB.

  2 - The number of micronets each end-user has.

  3 - The traffic which flows to these micronets via the OITRDs.
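
As a rough illustration of charging on these three bases, here is a
small Python sketch.  The rates, field names and the two example
end-users are entirely hypothetical - no Ivip draft specifies any
pricing - and a real arrangement could weight the three components
quite differently.

```python
from dataclasses import dataclass


@dataclass
class EndUserUsage:
    uab_addresses: int       # 1 - address space in the end-user's UAB
    micronets: int           # 2 - number of micronets
    oitrd_gigabytes: float   # 3 - sampled traffic to these micronets via OITRDs


# Hypothetical monthly rates, invented purely for illustration.
RATE_PER_ADDRESS = 0.10
RATE_PER_MICRONET = 0.50
RATE_PER_GIGABYTE = 0.02


def monthly_oitrd_charge(usage):
    """Sum the three charging components for one end-user."""
    return (usage.uab_addresses * RATE_PER_ADDRESS
            + usage.micronets * RATE_PER_MICRONET
            + usage.oitrd_gigabytes * RATE_PER_GIGABYTE)


# Lots of address space, little OITRD traffic.
big_space_quiet = EndUserUsage(uab_addresses=4096, micronets=16,
                               oitrd_gigabytes=10)
# A single-address micronet with massive OITRD traffic.
one_address_busy = EndUserUsage(uab_addresses=1, micronets=1,
                                oitrd_gigabytes=50000)

for name, usage in [("big_space_quiet", big_space_quiet),
                    ("one_address_busy", one_address_busy)]:
    print(name, round(monthly_oitrd_charge(usage), 2))
```

With these made-up rates the single-address end-user pays more than
the one with 4096 addresses, because the traffic term dominates.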

I think that is a pretty good business case for the OITRD system.
In a number of messages in the last month or so, I think that Dino
has stated that the LISP team have no business model for their Proxy
Tunnel Routers (PTRs, which are the equivalent of OITRDs).

It is not clear to me what the business model is for the same task
in APT.

There is a related question:  If the OITRD network is so good, then
why would an ISP want to install an ITR?  I won't follow this up
fully, but:

  1 - The local ITR is a way the ISP can ensure packets from its
      direct customers are more reliably tunneled than by relying
      on anything outside their network.  So this is a marketable
      advantage, paid for by their direct packet-sending customers.

  2 - When the ISP has ETRs in its network, with Ivip, all packets
      to ETRs go via ITRs, so it makes sense to have ITRs in the
      ISP's network, rather than send the raw packets out to the
      nearest OITRD.  (Ivip will not expect or probably allow the
      local routing system to deliver packets to wherever the ITRs
      and ETRs would deliver them - because it is probably
      unrealistic to expect the local routing system to respond
      as quickly to mapping changes and to be as reliable as the
      ITRs and the ETRs themselves.)



Now to the fast push network.

All the RUASes collaborate to build and run a bunch of "Launch"
servers, which operate as a distributed, redundant, highly reliable,
system for accepting packets full of mapping updates from the RUASes
and sending the same set of such updates to the first level of the
Replicator system.  (Ivip doesn't have to be done this way - this is
just my current suggestion.)  Here is the diagram on pages 45-46:


   \   |   /   }  Update information from end-users
    \  V  /    }  directly or via leaf UAS systems.
     \ | /
      \|/
    RUAS-X ->--------------[snapshot & missing packet HTTP server 1]
      /|\              \
     / | \              \--[snapshot & missing packet HTTP server 2]
    /  |  \              \
   /   V   \              \-- etc.
       |    \
       |
       |  30 individual streams of identical real-time
       |  updates to the 8 Launch servers - for RUAS-X's MABs.
       |
       |
   \   \    |    /   /     Each of the 8 Launch servers gets a
    \   \   V   /   /      stream from every such RUAS.
     \   \  |  /   /
     [Launch server N]     The 8 Launch servers have links with each
        / / | \ \          other, and each second, all, or most of
       / /  V  \ \         them, send streams of update packets to a
      /  |  |  |  \        number of level 1 Replicators.  For
            |              instance 32 in this example, with each
            |              launch server sending packets to 16
            |              Replicators.
            |
            \
             \         /   Even with packet losses and link
              \       /    failures, most of the 32 level 1
     level 1   \     /     Replicators receive a complete set of
                \   /      update packets, which they replicate to
            [Replicator]   16 level 2 Replicators.
              / / | \ \
             / /  V  \ \
            /  |  |  |  \  In this example, each Replicator consumes
                     |     two feeds from the upstream level, and
                     /     generates 16 feeds to Replicators in
                    /      the level below (numbered one above the
         \         /       current level).  So each level involves
          \       /        8 times the number of Replicators.
   level 2 \     /
        [Replicator]       These figures might be typical of later
          / / | \ \        years with a billion micronets, however
         / /  V  \ \       in the first five or ten years, with
        /  |  |  |  \      fewer updates, the amplification ratio
       /   |  |  |   \     of each level could be much higher.
      /    |  |  |    \
     /     |  |  |     \   Replicators are cheap diskless Linux/BSD
           |     |         servers with one or two gigabit Ethernet
           |     |         links.  They would ideally be located on
                           stub connections to transit routers,
        levels 3 to 6      though the Level 5 and 6 Replicators
                           (32,000 and 128,000 respectively) might
       \   |    \     /    be at the border of, or inside, provider
        \  |     \   /     or larger end-user networks.
         \ |      \ /
         ITRD     QSD      ITRDs and QSDs get two or more ideally
                           identical full feeds of updates - so
                           occasional packets missing from one
                           are no problem, since the other stream
                           provides a packet with an identical
                           payload.

   Figure 2: Multiple levels of Replicators drive hundreds of
             thousands of ITRDs and QSDs.
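
The per-level arithmetic in the figure's example (32 level 1
Replicators, each consuming 2 feeds and generating 16) can be
sketched as below.  This assumes the same ratio at every level,
which is a simplification of mine.  Note that the Level 5 and 6
counts quoted in the figure are lower than a uniform 8x would give,
so the deeper levels presumably use different ratios; the sketch
only covers the uniform case.

```python
LEVEL1_REPLICATORS = 32   # from the figure's example
FEEDS_IN = 2              # feeds each Replicator consumes
FEEDS_OUT = 16            # feeds each Replicator generates


def replicators_at_level(level):
    """Replicator count at a level, assuming uniform 16-out / 2-in fan-out."""
    growth = FEEDS_OUT // FEEDS_IN    # 8 times as many per level
    return LEVEL1_REPLICATORS * growth ** (level - 1)


for level in range(1, 4):
    print(f"level {level}: {replicators_at_level(level)} Replicators")
```

In the early years, with a higher amplification ratio per level,
FEEDS_OUT would be larger and fewer levels would be needed.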


The Replicators receive two such update streams and as long as a
given packet appears in one stream, the Replicator takes its
contents and fans it out to multiple other Replicators on the next
level below.  This is somewhat like multicast, but it is more robust
due to the dual streams being sent to each Replicator.  There is
virtually no delay in this - as soon as the first packet of the pair
arrives, it is sent out to multiple other Replicators.
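
A minimal sketch of this "first copy wins" behaviour follows.  It
assumes each update packet carries a sequence number by which the
two copies can be matched - that detail, and all the names here, are
my assumptions for illustration.  Authentication, real sockets and
pruning of old sequence numbers are omitted.

```python
class Replicator:
    """Toy Replicator: forward the first copy of each packet, drop the second."""

    def __init__(self, downstream):
        # downstream: callables taking (seq, payload), one per lower-level feed
        self.downstream = list(downstream)
        self.seen = set()   # sequence numbers already forwarded (unbounded here)

    def receive(self, seq, payload):
        """Handle a packet from either upstream feed; True if it was forwarded."""
        if seq in self.seen:
            return False                # duplicate from the other feed
        self.seen.add(seq)
        for send in self.downstream:    # fan out immediately, no waiting
            send(seq, payload)
        return True


delivered = []
rep = Replicator([lambda s, p, i=i: delivered.append((i, s, p))
                  for i in range(3)])

rep.receive(1, b"updates-1")   # arrives first on feed A: forwarded
rep.receive(1, b"updates-1")   # same packet on feed B: ignored
rep.receive(2, b"updates-2")   # feed A lost this one; feed B's copy is used
```

The payload is treated as opaque: nothing in the forwarding path
looks at individual mapping updates.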

My 5 second goal has most of its delay in the UAS, RUAS and Launch
systems.  Once the packets reach the first level Replicator, they
will be fanned out to the ITRDs and QSDs all over the Net within a
fraction of a second.

My current plan is for the RUASes collectively to own and run most
of the Replicators.

At some point, say within a large ISP, the ISP may run its own
Replicators, and fan the mapping updates out to its various full
database ITRs and query servers (ITRDs and QSDs) - perhaps through
more such Replicators of its own.

Replicators are just servers with one or two 1 Gigabit/sec Ethernet
ports.  They are not routers, and just need a single fixed, public,
IP address.  So they are pretty cheap.  The actual volume of update
messages is not necessarily frighteningly large, even if billions of
end-users are mobile, with cell-phones and laptops.  This is because
the TTR-based mobility extensions to Ivip only typically involve
mapping changes when the device moves more than 1000km or so:

  http://psg.com/lists/rrg/2008/msg00832.html
  http://psg.com/lists/rrg/2008/msg00842.html  etc.


In this model, all of the RUASes' internal operations for updates,
all of the Launch system, and most of the Replicator system will be
paid for by the RUASes.

The RUASes will recover these costs, and hopefully make a profit for
providing this valuable service, by charging end-users directly, or
indirectly via UASes etc. and whatever company "owns" the MAB and
actually charges end-users for updates, address space, OITRD traffic
etc.

Leaving for the moment the question of burdens placed on the fast
push system, and on ITRDs and QSDs etc. which are not run directly
by the RUASes, I think the above arrangement constitutes as good an
argument as I can imagine about the system being scalable without
unfair burdens, hitting actual technical limits etc.

In terms of costs:

  1 - No update is sent into the system unless the end-user pays
      for it in some way. (This may well be a flat rate fee for
      all updates up to some limit, or a fee per update.)

  2 - This large proportion of the fast push system is therefore
      fully paid for by these fees, and with fees for leasing
      address space.

  3 - If the costs of providing the fast push service increase
      above the fees, then the RUASes simply increase the fees
      and/or invest more in the system to increase its capacity
      and efficiency.

If an end-user doesn't like the fees they are paying, or the quality
of service (such as reliability and speed of conveying updates, or
the capacity of the OITRDs for their packets) then they stop leasing
their current space and lease some different space from some other
MAB "owner", presumably one which uses a different RUAS.

Ivip does not completely isolate end-users from the economic and
performance consequences of choosing a particular supplier of
address space (and therefore who is responsible for pushing out
mapping updates and running OITRDs).  I don't think any map-encap
scheme could do this.

It does however completely free them from dependence on any one ISP
for Internet access which is a major part of our goal: a new routing
and addressing architecture to provide vastly more end-users with
address space which is portable (in terms of access network) and
which supports multihoming, TE etc.

It is easy to see there being a healthy competitive market in the
provision of all these aspects of the Ivip system.

(Note - I am discussing Ivip as if it were a single global system.
Any company, right now, could apply these principles in their own
private system - with ITRs at Internet exchanges around the world -
and run their own private Ivip-like system.  This could be a good
business venture for a company which extends this to a TTR-based
mobility system, and who develops suitable TTR <--> MN protocols and
software: http://www.firstpr.com.au/ip/ivip/#mobile .)

In terms of technical limits, it is easy to imagine that by the time
higher volumes of updates are handled, server capabilities etc. will
be even greater than they are today.  The Replicator function deals
with entire packets of updates as a single body of information. It
does not process each update individually.  There is some crypto
authentication work accepting the two incoming streams of packets
and sending them out to 20 or so Replicators.  In the initial years,
when update traffic is light, each Replicator could probably fan out
to more than 20 other Replicators - so reducing the total number
required to feed a given number of ITRDs and QSDs.
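
The relation between update volume and fan-out can be roughed out as
follows.  The update rates, the 12 bytes per update and the 25%
packet overhead are hypothetical figures chosen only to show the
shape of the trade-off.  The sketch considers outgoing link
bandwidth alone and ignores CPU cost for the crypto work.

```python
def max_fanout(link_bits_per_sec, updates_per_sec, bytes_per_update,
               overhead_percent=125):
    """How many full outgoing feeds fit on one link (bandwidth only)."""
    # Bits per second for one copy of the update stream, with an assumed
    # allowance for packet headers and authentication data.
    stream_bps = updates_per_sec * bytes_per_update * 8 * overhead_percent // 100
    return link_bits_per_sec // stream_bps


GIGABIT = 10**9

# Hypothetical global update rates, light to heavy.
for rate in (1_000, 100_000, 1_000_000):
    print(f"{rate:>9,} updates/sec -> up to "
          f"{max_fanout(GIGABIT, rate, 12)} feeds per gigabit link")
```

Under these assumptions a single gigabit port is far from being the
limiting factor while update traffic is light, which is why the
fan-out could be well above 20 in the initial years.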

I can't imagine that the update volume would ever exceed the
capacity of any single Replicator server.  As such a state
approached, the problem would be in terms of reducing the output
fanning factor to make it easier on each Replicator.  If such a
state was reached, the system could be split into two parallel
Replicator networks, each with its own Launch server.  Then, there
would probably need to be splits in ITRDs and OITRDs.  But how to
split a full database QSD into two or more subsets of all MABs?  It
could be tricky, but I don't think we will ever have the volume of
updates to force such splits.


The above description does not cover the burdens placed on
organisations outside the RUASes by the existence of a micronet, or
by changes to the micronet's mapping, or by changes to the way an
end-user's UAB is split up into micronets.  (ITRs and the mapping
system only see MABs and micronets - the UAB is an administrative
arrangement between whoever "owns" the MAB and each end-user.)

These include:

  1 - Burden on Replicators operated by ISPs and other networks
      to feed their own ITRDs or QSDs, or to feed other Replicators
      in other networks.

         (Note, during the first years of introduction, these
          volumes of updates are going to be really low, so it
          would make sense to run the Replicator code on some
          stable, but not too busy, server, such as a nameserver,
          mailserver or whatever.  Only later would it be best
          to have dedicated servers for the Replicator function.)

  2 - Burden on the ITRDs and QSDs from these micronets and updates:

      a - Communications bandwidth, as part of receiving the
          total update stream.

      b - CPU load decoding the stream and storing the data in the
          local copy of the mapping database.  This local copy needs
          to be suitable for handling queries (an ITRD is really
          a caching ITR with an integral or nearby full database
          QSD query server) and for testing the local copy against
          periodic CRCs (or similar) of each MAB's current state,
          which will be sent as part of the stream of updates.

      c - RAM storage for the database.  (With billions of
          micronets, RAM will still be OK by the time this occurs,
          but a consumer 1TB hard drive today will cope with
          anything conceivable with the most bloated, billions of
          end-users, IPv6 mapping database.  BTW, DRAM now costs
          2.5 Australian cents a megabyte. In 1980, the figure was
          $10,000.)

      d - As the mapping database grows, it becomes more difficult to
          index when querying, so there are more costs, delays and
          CPU activity per query for each QSD (and therefore for the
          QSD part of every ITRD).
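
The check in 2b, testing the local copy against a periodic CRC of
each MAB's state, might look something like the sketch below.  The
record layout (start address, length, ETR address as three 32-bit
integers), the use of CRC-32 and the function names are my
assumptions for illustration, not anything fixed by the draft.

```python
import ipaddress
import struct
import zlib


def ip(text):
    """IPv4 dotted-quad string -> 32-bit integer."""
    return int(ipaddress.IPv4Address(text))


def mab_checksum(micronets):
    """CRC-32 over one MAB's micronets, taken in start-address order.

    micronets: dict of start address -> (length, ETR address), all ints.
    """
    crc = 0
    for start in sorted(micronets):
        length, etr = micronets[start]
        crc = zlib.crc32(struct.pack("!III", start, length, etr), crc)
    return crc


# State at the origin, and an ITRD's local copy of the same MAB.
origin = {
    ip("203.0.113.0"): (4, ip("198.51.100.1")),
    ip("203.0.113.4"): (1, ip("198.51.100.9")),
}
local = dict(origin)

in_sync = mab_checksum(local) == mab_checksum(origin)

# The end-user remaps one micronet, but the local copy misses the update.
origin[ip("203.0.113.4")] = (1, ip("198.51.100.77"))
needs_resync = mab_checksum(local) != mab_checksum(origin)
```

When the checksums differ, the ITRD or QSD would fetch a fresh
snapshot of just that MAB rather than the whole database.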


I don't have a complete business plan for this.  For instance, if an
ISP or some other network in Australia is running a bunch of ITRDs
and QSDs, why should they be spending money on RAM and CPU effort
processing a bunch of updates which do not concern any traffic their
ITRs are handling, such as for:

   1 - Micronets in some other country for which the mapping is
       changed very frequently due to the end-users finding this
       to be a good way of load-balancing incoming traffic, and so
       getting better value from their multiple expensive links from
       ISPs.

   2 - Micronets for some end-users who are frequently changing
       their mapping for purposes of mobility.  This will typically
       only be due to them moving large distances - such as more
       than 1000km or so, and then only when they derive a real
       benefit due to shorter paths from a new, closer, TTR.
       (They retain connectivity even if the mapping doesn't change
       and they keep using the one TTR, which may be far from their
       current access network.)


In terms of the number of micronets and the volume of updates, these
represent ongoing costs in terms of RAM and traffic volume.  While
the cost of RAM is vanishingly small - since in IPv4, each mapping
entry is only 12 bytes of actual data (32 or so for IPv6) - there
are still costs which might make some ITR operators feel like
ignoring some subset of the updates or perhaps not pushing them as
far as the others (there is no obvious way to do this).
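
For a sense of scale, the per-entry sizes above and the DRAM price
quoted earlier give the following back-of-envelope figures.  This
counts raw mapping data only.  Index structures and other overheads
are ignored, so real memory use would be higher by some factor I am
not attempting to estimate.

```python
IPV4_ENTRY_BYTES = 12            # actual data per IPv4 mapping entry
IPV6_ENTRY_BYTES = 32            # "32 or so" for IPv6
DRAM_AUD_PER_MEGABYTE = 0.025    # 2.5 Australian cents per megabyte (2008)


def database_bytes(micronets, entry_bytes):
    """Raw size of the mapping data, without any indexing overhead."""
    return micronets * entry_bytes


def dram_cost_aud(num_bytes):
    """Cost of that much DRAM at the 2008 price quoted in the text."""
    return num_bytes / 1_000_000 * DRAM_AUD_PER_MEGABYTE


for micronets in (10**6, 10**8, 10**9):
    v4 = database_bytes(micronets, IPV4_ENTRY_BYTES)
    v6 = database_bytes(micronets, IPV6_ENTRY_BYTES)
    print(f"{micronets:>13,} micronets: "
          f"IPv4 {v4 / 1e9:.3f} GB (about AUD {dram_cost_aud(v4):.0f}), "
          f"IPv6 {v6 / 1e9:.3f} GB (about AUD {dram_cost_aud(v6):.0f})")
```

Even at a billion micronets the raw IPv4 data is 12 gigabytes, so
the RAM itself is a minor part of the burden compared with the
ongoing traffic and CPU load.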

A potential way around this is that if the MAB "owner" finds
(through reports from disgruntled end-users) that there are ITRs
which are not responding to all their mapping changes, then the
"owner" has an economic incentive to pay these ITR operators to
accept the changes into their system.


I can't prove that Ivip has a compelling business plan in every
respect, but I think the above is quite promising.

I believe it is far more promising than the situation with APT, in
which there is no method of monitoring, charging or preventing
"overly frequent" mapping changes (however defined).  So I think
that both APT and LISP-NERD are subject to the same problem which is
central to the current BGP routing scaling problem: there is no way
of stopping some participants from unreasonably burdening other
participants.

LISP-NERD doesn't scale well, so to me the practical choices are
between:

  APT  slow }  Hybrid push-pull, adaptable, scalable etc.
  Ivip fast }

  LISP-ALT  } Supposedly pure pull, but there are moves to do
  TRRP      } "Notify" (push) packets, and to somehow get packets
            } to ETRs before the ITR has mapping.
            } Endlessly scalable in terms of number of EID prefixes
            } but at the cost of dropping or significantly delaying
            } initial packets in some (probably many) new
            } communication sessions.

   - Robin

