[RRG] Re: Fast and sparse mapping? aggregated EIDs, OITRDs etc.

To: Routing Research Group <rrg@psg.com>
Subject: [RRG] Re: Fast and sparse mapping? aggregated EIDs, OITRDs etc.
From: Robin Whittle <rw@firstpr.com.au>
Date: Tue, 16 Sep 2008 16:25:05 +1000
Cc: Brian E Carpenter <brian.e.carpenter@gmail.com>
In-reply-to: <48CF2F9B.80906@gmail.com>
Organization: First Principles
References: <48CF26F6.90008@firstpr.com.au> <48CF2F9B.80906@gmail.com>
User-agent: Thunderbird 2.0.0.16 (Windows/20080708)
Short version:     I attempt to explain why for scalability reasons,
                   we want there to be relatively few Mapped Address
                   Blocks, each covering the Scalable PI space of
                   many end-user networks.

                   I think this relates to the LISP concept of
                   "highly aggregated EIDs".


Hi Brian,

Thanks for your fuller explanation.

>>> On 2008-09-11 17:02, Tony Li wrote:
>>>> Hi Robin,
>>>>
>>>> |There is no scalable routing system - or transition scheme to a
>>>> |scalable routing system - which can meet our goals of being
>>>> |attractive to end-users and generating real scalability benefits
>>>> |if it is not allowed to require most small end-user networks
>>>> |(all PI and probably most small PA) to renumber once when
>>>> |adopting the new system.
>>>>
>>>> One word: translation.
>>>>
>>> Two words: fast mapping.
>>
>> Other than Ivip's essentially real-time control of ITRs by
>> end-users, I am not sure what "fast mapping" means.  Please explain
>> what you mean, with some examples.
> 
> An EID to RLOC map that can be consulted in much less than one packet time
> in an edge router.

OK - some examples of "fast mapping":  Ivip has local full database
query servers.  APT has the same: Default Mappers.  In both cases, a
query will come back in milliseconds, with low communication cost
and high reliability.

Also, with Ivip, the mapping reply comes with a caching time.  If,
during that caching time, the query server gets a mapping update
which affects a micronet for which a reply is still within the
caching time, it sends a message to the ITR which requested the
mapping to tell it of the new mapping.  (There is also the option
for intermediate caching query servers, which pass on such mapping
changes.)  This local push of mapping changes to ITRs which may need
them will be within milliseconds of the full database query server
getting the mapping update.  It is all secured by a nonce sent in
the original mapping request.  I think that APT has similar
functionality for getting mapping changes to ITRs which recently
requested mapping for a particular micronet.
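
Here is a minimal sketch of the cache-time-plus-push behaviour
described above.  It is not taken from any Ivip draft: the class and
field names, the string micronet keys, the ETR addresses and the 600
second caching time are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CachedReply:
    itr: str        # hypothetical identifier for the requesting ITR
    nonce: int      # nonce the ITR sent in its mapping request
    expires: float  # time at which the ITR stops caching the reply


@dataclass
class QueryServer:
    mapping: dict = field(default_factory=dict)      # micronet -> ETR address
    outstanding: dict = field(default_factory=dict)  # micronet -> [CachedReply]
    sent: list = field(default_factory=list)         # notifications pushed to ITRs

    def query(self, itr, micronet, nonce, now, cache_secs=600):
        """Answer a mapping request and remember who is caching the answer."""
        reply = CachedReply(itr, nonce, now + cache_secs)
        self.outstanding.setdefault(micronet, []).append(reply)
        return self.mapping[micronet], cache_secs

    def update(self, micronet, new_etr, now):
        """Apply a mapping update and notify ITRs still within caching time."""
        self.mapping[micronet] = new_etr
        live = [r for r in self.outstanding.get(micronet, []) if r.expires > now]
        self.outstanding[micronet] = live
        for r in live:
            # Echoing the nonce lets the ITR match the push to its own request.
            self.sent.append((r.itr, micronet, new_etr, r.nonce))


qs = QueryServer(mapping={"micronet-1": "11.0.0.1"})
qs.query("itr-a", "micronet-1", nonce=1234, now=0.0)
qs.query("itr-b", "micronet-1", nonce=99, now=0.0, cache_secs=10)
qs.update("micronet-1", "22.0.0.2", now=60.0)
```

In this run only itr-a is notified, since itr-b's caching time
expired before the update arrived.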

LISP-NERD has a full database query server in each ITR, since they
are full database ITRs.  So this is "fast mapping" too.

LISP-ALT and TRRP use a widely distributed system of query servers
which are scattered all over the globe - so they can't do "fast
mapping".


>>> I missed the bit where it was proved that fast mapping
>>> mathematically requires aggregatable EIDs.
>>
>> Who suggested this?
> 
> I've seen numerous references to EIDs needing to aggregate.
> I'd like to know why.

"EIDs needing to aggregate" sounds like LISP terminology.  I think
the same principles apply to LISP Proxy Tunnel Routers (PTRs) as to
Ivip Open ITRs in the DFZ (OITRDs).  I will explain it in Ivip terms
below.  This would be easy face-to-face, with pen and paper.  By
email, I think I need to give a longish explanation, since I can't
see whether you are nodding with understanding or not.


>>> As I said a couple of weeks ago:
>>>
>>> However, would a lower estimate, say 2^23 (8 million) prefixes be
>>> an issue?
>>
>> An "issue" (meaning problem) for what?  I assume this figure is for
>> the number of EID prefixes (LISP and APT terminology) or micronets
>> (Ivip) which the mapping system supports.  Or is this a figure of
>> how many routes are advertised in BGP?
> 
> I'm thinking specifically of the number of EIDs to be mapped to
> a significantly smaller number of RLOCs. And I'm assuming that
> lookup in a table of 2^23 entries is a fast operation.

OK.  In Ivip, the full database query servers hold the whole mapping
database and get updates continually from the network of Replicator
servers which fan it out from the source of the mapping information
- the RUAS (Root Update Authorisation System) organisations.

  http://tools.ietf.org/html/draft-whittle-ivip-db-fast-push-01

8 million is a small number.  There's no reason why a modern server
shouldn't be able to handle a hundred million or more in RAM.
There's no practical limit when you consider the capacity of modern
hard drives.

>>> If not, why can't we start on the assumption that we know how to
>>> support a map with 8 or 10 million entries, and have many years to
>>> figure out a sparse  mapping system with several orders of
>>> magnitude more entries?
>>
>> I don't understand what "sparse mapping" means.
> 
> One where we assume that no aggregation is possible in the ID space,
> i.e. the ID space is sparsely populated.
> 
> (Given the history of IP addressing, and the recent discussions
> on this list, I don't know any other assumption we can make
> except that IDs will look like meaningless randomly distributed
> numbers.)

I don't think "sparse mapping" is a helpful term.  What you describe
is the prefixes which are mapped by the new system being scattered
all over the address space, presumably generally not alongside each
other and presumably with conventional PI prefixes, or the prefixes
used by ISPs for their PA customers, in between.


I am going to use the term "Scalable PI" (SPI) space to denote the
kind of address space which is managed by the new scalable routing
architecture, in this example Ivip.  SPI space is portable between
ISPs which have ETRs and it can be used for multihoming.  The
end-user network retains these actual addresses indefinitely.

The questions include:

1 - How did they get these addresses?

2 - Who manages the mapping for them?

3 - How does an OITRD (PTR) attract packets emitted from
    hosts in non-upgraded networks (no ITRs in that network)
    so the packets can be tunneled to the correct ETR?

4 - Who runs the OITRDs?

Your "sparse mapping" arrangement involves for instance a million
separate SPI end-user networks each with their own micronet, or
multiple contiguous micronets, but with each block of address space
used by each such SPI network not being contiguous, but scattered
all over the address space.

I will use the term "User Address Block" to denote a contiguous area
of address space used by one end-user network.  They may have it as
a single micronet, or they may split it into as many micronets as
they like.  Ivip's mapping resolution will be 1 IP address for IPv4
and a /64 for IPv6.

Let's say there is a small PI using network NA at present, on

   99.0.20.0/22

and they convert this to SPI space.  I will assume the neighbouring
address space is still being managed conventionally - BGP managed
conventional PI space for other end-user networks or space for an
ISP which uses it for PA space for various customers.

NA needs to contract an RUAS to handle the mapping of their /22
prefix, which is now a "User Address Block" in the Ivip system.

I assume NA needs to have OITRDs around the Net to collect packets
addressed to 99.0.20.0/22 so they can be tunneled to the ETRs -
wherever NA maps its micronets.  If all the ETRs are in Australia,
then it is probably fine to have one or two OITRDs in Australia.  If
NA wants to use ETRs all over the world, and expects packets to be
coming from non-upgraded networks all over the world, then in order
to ensure generally optimal path lengths, there need to be OITRDs
advertising 99.0.20.0/22 all over the world.

NA might want to run their own OITRDs if there only needs to be a
few in Australia.  However, if they need OITRDs all over the world,
they could contract some specialised OITRD company to handle their
99.0.20.0/22 prefix on that company's already existing network.  The
most likely arrangement is that NA would pay their RUAS company to
run OITRDs for them.  The RUAS might do this directly, or pay one or
more OITRD companies to do it for them.   Paying for OITRDs would
involve some component which depends on the actual traffic handled
by those OITRDs for NA's prefix.

OITRD support is essential in order to handle packets from
non-upgraded networks.

In order for one or more OITRDs to get the packets, they need to
advertise 99.0.20.0/22 in the DFZ.  This is a "Mapped Address Block"
(MAB).

MABs are always genuine prefixes, since they are advertised in BGP.

UABs (User Address Blocks) do not need to be actual prefixes, since
Ivip could have multiple UABs in one MAB, with boundaries between
one UAB and the next in arbitrary locations.  Likewise, within a
UAB, the space can be split up into micronets on any /32 (or /64 for
IPv6) boundary, so micronets don't need to be actual prefixes either.

In this example, NA uses Ivip to enable the /22 to be split into up
to 1024 /32 micronets.  The divisions between the micronets can be
changed at will, and each micronet is mapped to a single ETR address
anywhere in the world.
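
Since micronets are just a start address and a length, not prefixes,
an ITR or query server can't use ordinary longest-prefix matching on
them.  A minimal sketch of one way to look them up, using a sorted
list and binary search, follows.  The class name, the
(start, length, ETR) layout and the ETR addresses are my assumptions
for illustration, not anything specified for Ivip.

```python
import bisect
import ipaddress


class MicronetTable:
    """Micronets stored as (start, length, etr), sorted by start address."""

    def __init__(self):
        self._starts = []   # integer start addresses, kept sorted
        self._entries = []  # parallel list of (start, length, etr)

    def add(self, start, length, etr):
        s = int(ipaddress.IPv4Address(start))
        i = bisect.bisect_left(self._starts, s)
        self._starts.insert(i, s)
        self._entries.insert(i, (s, length, etr))

    def lookup(self, addr):
        """Return the ETR for addr, or None if no micronet covers it."""
        a = int(ipaddress.IPv4Address(addr))
        i = bisect.bisect_right(self._starts, a) - 1
        if i < 0:
            return None
        start, length, etr = self._entries[i]
        return etr if a < start + length else None


table = MicronetTable()
table.add("99.0.20.0", 3, "11.0.0.1")  # three addresses: not a prefix
table.add("99.0.20.3", 1, "22.0.0.2")  # a single-address micronet
```

Here 99.0.20.2 resolves to the first ETR, 99.0.20.3 to the second,
and 99.0.20.4 is not covered by any micronet.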

If NA was only ever going to map the whole /22 to a single ETR, then
the current situation does not provide any scaling benefits:

   Initially, before Ivip, NA burdened the DFZ routing table with an
   advertisement for their PI prefix.  Now they burden it with an
   advertisement for their MAB.  The BGP routers can't tell the
   difference and there are no scaling benefits.

If, without Ivip, NA was going to split their space into multiple
separate BGP advertised prefixes, for instance for load-balancing
over multihomed links, into four /24s, then the scenario so far does
involve some scaling benefits:

   Instead of them advertising 4 /24 PI prefixes in BGP, they are
   only advertising a single /22 MAB prefix, so this reduces the
   size of the DFZ routing table by 3.

If a million PI networks (this is an example in the future, I guess
there are only tens of thousands such networks today) such as NA did
the same thing, then there would be a million MABs advertised in the
DFZ, compared to however many (over a million, maybe several
million) PI prefixes would have been advertised if these networks
had split the space up as they wished with current BGP techniques.
So there would be some scaling benefits.

However, this is far from ideal.

One way of improving scaling is for any of these million SPI
networks which have adjoining address space to advertise their two
or more UABs as a single MAB, instead of each UAB being a separate
MAB.  Since an MAB must be a genuine prefix, the UABs need to fill a
block which starts on the right boundary.  For instance, there might
be four contiguously spaced networks, NA and three neighbours:

 UAB of NB  99.0.16.0/22  } = MAB covering all four UABs:
 UAB of NA  99.0.20.0/22  }       99.0.16.0/20
 UAB of NC  99.0.24.0/22  }
 UAB of ND  99.0.28.0/22  }

They would need to agree on an RUAS and on who would run OITRDs for
this MAB.  This would result in better scaling benefits, reducing
the number of DFZ routes by 3.
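
A quick way to check how far a set of adjoining UABs can be merged is
Python's standard ipaddress module.  This is just a sketch: the
prefixes are illustrative, and dfz_routes is a name I made up.  The
second list shows that four contiguous /22s which do not start on a
/20 boundary cannot be advertised as one prefix.

```python
import ipaddress


def dfz_routes(uabs):
    """Return the fewest prefixes which exactly cover the given UAB prefixes."""
    nets = [ipaddress.ip_network(u) for u in uabs]
    return list(ipaddress.collapse_addresses(nets))


aligned = ["99.0.16.0/22", "99.0.20.0/22", "99.0.24.0/22", "99.0.28.0/22"]
unaligned = ["99.1.4.0/22", "99.1.8.0/22", "99.1.12.0/22", "99.1.16.0/22"]

one_mab = dfz_routes(aligned)       # a single /20
three_mabs = dfz_routes(unaligned)  # a /22, a /21 and a /22
```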

In these examples, existing PI networks did not renumber - they
converted their space to SPI space.

Please see my recent messages on renumbering or not:

  Renumbering once might be OK when converting to Scalable PI (SPI
  space)  http://psg.com/lists/rrg/2008/msg02355.html 2008-08-26


Now let's consider networks which began with PA space.  Clearly it
is not possible to have a currently PA network keep its address
space when switching to SPI.

For example the current PA prefix of NX is a single IP address, or
perhaps 16 IP addresses or whatever, in the middle of some large
block of address space advertised as a single prefix by an ISP.

This space could be managed by Ivip, but it would be impossible to
support it with OITRDs, because the only way an OITRD could get the
packets addressed to this range of addresses is to advertise a
prefix covering them in the DFZ.  That would punch a hole in the
middle of the ISP's prefix, and with current arrangements, would
need to be a /24 at the least.  Maybe NX has fewer than 256
addresses - say just 16:

   44.0.1.16/28

There's no way OITRDs could advertise the smallest block of address
space which covers this, within the current convention of no longer
than /24 prefixes being advertised and accepted in the DFZ, without
clobbering the rest of that /24 outside the /28, and so preventing
the ISP from using that space for other customers.
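
To put a number on the clobbering, here is a small sketch with the
standard ipaddress module.  The function name is mine, and the /24
limit is the current filtering convention mentioned above, passed in
as a parameter.

```python
import ipaddress


def shortest_advertisable(prefix, max_len=24):
    """Return the covering prefix no longer than max_len, plus the number of
    addresses inside it which lie outside the original prefix."""
    net = ipaddress.ip_network(prefix)
    if net.prefixlen <= max_len:
        return net, 0
    cover = net.supernet(new_prefix=max_len)
    return cover, cover.num_addresses - net.num_addresses


cover, clobbered = shortest_advertisable("44.0.1.16/28")
```

For NX's /28 the covering prefix is 44.0.1.0/24, and 240 of its 256
addresses belong to the ISP's other customers.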

So if any network which happens to have PA space wants to use SPI
space, they are going to have to relinquish their PA space and gain
some new space within an MAB.   (A theoretical alternative would be
for them to get their own PI prefix, convert it into an MAB and be
like NA etc. above.)

It is much more likely and desirable that the RUAS organisations
themselves, or other companies (perhaps RIRs too) would establish a
large MAB and rent the space in small chunks to networks such as NX.
Actually, they would probably have multiple MABs, but the following
discussion mentions just one.  An end-user network could rent space
in multiple MABs, each from a different MAB operating company, but
here I assume they rent their new SPI space in one MAB from one
operating company.

NX does a once-only renumbering and, after choosing an MAB operator
company they like, pays rent for their new SPI space.  The MAB
operator company either is an RUAS or pays an RUAS to handle the
mapping, and likewise to provide OITRD services.  The OITRD company
would charge the MAB operator company (assuming they are different)
usage fees according to traffic.  The statistics of this usage would
need to be broken up by destination address, at least to the
resolution of a UAB, so that the MAB operator company could charge
their various customers NX, NY etc. according to the amount of
traffic OITRDs handled for each such network's UAB.  Likewise, since
a mapping change involves a small fee (I am thinking cents, not
dollars), the RUAS charges the MAB operator company for all mapping
changes, in a way which lets the MAB operator split this up into
mapping changes generated by NX, NY etc.

In practice, NX and NY may not generate their own mapping changes,
but would probably pay a company of their choice - which could be
the MAB operator company - to monitor their multihoming situation
and make the mapping changes required to maintain connectivity via
various ISPs' ETRs.

Assuming that OITRD support is required by all end-users (which is
true, since no end-user would want to be reachable only by hosts in
networks already with ITRs) then it is clear that the system cannot
enable current PA users to use the new system without renumbering
and gaining some space within an already existing, large, MAB.

Generally new networks which commence operation with SPI space would
do as NX does - rent some space from an MAB operator company.

Existing PI networks would have the choice of either abandoning
their current PI space and renting space from an MAB operator
company, or of converting their current PI space into an MAB.
(Perhaps they could amalgamate it with neighbouring prefixes which
are already MABs, to create a single larger MAB.)


From the point of view of maximising scalability, the best
arrangement would be for a smallish number of large MABs (short
prefixes), each providing space for hundreds, thousands or in
principle millions of SPI user networks, each of which has a UAB
they can split into multiple micronets.

There are some reasons for limiting MAB size.  The mapping system
will primarily send updates to the current mapping for each MAB.
However, it will also periodically send a checksum, CRC or whatever
of each MAB's total mapping data, maybe every 5 minutes or so.  The
timing of these CRCs will coincide with the RUAS making a compressed
dump file of that MAB's mapping.

Full database query servers (QSDs) will check their current updated
copy of each MAB's mapping against the CRC.  If it doesn't match,
they will download the latest dump ASAP.  This will take a few
seconds at least, and in that time, the QSD will need to buffer
updates for this MAB.  Once it has the MAB dump, has unzipped it
into its database and has applied the updates which followed the
CRC, it has a fully synchronised copy of this MAB's mapping again.
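
The resynchronisation steps above can be sketched roughly as follows.
This is only an illustration under my own assumptions: the class and
method names are invented, the text serialisation used for the CRC is
made up, and the download and decompression of the dump are left out.

```python
import zlib


class MabReplica:
    """One QSD's copy of a single MAB's mapping (micronet -> ETR)."""

    def __init__(self, mapping):
        self.mapping = dict(mapping)
        self.resyncing = False
        self.buffered = []  # updates received while a dump is being fetched

    def checksum(self):
        # Hypothetical canonical form: sorted "micronet=etr" lines.
        blob = "\n".join(f"{k}={v}" for k, v in sorted(self.mapping.items()))
        return zlib.crc32(blob.encode())

    def on_update(self, micronet, etr):
        """Apply an update, or buffer it if a resync is in progress."""
        if self.resyncing:
            self.buffered.append((micronet, etr))
        else:
            self.mapping[micronet] = etr

    def on_crc(self, crc):
        """Compare the announced CRC with our copy; start a resync on mismatch."""
        if crc != self.checksum():
            self.resyncing = True
        return self.resyncing

    def on_dump(self, dump):
        """Install the downloaded dump, then replay the buffered updates."""
        self.mapping = dict(dump)
        for micronet, etr in self.buffered:
            self.mapping[micronet] = etr
        self.buffered.clear()
        self.resyncing = False


authoritative = {"m1": "11.0.0.1", "m2": "22.0.0.2"}
replica = MabReplica({"m1": "11.0.0.1"})  # this QSD missed the update for m2
needs_dump = replica.on_crc(MabReplica(authoritative).checksum())
replica.on_update("m3", "33.0.0.3")       # arrives while the dump downloads
replica.on_dump(authoritative)
```

After on_dump the replica holds m1, m2 and m3, with the buffered
update applied on top of the dump.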

If IPv4 space gets broken up into millions of micronets, with many
being one or a few IP addresses, then it would probably be best not
to have a MAB as large as a /8.  The size of those dumps would be
rather unwieldy, since there could be up to 16 million micronets and
each micronet's mapping requires about 12 bytes of data. (32 bit
start address, 32 bit ETR address and up to 24 bits of length.)

My guess is that a good maximum size for an MAB might be a /10 or
/12 or so.
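
For a rough sense of scale, here is the worst-case arithmetic, using
the 12 bytes per micronet figure above and assuming every IPv4
address in the MAB is its own micronet.  These are uncompressed
sizes; the function name is just for illustration.

```python
BYTES_PER_MICRONET = 12  # approximate figure from the text above


def worst_case_dump_bytes(prefix_len, bytes_per_micronet=BYTES_PER_MICRONET):
    """Uncompressed dump size if every address in the MAB is a micronet."""
    micronets = 2 ** (32 - prefix_len)
    return micronets * bytes_per_micronet


for plen in (8, 10, 12):
    print(f"/{plen}: {worst_case_dump_bytes(plen) / 1e6:.1f} MB")
```

That comes to about 201 MB for a /8, 50 MB for a /10 and 13 MB for a
/12.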

Ideally, for IPv4, we might wind up with thousands to a hundred
thousand MABs.  These would cover the SPI space of potentially
hundreds of millions of end-user networks, assuming most of them
only needed one or a few IP addresses.  The rest of the advertised
space would be used by ISPs, for their internal operations, for ETR
addresses and for space for their PA customers.


I think the LISP term "highly aggregated EIDs" refers to the notion
I have just mentioned: a single BGP advertised prefix which is
needed for PTRs to work, but covering many EID prefixes.  However,
it would be best for the LISP folks to explain this, and whatever
scenarios they have in mind for how their system would be adopted by
a variety of types of existing networks.

LISP-ALT has no centralised mapping organisations like RUASes.
Maybe LISP-NERD has something vaguely equivalent, I am not sure.

LISP needs PTRs to be attractive to end-users.  So far, I think
there is no clear business plan for who would deploy these PTRs.

My business plan for OITRDs is here:

  http://psg.com/lists/rrg/2008/msg02021.html



Sorry this takes a while to explain.  It is not a terribly complex
concept, but it is tricky trying to explain things with pure text to
people I have never met.

 - Robin

