[RRG] Comparing APT & Ivip
- To: Routing Research Group <rrg@psg.com>, Michael R Meisel <meisel@cs.ucla.edu>
- Subject: [RRG] Comparing APT & Ivip
- From: Robin Whittle <rw@firstpr.com.au>
- Date: Fri, 26 Sep 2008 18:30:29 +1000
- Organization: First Principles
- User-agent: Thunderbird 2.0.0.16 (Windows/20080708)
Short version: From the thread: Re: [RRG] Separation vs. Elimination
http://psg.com/lists/rrg/2008/msg02583.html
Continuing the previous discussion comparing APT
and Ivip.
I also give a detailed description of how Ivip's
multihoming reachability testing and decision
making might work, contrasting it with how I
understand APT Default Mappers would determine
reachability of each ETR, depending in part on
BGP to tell them the ETR's ISP network is
unreachable, which could involve significant
delays.
Hi Michael,
You wrote, in part:
>> Ivip pushes the mapping fast and APT pushes it slowly. Ivip has a
>> distributed, but still reasonably centralised, push system which
>> fans out to all full database query servers. APT has multiple
>> islands, with more diffuse pushing of the updates from multiple
>> sources of mapping data within the islands. (I don't see any
>> benefit in APT islands - I think there should be one APT system.)
>
> I think the larger difference between the mapping systems is what I
> pointed out below -- the difference in what information is distributed.
>
> Regarding APT, I think you will find that APT islands that are not
> physically connected can be logically connected in the next version of
> our incremental deployment scheme. However, we can't force ISPs that
> don't already have business relationships to create them, so there is
> always the (as you say, probably undesirable) possibility of multiple,
> disconnected islands.
OK.
>> OK - but I don't understand how APT (with APT islands) can robustly
>> support packets from non-upgraded networks when there are two EIDs
>> such as /25 or longer, in the one /24, and the two end-user networks
>> are using ISPs in different APT islands, in a setting where it is
>> impossible to have advertisements for prefixes longer than /24
>> propagate across the DFZ.
>
> I assume this is not possible. But that means it provides an incentive
> for the separate islands to converge to one. =)
Indeed.
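A minimal Python sketch of the constraint discussed above, assuming the DFZ carries no prefix longer than /24. The addresses, island names and helper names are hypothetical, chosen only for illustration; this is not APT code. It shows that two /25 EIDs handled by different islands collapse onto the same DFZ-visible /24, which both islands would then have to advertise.

```python
import ipaddress
from collections import defaultdict

MAX_DFZ_PREFIX_LEN = 24  # assumption: longer prefixes do not propagate across the DFZ


def covering_dfz_prefix(eid: str):
    """Return the prefix that would have to be advertised in the DFZ for this EID."""
    net = ipaddress.ip_network(eid)
    if net.prefixlen <= MAX_DFZ_PREFIX_LEN:
        return net
    return net.supernet(new_prefix=MAX_DFZ_PREFIX_LEN)


def conflicting_prefixes(eid_islands: dict) -> dict:
    """Map each DFZ-visible prefix to its islands, keeping only prefixes
    that two or more islands would both have to advertise."""
    islands_by_prefix = defaultdict(set)
    for eid, island in eid_islands.items():
        islands_by_prefix[covering_dfz_prefix(eid)].add(island)
    return {p: s for p, s in islands_by_prefix.items() if len(s) > 1}


# Two hypothetical /25 EIDs in the one /24, served by different islands.
eids = {
    "192.0.2.0/25": "Island-1",
    "192.0.2.128/25": "Island-2",
}
print(conflicting_prefixes(eids))
```

If both EIDs were in the same island, the function would return an empty dictionary.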
>>> We are trying to make
>>> the point in the paper that transit networks are the ones that need the
>>> routing table to scale, and it is possible for transit networks to
>>> deploy separation schemes themselves, in theory, in order to directly
>>> address that issue. This is not possible for elimination schemes.
>>
>> I agree it is not possible for elimination schemes, but I can't
>> think how transit networks could deploy any of the current
>> separation schemes without involving the end-user networks. Can you
>> give an example?
>
> Sure: APT. APT is deployed by an ISP by turning their border routers
> into APT EDRs (or TRs or whatever you want to call them). Customers'
> outgoing packets are encapped by the ISP, their incoming packets are
> decapped by the ISP. If the customer is multihomed, they can ask their
> providers to put their TE preferences into the mapping info for their
> prefix(es), but that's totally optional.
I will think this through. Let's say I have an edge network -
an end-user network, not an ISP network - and I have my own PI
space used as it is today, with BGP. I have links to two ISPs
and I advertise the whole address space as one prefix, through
either ISP-1 or ISP-2.
It is fine for ISP-1 and/or ISP-2 to pass my outgoing packets
through an ITR (or whatever it is called) arrangement - to find
packets which are addressed to prefixes which are mentioned in
APT's mapping system, and to encapsulate them and tunnel them
to whatever ETR that ISP's Default Mapper decides should get
the packets. That can be done for outgoing packets from any
edge network, PI or PA, without the edge network needing to do
anything, and without it upsetting anything.
Actually, if there are multiple APT islands, then this is only
going to work for packets addressed to edge networks which use
ETRs in ISPs in the island of which ISP-1 and ISP-2 are a part.
Maybe they are in different islands. Edge networks in islands
other than the one the outgoing ISP is in must have their own
arrangements in their own islands to attract packets which make
it past my ISP-1 or ISP-2 ITRs into the DFZ.
But let's consider the options for one or both ISPs using APT
for my address space. There are two cases:
1 - Either ISP-1 or ISP-2 does it, but not the other.
2 - Both do it together. Within this scenario, there are two
options:
a - They are part of the same APT island.
b - They are in different APT islands.
In case 1, let's say ISP-1 is a member of Island-1 and adds my
prefix to the APT mapping system, so it is an APT EID in Island-1.
As long as I want to use ISP-1 for incoming traffic, this will
work. I would have my CE router accept packets from an ETR in
ISP-1's network. However, this arrangement cannot improve
scalability and support my multihoming arrangements at the same
time.
Let's say I want to use ISP-2 instead, including for reasons such
as my link to ISP-1 is down. Because my prefix is an APT EID in
Island-1, the border routers of ISP-1 will be advertising my
prefix to the DFZ. My understanding of APT's support for packets
from non-upgraded networks is that the border routers of all ISPs
in Island-1 would also be advertising my prefix, acting as ITRs
to collect packets emitted from non-upgraded networks, as close
as possible to where they are emitted, and tunneling the packets
to the correct ETR.
If I wanted to use ISP-2 instead, ISP-1 and any other border
routers of ISPs in Island-1 must stop advertising my prefix, and
ISP-2's border router(s) must start advertising it.
This could be done, I suppose, but I don't think APT is intended
to have such real-time change of the prefixes advertised by
border routers of ISPs in the APT island.
There is no scalability benefit in this arrangement - since my
prefix is still advertised in the DFZ, and the advertisement
still changes when, for multihoming reasons, I have the packets
come in via ISP-2.
If there was a scalability gain in that the address range of my
prefix was advertised as part of a shorter prefix by the
Island-1 border routers - assuming that there were other
networks on adjoining address ranges which were also using APT
in Island-1 - then the above arrangement couldn't work, since I
assume it would be unworkable for these border routers to stop
advertising the encompassing shorter prefix, and then advertise
the space in that prefix minus my space - which could involve
multiple prefixes, every time my multihoming arrangement needed
to use ISP-2.
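To illustrate why "the space in that prefix minus my space" could involve multiple prefixes, here is a small sketch using Python's standard ipaddress module. The /20 aggregate and the /24 edge prefix are hypothetical values chosen for the example, not taken from any APT document.

```python
import ipaddress

covering = ipaddress.ip_network("10.16.0.0/20")  # hypothetical aggregate advertised by Island-1
mine = ipaddress.ip_network("10.16.5.0/24")      # hypothetical edge network prefix inside it

# Prefixes that cover the aggregate minus my space.
remaining = sorted(covering.address_exclude(mine))
for prefix in remaining:
    print(prefix)

print(len(remaining), "prefixes replace the single aggregate")
```

With these values, withdrawing one /24 from a /20 leaves four prefixes to advertise in place of one, and all four would have to be replaced by the aggregate again when the multihoming arrangement switches back.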
In case 2a, ISP-1 and ISP-2 are both part of Island-1. For the
purposes of collecting packets from non-upgraded networks, the
border routers of both ISPs and all other ISPs in Island-1, will
advertise my prefix to BGP, and these routers will act as ITRs
to encapsulate packets and tunnel them to the correct ETR.
Perhaps my prefix is advertised as it is, in which case there is
no scaling advantage in terms of the number of prefixes in the
DFZ. However, I could split my space up into multiple EIDs and
this would provide a scaling advantage, compared to me splitting
the space and advertising them conventionally as separate prefixes.
Perhaps my prefix is part of the address range of a larger
prefix advertised by the border routers of all the ISPs in
Island-1. Then there are scaling advantages, in terms of reduced
number of advertisements. This would mean that other edge
networks on adjacent space above and/or below my space also use
APT with Island-1.
As long as my network multihomes with ISPs in Island-1, there are
scaling advantages in terms of my multihoming changes not causing
any changes to prefixes advertised in the DFZ.
In this case 2a situation, I guess the system works fine. My
decision to advertise my prefix to one ISP or another, or the
physical fact of each ISP's ETR finding it has or does not have
connectivity to my CE router(s) would cause those ETRs to be able to
send messages to any ITR (or Default Mapper?) which sent packets to
it about it not being able to reach my network, the destination
network. Therefore, the ITRs (with their Default Mappers?) would
figure out, independently, which of these two ETRs of ISP-1 and ISP-2
they should send packets to.
Case 2b won't work - ISP-1 being part of Island-1 and ISP-2 being part
of Island-2. The only way of forcing it to work would be something
ugly and impractical such as case 1, in which all of one island's
ISPs' border routers must stop advertising my prefix and the border
routers of the other island's ISPs must start advertising it,
according to which ISP's ETR my network was using at the time.
So this means a multihomed edge network needs all its upstream ISPs to
be in the one APT island.
It also means that a single ISP can't introduce APT for a multihomed
customer, unless it does so in concert with the one or more other
upstream ISPs which must also be in the same APT island.
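The case analysis above (1, 2a, 2b) can be sketched as a small Python function. This is only a summary of my reading of the cases in this message, not anything from the APT drafts; the function names and the dictionary layout are hypothetical.

```python
from typing import Dict, Optional


def classify(upstream_islands: Dict[str, Optional[str]]) -> str:
    """Classify a multihomed edge network into the cases discussed above.

    upstream_islands maps each upstream ISP to the APT island in which it
    handles the prefix, or None if that ISP does not use APT for the prefix.
    """
    islands = [i for i in upstream_islands.values() if i is not None]
    if not islands:
        return "none"   # no upstream ISP uses APT for this prefix
    if len(islands) < len(upstream_islands):
        return "1"      # only some upstream ISPs use APT for the prefix
    if len(set(islands)) == 1:
        return "2a"     # all upstream ISPs use APT, in the same island
    return "2b"         # all upstream ISPs use APT, in different islands


def multihoming_without_dfz_changes(case: str) -> bool:
    """Only case 2a lets the edge network switch ISPs without changing
    what is advertised in the DFZ, as argued above."""
    return case == "2a"


print(classify({"ISP-1": "Island-1", "ISP-2": None}))        # 1
print(classify({"ISP-1": "Island-1", "ISP-2": "Island-1"}))  # 2a
print(classify({"ISP-1": "Island-1", "ISP-2": "Island-2"}))  # 2b
```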
As far as I can see, ISP-1 or ISP-2 can't act alone - without the
involvement of my network - to handle my prefix with APT. They would
need to check with me and verify that every upstream ISP I use was
both using APT and was in the same APT island. Then they would need
to reconfigure the APT island's mapping system to include my prefix
as an EID, and make sure that it was advertised by all Island-1's
ISPs' border routers, either on its own or as part of a shorter,
encompassing prefix.
So I still can't think how transit networks could deploy any of the
current separation schemes without involving the end-user networks.
>>> If your *provider* is encapsulating your packets (as in APT), of course
>>> you can't address anything in the *upgraded* transit core. But
>>> non-upgraded transit networks are effectively in edge space, so they can
>>> still be addressed.
>>
>> I would have thought it more correct to say:
>>
>> non-upgraded transit networks are effectively in *core* space, so
>> they can still be addressed. The Default Mapper has no mapping
>> for any address in a non-upgraded edge network, and to ensure
>> it can be reached from an upgraded edge network, it must forward
>> the packets without encapsulation.
>>
>> (Of course, for hosts in non-upgraded edge networks to be able
>> to send packets to hosts in upgraded edge networks, one or
>> more border routers in the APT island need to advertise either
>> the upgraded edge network's EID prefix, or some other prefix
>> which covers this EID prefix.)
>>
>> Only once all edge networks adopt APT will the core be truly
>> "separated" from all edge networks.
>
> Continuing from where I left off above, all packets that get sent to the
> ISP from then on are either already encapped (by another ISP in the same
> island), or get encapped by that ISP. Same goes for decap. No non-border
> router inside that APT island is directly addressable by devices outside
> the island.
I am finding this hard to follow. Are you referring to the initial
situation where you need to be able to send packets to edge networks
which have not yet adopted APT, or after APT is adopted by all edge
networks?
>> handling of changes is intended to be much faster and less expensive
>> - while also making it easy to charge a small fee per update, a few
>> cents for instance.
>>
>>> APT only distributes topology information in the mapping system.
>>
>> For each micronet, Ivip's mapping system distributes the address of
>> the ETR to which packets addressed to that micronet should be sent
>> by every ITR. It is the responsibility of the end user to change
>> the mapping to some other ETR which works if the current ETR is not
>> working, or is not connected to their network. This requires Ivip's
>> mapping to be sent out fast - effectively in real-time - and it
>> simplifies the ITRs and ETRs and reduces the size of the mapping
>> information, since ITRs and ETRs are not involved in testing
>> reachability.
>>
>> APT, like LISP or TRRP, has mapping information with two or more ETR
>> addresses (assuming a multihomed end-user network).
>>
>> This mapping information is assumed not to need to be changed very
>> much, since APT's push (to the Default Mappers, from which ITRs pull
>> the mapping information they need) is slow by comparison to Ivip.
>>
>> Consequently, each APT (or LISP or TRRP) ITR (or the ITR's Default
>> Mapper) needs to do its own probing of ETRs or use whatever
>> techniques to determine ETR reachability - then the ITRs (actually,
>> I think it is APT's DMs) need to make their decisions, based on the
>> mapping information, which ETR to tunnel the packets to.
>>
>>
>>> So the
>>> increased number of edge prefixes that an APT-based network can handle
>>> is relative to the (increasing) amount of BGP traffic that is due to
>>> reachability changes in edge prefixes.
>>
>> Are you saying that a major efficiency - that is, scaling -
>> advantage of APT is that it doesn't convey reachability information?
>>
>> Assuming this is the case, then I see it is true to this extent:
>> Your mapping doesn't need to be pushed as fast or as frequently as
>> Ivip's, or as fast or frequently as BGP would ideally propagate its
>> changes.
>
> Yes, exactly!
OK.
>> No-one else has drawn parallels between Ivip and BGP - so this is
>> interesting. Clearly they are completely different, and Ivip
>> assumes a core routing system - BGP. However, they are alike in
>> that both are in a hurry to communicate "reachability" information
>> across the Net.
>>
>> Every time a BGP prefix is advertised, or no-longer advertised at a
>> given router, changes to this effect need to be propagated. The
>> changes may not need to go very far, or they may involve changing
>> the best path decision of every router in the DFZ.
>>
>> Every time an Ivip micronet is mapped to some ETR, or not mapped to
>> it (typically it would simply be mapped to another ETR, but it could
>> be mapped to NULL, or its space could be reassigned to one or more
>> different micronets), the Ivip system needs to convey this quickly
>> to every full database query server. (And to any full database ITR,
>> but my current thinking is that there will be few or none of these -
>> just caching ITRs, some of which have a full database query server
>> integrated into them, or have one in the same rack.)
>>
>> APT is not in a hurry to push mapping information to all the
>> island's Default Mappers.
>>
>> Being not in a hurry is arguably a scaling benefit, as is not having
>> to push the mapping very often. However, one extra cost (compared
>> to Ivip) is that APT needs to push more complex, more voluminous,
>> mapping information. Another major extra cost is that your ITRs/DMs and
>> ETRs need to be much more complex than Ivip's because they must do
>> all the reachability testing and multihoming service restoration
>> decision making.
>
> This is all true. But we argue this is necessary complexity. The
> Internet is a complex system. I think it was Einstein that said: as
> simple as possible, but no simpler.
OK - I am pursuing a system with a more ambitious mapping
system, to save complexity and protocol overhead in ITRs and ETRs.
You are aiming to simplify the mapping system in terms of the
frequency of mapping updates, but one price you pay is more
complex mapping information (addresses for the various ETRs,
with preferences for load sharing and for which one to choose
in a multihoming service restoration setting).
You are also paying a price in complexity of ITRs and ETRs,
and in inflexibility regarding reachability detection and
decisions resulting from those reachability findings. Ivip
puts that stuff outside of the system and both enables and
requires end-users to do their own reachability testing and
consequent decision making.
>>>> The Ivip approach is to provide firstly a more efficient mapping
>>>> system relative to BGP - as does APT - but also to technically
>>>> structure the mapping system so end-users can be made to pay for
>>>> each mapping change. This enables quite a lot of the mapping system
>>>> (not all, but most of it) to be run as a series of businesses. More
>>>> details are in:
>>>>
>>>> http://tools.ietf.org/html/draft-whittle-ivip4-etr-addr-forw-01
>>>>
>>>>
>>>> Then, there is funding for the mapping system, so there are incentives
>>>> to build it, extend it, run it, make it more efficient etc. This
>>>> should greatly extend the maximum number of micronets, updates etc.
>>>> the whole system can handle, compared to what I understand is the
>>>> APT approach of a more efficient mapping system with costs falling
>>>> unfairly on the ISPs, with no backpressure on end-users to reduce
>>>> the number of mapping changes or the number of EIDs in the mapping
>>>> system.
>>>
>>> The costs are not falling *unfairly* on the ISPs -- they are the ones
>>> that stand to benefit. Again, the primary goal of APT is better routing
>>> scalability, which is a DFZ problem.
>>
>> But what if an end-user network issued a mapping update every 10
>> seconds, 24 hours a day?
>
> Edge networks don't send mapping updates directly under APT. ISPs send
> mapping updates that contain their customers' prefixes.
Ahh - OK.
> We also plan to
> have the protocol limit the frequency of updates to something on the
> order of (very roughly) once per hour. Because we don't carry
> reachability information, there is no need for frequent updates. We're
> currently working on some simulations to see exactly what time scale is
> workable.
Still, the edge network which pushes its luck changing its
preferences every hour, for instance to dynamically adjust its
incoming traffic according to circumstances which vary significantly
hour-by-hour, is placing a considerably greater burden on the whole
APT island than another network which only changes its preferences
every few months. You have no economic arrangement to make the
hourly changing edge network pay more money which is somehow
distributed to the ISPs in the APT island.
When you get to a single APT island - which I think is the only way
APT can work properly - then you have these hourly changers requiring
their mapping change to be pushed to every ISP on the planet. You
might bet that with 100 million edge networks, you are only going to
have a small number who do this. But if it costs them nothing, and
it gives them some means of dynamically responding to incoming traffic
and/or congestion in their various upstream ISPs, then what is to
stop 10 million edge networks changing their mapping every hour?
You are back to the tragedy of the commons, because you can't stop
this and you can't charge them for it.
10 million updates per hour would be unscalable - since you have to
push this to every ISP in the world. (I am assuming 100% APT adoption,
all in one island.)
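To put rough numbers on this, a short Python sketch of the arithmetic. The function names are hypothetical and the 30-day month is an assumption; the other figures come from this message and from the earlier question about a network issuing an update every 10 seconds.

```python
SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 86_400


def updates_per_second(updates_per_hour: float) -> float:
    """Sustained rate every Default Mapper would have to absorb."""
    return updates_per_hour / SECONDS_PER_HOUR


def equivalent_slow_networks(fast_interval_s: float, slow_interval_days: float) -> float:
    """How many networks updating once every slow_interval_days generate the
    same number of updates as one network updating every fast_interval_s."""
    return (slow_interval_days * SECONDS_PER_DAY) / fast_interval_s


# 10 million edge networks each changing their mapping once an hour.
print(round(updates_per_second(10_000_000)))    # 2778 updates per second

# One network updating every 10 seconds vs. networks updating every 30 days.
print(round(equivalent_slow_networks(10, 30)))  # 259200 networks
```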
Ivip has no such problem, or at least has a greatly reduced problem,
because the fee per mapping change will reflect the burden that
change places on most of the mapping distribution system - and may
help pay for the mapping distribution system to be upgraded to handle
whatever volume of changes people want to make at the current cost
per update.
>> That would be as much of a burden on the ISPs in the APT island as
>> hundreds of thousands of ordinary end-user networks which only
>> changed their mapping every month or so. (In terms of mapping
>> traffic and processing it at each DM. In terms of storage, the
>> burden is the same.)
>>
>> In this respect, APT achieves no benefits over BGP. Both APT and
>> BGP have to accept the arguably excessively frequent changes and
>> propagate them - and in both cases there is no way of charging the
>> originator of these changes to help deter them from making so many,
>> or to help pay for their cost across all the ISPs.
>>
>> Ivip is completely different in this regard - there will be a small
>> fee per update.
>
> Yes, our solution is technical (frequent updates are not necessary for
> desired functionality), yours is economic (frequent updates are
> necessary, but can be costly). The way I see it, we are technicians, not
> policy makers, so I don't see how an economic solution is enforceable.
My solution is technical and economic. Yours is purely technical.
Sure we are policy makers - or at least policy prototypers! We are
designing a new architecture to be added to the Internet. That involves
technical, commercial and policy arrangements. We design the whole lot together.
We don't actually make policy, of course. We design a complete
integrated system of protocols, functionalities, design principles and
suggestions for how the thing should be administered and operated in a
business sense. Then, if the IETF likes it, it is developed. Then if
business-people and policy-makers like it, they adopt it.
>> So I think APT continues this problem of having to propagate
>> end-user initiated changes across many devices, without a fee. This
>> is what bedevils BGP - and is a significant part of the heart of
>> the routing scaling problem.
>>
>>> I suspect you will have a lot to say on this, so perhaps it should be
>>> moved to a separate thread, but I am curious: in regards to your
>>> economic model, what is stopping an ISP from charging their BGP-speaking
>>> customer for each BGP update today?
>>
>> Other folks on the list would have a better idea of this, but I
>> think the current BGP system relies on trust and unsecured
>> announcements. To put up some kind of fence, administered in some
>> way to reject updates from networks which don't in fact pay a fee
>> per update, would require a major upgrade to all participating
>> routers. It would require some pretty fancy security arrangement
>> and my guess is that it is all too much of a headache. It would
>> also require some kind of organisation, accounting system, and
>> presumably some way of distributing monies to help ISPs pay for
>> bigger DFZ routers.
>
> I was trying to say that you could charge edge networks for sending BGP
> updates to their providers, not charge in the core. As we've discussed,
> this is where most of the updates originate. These BGP connections are
> manually configured, and, AFAIK, generally need to match up with the
> physical connection that the customer's data flows across. So it seems
> to me that's pretty hard to fake.
>
>> I figure it could be done. If this was all that was required to fix
>> the routing scaling problem, I think people would be working on it.
>>
>> However, it doesn't alter the fact that the BGP system can't
>> reasonably be expected to scale to as many prefixes as there are
>> end-user networks which need portability and multihoming.
>>
>> Fees for each BGP prefix and for each update would help deter those
>> who advertise and change their advertisement of prefixes for
>> arguably spurious reasons, so it would marginally reduce the scaling
>> problem. However it wouldn't solve the scaling problem assuming we
>> want to have 10 million, 100 million or whatever end-user networks
>> with portable, multihomable address space.
>
> So, if I am understanding correctly, your claim is the following: Ivip
> solves routing scalability by (a) proactively distributing the same edge
> network topology and reachability information as BGP, except more
> efficiently, and (b) enforcing a charge per update?
Yes, but I never use the word "proactive". Ivip is clearly a very
different system than BGP - it is an overlay system for the
interdomain network which happens to run BGP. The fast push system
should be a lot more efficient than BGP's approach of conversations
between neighbours about individual prefixes potentially resulting
in changed decisions in each router, and therefore changed
announcements by that router.
Ivip is intended to give each end-user network effectively real-time
control of how their space is split into micronets - which are any
contiguous range of addresses, not just binary boundary prefixes as
with BGP and the other core-edge separation schemes. The real-time
control enables the packets to be sent to any ETR in the world, and
for this to be changed in a matter of seconds for a small fee.
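As an illustration of micronets being arbitrary contiguous ranges rather than binary-boundary prefixes, here is a minimal Python lookup sketch. It is not Ivip's actual data structure or protocol; the table, the addresses (taken from documentation ranges) and the function names are all hypothetical.

```python
import bisect
import ipaddress
from typing import List, Optional, Tuple

# (first address, last address, ETR address); all values are hypothetical.
MICRONETS: List[Tuple[str, str, str]] = [
    ("192.0.2.0", "192.0.2.2", "198.51.100.1"),      # 3 addresses, not a prefix
    ("192.0.2.3", "192.0.2.99", "198.51.100.2"),     # 97 addresses
    ("192.0.2.100", "192.0.2.100", "203.0.113.7"),   # a single address
]


def _to_int(addr: str) -> int:
    return int(ipaddress.ip_address(addr))


# Sorted by start address so a binary search finds the candidate micronet.
_TABLE = sorted((_to_int(first), _to_int(last), etr) for first, last, etr in MICRONETS)
_STARTS = [row[0] for row in _TABLE]


def etr_for(destination: str) -> Optional[str]:
    """Return the ETR a packet to this destination would be tunneled to,
    or None if the address is not covered by any micronet."""
    dest = _to_int(destination)
    i = bisect.bisect_right(_STARTS, dest) - 1
    if i >= 0 and dest <= _TABLE[i][1]:
        return _TABLE[i][2]
    return None


print(etr_for("192.0.2.2"))    # 198.51.100.1
print(etr_for("192.0.2.50"))   # 198.51.100.2
print(etr_for("192.0.2.101"))  # None
```

Changing the mapping of a micronet in this sketch would mean replacing the ETR address in one row, which is the single piece of information Ivip's mapping carries per micronet.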
In principle, end-user networks can have real-time control over
their existing PI prefixes, but BGP is slower at propagating the
changes over longer distances, and each change burdens potentially
thousands of routers, in principle perhaps every router in the DFZ,
which is not reasonable or scalable.
This real-time, direct, control over the tunneling of traffic
packets is not the aim of APT or LISP-NERD. Those systems let the
end-user specify ETRs and preferences, and have the ITRs (or Default
Mappers) make the decisions from moment-to-moment. LISP-ALT and
TRRP enable the end user to change their mapping as often as they
like, but unless there are unscalably short caching times on the
mapping replies, this does not translate into real-time control of
the ITRs in the way Ivip is intended to achieve.
>>>>>> Your description:
>>>>>>
>>>>>> LISP-CONS and LISP-ALT build a DNS-like hierarchical overlay to
>>>>>> retrieve mapping data when needed.
>>>>>>
>>>>>> strikes me as wrong. Neither has much resemblance to DNS. ALT is a
>>>>>> completely separate network, with its own BGP instance, using a
>>>>>> different but parallel address space, for sending mapping queries,
>>>>>> which are typically actually traffic packets.
. . .
>>>> In the ALT system, a single query gets to the appropriate server -
>>>> there is no recursion.
>>>
>>> I think you mean, there is *only* recursion, in the sense of recursive
>>> DNS queries.
>>
>> I mean "no recursion".
. . .
>> LISP-ALT has no concept of recursion, as far as I know.
>
> Ok, I see what you mean.
OK!
>>> No, we don't rely on ICMP in APT. We have our own control messages that
>>> are generated and processed at DRs and DMs. I believe the details (at
>>> least of previous versions of our failure handling) are described in
>>> most (if not all) of the APT-related documents.
>> OK - there is a message from the ETR to the DM of the sending
>> network that the ETR can't reach the destination network:
>>
>> http://tools.ietf.org/html/draft-jen-apt-01#section-11.4
>>
>> If the ETR's unreachability is reflected in its BGP prefix no longer
>> being reachable, this is handled in section 6.1.1.
>>
>> But what if there is some network failure between the ITR and the
>> ETR? Section 6.1.2 handles this, on the assumption that the ITR can
>> send packets to the network in which the ETR resides, and that the
>> default mapper there, working with the internal routing system, will
>> be able to detect that the ITR sent a packet to the no-longer
>> reachable ETR.
>>
>> OK - I see how you do this without relying on ICMP messages. You do
>> however rely on:
>>
>> 1 - BGP to tell the sending network's border router, and therefore
>> the ITR, that the ETR's network is unreachable.
>
> As is true in the Internet today.
Yes.
>> 2 - The DM in the ETRs network to tell the ITR (or the ITR's DM?)
>> that the ETR is unreachable.
>>
>> 3 - The reachable ETR to tell the ITR (or the ITR's DM?) that the
>> destination network is unreachable.
>>
>>
>> I think 3 should be OK - and probably 2.
>>
>> However, there can be long delays across the Net with BGP
>> propagating a notion of unreachability. The destination network
>> could go off the air and nearby routers cancel their advertisements,
>> but other routers think their neighbour has a path, and advertise
>> that path's length. This does not necessarily get propagated
>> quickly, due to delays in each router, including if there is a
>> flurry of such announcements when a major link dies. Also, there is
>> MRAI path hunting, depending on the structure of the routers.
>>
>> http://www.firstpr.com.au/ip/sram-ip-forwarding/#BGP_hunting_MRAI_disc
>>
>> which can delay the propagation of an unreachable condition by ~30
>> seconds for however many depths of this process there are.
>
> All of this is true, but APT isn't meant to fix or avoid BGP's problems,
> just to limit how much of the network BGP routers have to deal with.
The same is true of Ivip.
In both Ivip and APT, we are relying on the BGP system to maintain
connectivity between ISPs. No-one has an incrementally deployable
alternative to BGP for this task, and my impression is that it does
the job very well. (The MRAI timer path hunting problem could
be fixed, in my opinion - but BGP is a very deep thing and I don't
know much about it.)
Ivip has a potential advantage over APT regarding rapid response to
reachability problems. This example is perhaps a little contrived -
but it illustrates the point.
Let's say end-user network N1 is multihomed to ISP-1 and ISP-2. It
is currently getting its packets via ISP-1. In APT, that ISP-1's
ETR's address is the address in the mapping which has top priority.
In Ivip, this means that the last mapping update N1 sent for its
micronet was to map it to ISP-1's ETR.
Now let's say ISP-1's border router dies, or some link dies or
whatever. Let's also assume that there is an ugly arrangement of
routers near ISP-1 such that some ISP far away - ISP-9 - doesn't
find out via BGP that ISP-1 is unreachable until 2 minutes after
the problem occurs. This could be 4 levels of 30 second MRAI timer
path hunting, or some other delays in BGP. In particular it could
be many levels of BGP router coping with a flood of changes for
hundreds of thousands of prefixes affected by the same outage
which affects ISP-1.
Hosts in any edge network which relies on ISP-9's ITRs and Default
Mappers are going to be sending their packets to a black hole for
these two minutes, because ISP-1's ETR is unreachable, and because
ISP-9's DM and ITRs haven't yet figured this out, due to the BGP
delays.
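The two-minute figure above is simple arithmetic: four levels of path hunting, each held back by a 30 second MRAI timer. A small Python sketch, where the function names and the 100 packets per second traffic rate are assumptions for illustration only:

```python
MRAI_SECONDS = 30.0  # MRAI timer value used in the example above


def black_hole_seconds(path_hunting_levels: int, mrai: float = MRAI_SECONDS) -> float:
    """Time before a distant ISP learns of the outage via BGP, assuming
    each level of path hunting costs one full MRAI interval."""
    return path_hunting_levels * mrai


def packets_lost(rate_pps: float, path_hunting_levels: int) -> float:
    """Packets tunneled to the unreachable ETR during that time."""
    return rate_pps * black_hole_seconds(path_hunting_levels)


print(black_hole_seconds(4))  # 120.0 seconds, the two minutes in the example
print(packets_lost(100, 4))   # 12000.0 packets at a hypothetical 100 packets/s
```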
You might at this point decide to change APT to rely on ICMP
messages, but you would need a way of securing those, to prevent
spoofers. That would involve either a nonce and therefore extra
processing and packet length overhead in every traffic packet,
and/or extra processing and overhead due to probe packets at some
rate which would detect the loss of reachability in less than 2
minutes. You couldn't very well pepper the ETR with probes just
because you got an unsecured ICMP destination unreachable packet,
since that opens up a DDoS pathway.
There would be major scaling difficulties with ITRs frequently
testing reachability to ETRs, since the one ETR might be getting
such probes from tens of thousands of ITRs. Whatever you do to
determine reachability, it needs to be hard coded into the ITR and
DM functions and also into the ETR functions. You don't have a way
of letting N1's administrators directly control where ITRs tunnel
their packets, depending on *their* ideas of reachability and
whatever it is they want to do regarding packets coming in via the
ETRs of ISP-1 and ISP-2.
In the Ivip setting, I assume that N1's administrator hires some
specialised company MRD (Multihoming Reachability and Decision-making
Inc.) to continually monitor the reachability of its network via the
two ISPs' ETRs. MRD has sites all over the world for this purpose,
and sends a stream of nonce-protected probe packets from all these
sites, to some node in N1 (probably multiple nodes, for redundancy)
which acknowledge the probe with a packet containing the nonce.
N1's administrators are free to choose the frequency of these probe
packets, and so trade off the traffic and load they carry - which
could be small - against how quickly MRD's system could decide that
reachability had been lost. MRD provides a sophisticated language,
or series of options, by which N1's administrators specify what
criteria are used for deciding such things as:
ETR-1 is unreachable, so change the mapping to ETR-2 if it is
reachable.
After an outage, ETR-1 is reachable for a sufficiently long
period of time that it is best to change the mapping back to
ETR-1 again.
N1's space could be split into multiple micronets, for the purpose of
load sharing over the two ETRs. MRD's system is in charge of the
mapping of these micronets, and it could also communicate with nodes
in N1 which report on traffic loads, congestion etc. so that by
specified decision criteria, MRD would dynamically adjust the mapping
of multiple micronets to load share the traffic however N1's
administrators choose. This could be an automated process,
or N1's administrators could take manual charge for a while. MRD has
the username and password it needs to change N1's mapping, via the
RUAS (or some other related company) which handles the mapping for all
micronets in the Mapped Address Block which N1's Scalable PI space is
part of. N1 pays for these updates, so N1's administrators will
optimise the decision logic they specify for MRD's operations, to avoid
too much chopping and changing, but to achieve whatever multihoming
service restoration and TE goals they desire.
In this setting, assuming N1 has got MRD testing reachability every
second or two, MRD will be able to detect ETR-1 becoming unreachable
from some, many or all of its probing sites within a second or two.
Depending on how N1 sets up the logic, for instance to ignore 3
second glitches, but to change the mapping to ETR-2 if there is
failure to acknowledge probes for 4 seconds, MRD could change the
mapping within a few seconds, and the Ivip system propagates that
change to all ITRs which need it within another 2 or 3 seconds.
This doesn't rely on BGP or ICMP messages at all. Probably MRD would
ramp up the number of probes from one a second, to 10 a second, if one
or two seconds' worth were not acknowledged. That would avoid
changing the mapping due to just a few unfortunately lost probe or
acknowledgement packets in succession.
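The kind of decision logic N1's administrators might specify to a company like MRD could be sketched as below. This is a hypothetical illustration in Python, not a defined Ivip or MRD interface. The 4 second failure threshold comes from the example above; the 60 second restore period is an assumption, since "sufficiently long" is not quantified, and the ramp-up in probe rate is left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class FailoverPolicy:
    fail_after_s: float = 4.0      # from the text: switch after 4 s without acks
    restore_after_s: float = 60.0  # assumption: how long ETR-1 must stay reachable
    mapped_to: str = "ETR-1"
    primary_down_s: float = 0.0
    primary_up_s: float = 0.0

    def observe(self, primary_ok: bool, backup_ok: bool, dt: float = 1.0) -> str:
        """Feed one probe interval; return the ETR the micronet should map to."""
        if primary_ok:
            self.primary_up_s += dt
            self.primary_down_s = 0.0
        else:
            self.primary_down_s += dt
            self.primary_up_s = 0.0

        if (self.mapped_to == "ETR-1"
                and self.primary_down_s >= self.fail_after_s
                and backup_ok):
            self.mapped_to = "ETR-2"  # would trigger one paid mapping update
        elif (self.mapped_to == "ETR-2"
                and self.primary_up_s >= self.restore_after_s):
            self.mapped_to = "ETR-1"  # a second paid update, back to ETR-1
        return self.mapped_to


policy = FailoverPolicy()
# A 3 second glitch is ignored ...
for _ in range(3):
    policy.observe(primary_ok=False, backup_ok=True)
print(policy.observe(primary_ok=True, backup_ok=True))       # ETR-1
# ... but 4 seconds without acknowledgements changes the mapping.
for _ in range(4):
    mapping = policy.observe(primary_ok=False, backup_ok=True)
print(mapping)                                                # ETR-2
```

Because each change of mapping costs a fee, the two thresholds are where N1's administrators would trade off restoration speed against the number of paid updates.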
Multiple companies such as MRD would exist, so any network such as
N1 would have the choice between a number of flexible, potentially
highly sophisticated, distributed systems for testing the
reachability of their network via various ETRs and for changing
the mapping accordingly.
Even if this Ivip system wasn't inherently faster than APT's
reliance on BGP, the separation of this function from the
core-edge separation scheme is an important benefit, since it
scales better than having 10,000 ITRs trying to determine
reachability to one ETR, and because networks would often want
more control over mapping than whatever could be provided in
APT's (or LISP's or TRRP's) fixed functionality for ITRs and ETRs.
- Robin