
Re: [RRG] Are we solving the wrong problem?



Elwyn,

I have also been thinking about this type of path hash - a (compressed) record of the router interfaces the packet has traversed.  If the receiver saw the hash value change over the course of a session, it would be a hint that the path changed:

- restart congestion control / RTT measurements?
- off-path attacker spoofing packets?
- persistent oscillation of hash values -> load-balanced links somewhere (watch for packet re-ordering)

It may make sense to have the hash done per router *interface*.  If we're making host changes, something like this is probably worth considering as a part of the information exchange between network and endpoint.  
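To make the receiver-side idea concrete, here is a rough Python sketch of what a receiver might do with such a field.  This is purely illustrative; the field width, the class name (PathHashMonitor) and the thresholds are my own assumptions, not part of any proposal.

from collections import deque

class PathHashMonitor:
    """Track the per-packet path hash and hint at likely path events."""

    def __init__(self, history=32):
        self.recent = deque(maxlen=history)   # last few observed hash values

    def observe(self, path_hash):
        """Return a hint when the observed hash suggests action."""
        seen = set(self.recent)
        self.recent.append(path_hash)

        if not seen:
            return "first-packet"            # nothing to compare against yet
        if len(seen) == 1 and path_hash in seen:
            return "stable"                  # same path signature as before
        if len(seen) >= 2:
            return "oscillating"             # likely load-balanced links; expect re-ordering
        return "path-changed"                # new value: restart RTT/congestion estimates

# mon = PathHashMonitor()
# for pkt in packets:            # hypothetical packet objects carrying the field
#     hint = mon.observe(pkt.path_hash)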

R,
Dow  



On Feb 19, 2008, at 10:38 AM, Elwyn Davies wrote:

Hi, Mark.

I think you have come to a similar conclusion to the one described in the mail that I sent to the RAM list back in December 2006 (attached).

Since then I have been struggling with the issue of how to deal with sub-flows where the traffic is carried over multiple paths through the network.

You are contemplating a shim6/(real) SCTP (i.e. not the 'interim' solution which chooses 1 of n paths) solution using multiple addresses.

I was thinking about an IP layer solution which could identify the sub-flows in a useful but low-cost and non-revealing way, without much involvement from the sending host, and which would not require multiple addresses at the transport level.

My best thought on this so far was for each router through which the traffic passes to take a suitable field from the packet (part of the flow label field in IPv6 would seem a good idea), hash its value with some suitable fixed but randomized value that is characteristic of the router, and write it back into the packet.  When the packet arrives at the destination this field would contain the signature of the path that the packet had taken.  Assuming deterministic hash functions and a fixed initial value related to the exit path from the source domain, which is also passed along to the destination, the 'ack' could return the values, which would enable the source to determine which paths were working best.
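To illustrate the mechanism (not as a concrete proposal), a small Python sketch of the per-router stamping and the resulting path signature might look like the following.  The 8-bit field width, the mixing constant and all names here are assumptions of mine.

import os

FIELD_BITS = 8
FIELD_MASK = (1 << FIELD_BITS) - 1

class Router:
    def __init__(self):
        # fixed but randomized value characteristic of this router
        self.salt = int.from_bytes(os.urandom(2), "big")

    def stamp(self, field):
        """Hash the incoming field with this router's salt and write it back."""
        mixed = (field * 0x9E3B + self.salt) & 0xFFFF   # cheap deterministic mix
        return (mixed ^ (mixed >> FIELD_BITS)) & FIELD_MASK

def path_signature(initial, routers):
    """Field value the destination would see after traversing 'routers'."""
    field = initial & FIELD_MASK
    for r in routers:
        field = r.stamp(field)
    return field

# Two paths that share the first two hops but then diverge will, with high
# probability, deliver different signatures, which an 'ack' could carry back.
path_a = [Router() for _ in range(4)]
path_b = path_a[:2] + [Router(), Router()]
print(path_signature(0x2A, path_a), path_signature(0x2A, path_b))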

My feeling is that this would not require a large number of bits (maybe 8 for each part, giving up to 256 exit paths/routes through the network), and I believe there is, at least for IPv6, an incremental deployment solution.

The solution is still very much a work in progress.

Regards,
Elwyn

Mark Handley wrote:
I've been mulling this around for a long time now, and I think we may
be trying to solve a number of problems at the wrong layer.

The routing problems we're trying to solve seem to stem primarily from
an inability to perform aggregation of routing information.  There are
a number of causes for this, including:

 - historical allocation policies (the swamp).
 - inappropriate information hiding in BGP.
 - multihoming.
 - traffic engineering.

The first is not a serious problem, though it may become one when we run
out of IPv4 addresses.  The second is one that RRG might take a
serious look at.  I've made one attempt at this with HLP, but other
solutions are certainly possible too, even without replacing BGP.  But
the current concerns mostly seem to stem from multihoming and traffic
engineering.

I believe that if we are ever to make routing scale a great deal
better, we should not be attempting to solve these last two problems
primarily within the routing system.  Clearly if you present routing
people with a problem, we'll come up with routing solutions, but if
you take a broader view, we can probably do much much better (Frank
Kelly deserves credit for the original idea I'm going to suggest). I'm
slowly writing a more detailed document discussing what I think the
solution space should be, but I'll try and give the general idea with
the simple example of a site that is dual homed to two ISPs.

Current practice attempts to hide this dual homing behind a single IP
address for each host.  This then requires a long prefix to be
advertised via both ISPs, with appropriate AS prepending to balance
traffic.  If either edge-link goes down, on average half the Internet
gets to process the routing update.  Worse, this doesn't do a great
job of load balancing, so prefix splitting is sometimes performed to
better balance load, resulting in even more stress to the global
routing system.  In short, attempts at improving local robustness
create global stresses, and potentially global fragility, which is the
problem we're all concerned with.

So, what happens if we stop trying to hide the multihoming?  Take a
server at this multi-homed site and give it two IP addresses, one from
each provider's aggregated prefix.  Now we modify TCP to use both
addresses *simultaneously* - this isn't the same as SCTP, which
switches between the two.  The client sets up a connection to one
address, but in the handshake learns about the other address too.  Now
it runs two congestion control loops, one with each of the server's IP
addresses.  Packets are shared between the two addresses by the two
congestion control loops - if one congestion-controlled path goes
twice as fast as the other, twice as many packets go that way.
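As a toy illustration of what "two congestion control loops sharing packets" could mean (this is not real TCP, and every name and number below is invented), a sender might stripe segments across the two addresses in proportion to each subflow's current window:

class Subflow:
    def __init__(self, addr, cwnd):
        self.addr = addr          # one of the server's two addresses
        self.cwnd = cwnd          # current congestion window (proxy for path rate)
        self.sent = 0

def stripe_segments(subflows, n_segments):
    """Send segments in proportion to each subflow's congestion window."""
    credit = {f: 0.0 for f in subflows}
    total = sum(f.cwnd for f in subflows)
    for _ in range(n_segments):
        for f in subflows:
            credit[f] += f.cwnd / total      # the faster path earns credit faster
        best = max(subflows, key=lambda f: credit[f])
        credit[best] -= 1.0
        best.sent += 1

flows = [Subflow("192.0.2.1", cwnd=20), Subflow("198.51.100.1", cwnd=10)]
stripe_segments(flows, 3000)
for f in flows:
    print(f.addr, f.sent)   # roughly a 2:1 split, matching the 2:1 window ratio

If one path becomes congested, its window shrinks and the striping automatically shifts traffic to the other path, which is the self-load-balancing behaviour described next.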

OK, so what is the emergent behaviour?  The traffic self-load balances
across the two links.  If one link becomes congested, the remaining
traffic moves to the other link automatically.  This is quite unlike
conventional congestion control, which merely spreads the traffic out
in time - this actually moves the traffic away from the congested path
towards the uncongested path.  Traffic engineering in this sort of
scenario just falls out for free without needing to involve routing at
all.  And more advanced traffic engineering is possible using local
rate-limiting on one path to move traffic away from that link towards
the other.  Again, this falls out without stressing routing.

Now, there's quite a bit more to it than this (for example, it's great
for mobile devices that want to use multiple radios simultaneously),
but there are also still quite a lot of unanswered questions.  For
example, how much does this solve backbone traffic engineering
problems?  The theory says it might.  I'm working on a document that
discusses these issues in more depth.  But I think the general idea
should be clear - with backwards-compatible changes to the transport
layer and using multiple aggregatable IP addresses for each
multi-homed system, we ought to be able to remove some of the main
drivers of routing stress from the Internet.  That would then leave us
to tackle the real routing issues in the routing protocols.

I hope this makes some sort of sense,

Mark



From: Elwyn Davies <elwynd@dial.pipex.com>
Date: December 21, 2006 9:29:00 AM PST
Subject: Traffic engineering and the network model


A while back, Pekka Nikander wrote:
Noel Chiappa wrote:
When I look at what the problem seems to be ..., it seems to be growth in the size of the routing table (with dynamics of routing table entries coming in second). The two main drivers of growth seem to be i) multi-homing, and ii) general entropy and lack of aggregation. Is this correct?

I've been told that traffic engineering is another major factor.  However, I have to admit not understanding what people mean with it.  (A good reference would help.)

Networks use traffic engineering to maximise the utility (in the economic sense) of their network assets.  To do this they have to limit the accepted portion of the offered traffic and direct the accepted traffic towards its intended destination subject to a number of constraints, especially:
- Capacity and availability of plant (routers, links etc)
- Policies (business, regulatory, political, etc.)
- Contractual (customer SLAs, transit agreements, etc)

It was clear from the IAB Routing and Addressing workshop, and also from other discussions which I have had, that traffic engineering is very important to network operators.  Some of the ways in which it is done exploit the capabilities of the routing system (deliberate deaggregation of prefixes in particular), so it contributes significantly to the growth of the core routing tables.

Traffic engineering only becomes 'interesting' if there are multiple, comparably effective paths between many pairs of points in the network:  traffic engineering on a simple network where there is exactly one path between a pair of (end-)points degenerates into limiting the accepted traffic to the capacity of the bottleneck link - the classical model for which TCP is optimized!

The core network today is increasingly 'meshy' for economic and robustness reasons, and the desirable but inexorable growth of traffic means that link capacity is increasingly well- or over-used.  Accordingly traffic engineering is an essential tool for network owners and operators.

Observation 1: Multihoming is a form of traffic engineering.

On the architecture discussion list there was some limited amount of thought about the model we use for the network in the core.  I would like to take this a bit further and see how the network model fits with the classical and current reality, and how it interacts with multihoming/traffic engineering.

Application View:
=================
Seen from the point of view of an application running on a node, the *internet* layer of the network provides, at its most abstract:
- a path for packets from 'here-to-anywhere' which is not necessarily reliable
- unconstrained capacity for delivery of packets

There are no assumptions about how the network implements the path and there are no assumptions about constraints on the capacity of the path.  In particular:
- there is no assumption about uniqueness of path (in time or topology)
- there is no specific guarantee of ordered delivery

Aside: At this level, the distinction between a 'user' application running on an end-point and the 'routing application' running at intermediate points is that the user application expects symmetric 'anywhere-to-here' connectivity whereas the routing application doesn't.

Interestingly, the transport layer may make additional assumptions about and requirements on the internet layer that are not specifically provided by the classical model of the internet layer:
- the transport layer (e.g., TCP) may assume that the network is of limited capacity
- the transport layer may make relatively strong assumptions about ordered delivery

The constraints applied by the transport layer reflect the classical model of Internet routing where there was a unique best path which had a bottleneck segment that might change relatively slowly with time, but would generally be stable during the lifetime of most application flows.

The development of the modern Internet with multihoming and traffic engineering across multiple equally capable paths appears to be well matched to the basic assumptions (no assumption of unique path, unconstrained capacity) on the internet layer but struggles to meet the constraints applied by the transport layer.

Routing View
============
The routing system in the classical model:
- assumes there is a single best path from here-to-anywhere at any given time
- does not worry about capacity constraints

Adapting the routing system to the newer model and the real world has required a number of hacks such as the TE extensions (constraining the single best path due to capacity limits) and ECMP allowing some utilization of multiple parallel paths. Overall this has not been very satisfactory and it is certainly not an architecturally pure solution.  Moreover the distribution of traffic across multiple paths by ECMP typically relies on hashing the whole of the destination and maybe the source addresses and other info in the packet to ensure that the transport layer constraints on ordering are met.  In many cases this splitting is very fine grained and relies on using more information than is required for routing.  This may limit our ability to do information hiding in the routing system.
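For concreteness, the fine-grained ECMP splitting mentioned above typically amounts to hashing a per-flow key such as the 5-tuple, so that every packet of a given transport flow takes the same next hop and ordering is preserved.  A minimal illustrative sketch in Python (real routers use their own, vendor-specific hash functions):

import hashlib

def ecmp_next_hop(src, dst, sport, dport, proto, next_hops):
    """Pick one of several equal-cost next hops for this flow."""
    key = f"{src}|{dst}|{sport}|{dport}|{proto}".encode()
    digest = hashlib.sha1(key).digest()
    return next_hops[int.from_bytes(digest[:4], "big") % len(next_hops)]

hops = ["linkA", "linkB", "linkC"]
# All packets of one flow map to one link, so no re-ordering...
print(ecmp_next_hop("2001:db8::1", "2001:db8::2", 50000, 443, 6, hops))
# ...but the choice depends on per-flow detail that routing could otherwise hide.
print(ecmp_next_hop("2001:db8::1", "2001:db8::2", 50001, 443, 6, hops))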

The deliberate disaggregation of prefixes is a response to the need to force the routing system to distribute traffic over multiple paths leading to the same destination where the specific tools are inadequate.

So what...
==========
Observation 2: The routing system today is not a very good match for the reality of the network model today.  This is partly caused by a mismatch between the assumptions at the internet layer and the transport layer, and partly by the development of meshy networks.

In today's meshy network, it is not just multihomed end sites that see multiple useful routes from here-to-anywhere.  Many routers in the core network could, if the routing system allowed it, also see multiple useful routes.  I would therefore claim that traffic engineering and multihoming contain a common problem that does not just manifest itself at the edge of the network.

Observation 3: A good solution to the routing (scalability) problem needs to embrace the availability of (and need to use) multiple routes between both edge and interior points in the network.

The id/locator split solution which we have been looking at attempts to hide this situation at one particular locus in the network (somewhere between the end host and the first provider edge), rather than solving it generally.  It would presumably leave the traffic engineering problem in the core using the existing routing based techniques.

In essence, unlike today's network, where the internet layer uses reasonably uniform routing techniques throughout and crafts traffic engineering/multihoming solutions from the same tools everywhere, the network would be partitioned, with the edge and core using different techniques to achieve the required traffic engineering.

Observation 4: Adopting multiple different solutions to the multihoming/traffic engineering problems at different places in the network is likely to lead to interactions between the solutions.

One area of interaction that I can see is the need to extract information from inner headers in core routers to correctly execute ECMP and other classification schemes if a form of encapsulation is used.  This will increase the processing and memory bandwidth burden on core routers, particularly during the transition.

Observation 5: Disguising a problem rather than solving it will likely lead to needing a more complex solution in the future.

There is a significant risk that network growth and increases in core connectivity will lead to the routing table size problem reasserting itself due to the use of complex traffic engineering.  The problem would thus be postponed rather than solved, and the result might be the need to combine solutions rather than using a common one.


Conclusion:
===========
Whilst it is highly likely that the id/loc split is a good idea, we shouldn't assume that it is a panacea for the multiple path problem.  A uniform routing solution which manages the existence of multiple paths may well constrain the growth of the routing tables better than multiple partial solutions and provide a solution to the multihoming aspects of the core and edge problems.

In the longer term we need to look at the assumptions of the network model we are using and determine if we can modify them to make the multiple path problem more tractable.