[RRG] Six/One Router revised 2008-07-12
- To: Routing Research Group <rrg@psg.com>, Christian Vogt <christian.vogt@nomadiclab.com>
- Subject: [RRG] Six/One Router revised 2008-07-12
- From: Robin Whittle <rw@firstpr.com.au>
- Date: Thu, 31 Jul 2008 03:25:39 +1000
- Organization: First Principles
- User-agent: Thunderbird 2.0.0.16 (Windows/20080708)
Hi Christian,
Thanks for your message of 12 July:
Six/One Router Design Clarifications
http://psg.com/lists/rrg/2008/msg01801.html
and the paper with the revised Six/One Router design:
http://users.piuha.net/chvogt/pub/2008/vogt-2008-six-one-router-design.pdf
Below are some notes and questions on the paper. I will write more
about this new design and about your replies to my 6 questions once
I have read your reply to my notes and questions.
- Robin
Short version: I have several queries or perceived problems with
the new version of Six/One Router, including:

   I have concerns regarding scaling with the number
   of Mapping Preference Messages to be sent when a
   multihoming failure occurs in a busy network.

   When a multihoming failure occurs, I am unsure how
   the Six/One routers know which remote networks to
   send Mapping Preference Messages to. Since these
   routers are stateless, how do they know which
   networks have recently been sending packets?

   How does the Six/One router know which remote
   Six/One router in another network to send the
   Mapping Preference Message to?

   How does one Six/One router find the address of
   another?

   I think the DNS AAAA requirements are
   unworkable in the case of a hosting company with
   thousands of customers requiring those customers
   to update the AAAA records in their DNS for the
   web server FQDNs for servers at the hosting
   company, every time the hosting company gets
   a new transit prefix (every time it gets a new
   provider).

   Does the bit 71 to 64 adjustment technique really
   satisfy all the crypto protocols which look at
   the header? Do they all use a simple 16 bit
   checksum?

   This technique seems to be the only way of making
   Six/One Router work with crypto protocols, but
   it seems to be incompatible with prefixes longer
   than /48. Yet you mention the ability to use
   prefixes as long as /128 in Mapping Preference
   Messages.

Page 1
------
Note 1:
The overhead of map-encap in IPv6 can be pretty frightening when
considering 50 VoIP packets a second, each carrying 20 bytes. This
is one of the big motivating factors for trying to find a
translation scheme such as Six/One Router for IPv6 rather than
map-encap.
A standard IPv6 VoIP packet, not counting Ethernet headers (18
bytes), is:
    40  IPv6
     8  UDP
    12  RTP
    20  VoIP data        60/20 = 3:1 header to data ratio
    Data rate        50 x 80 x 8 = 32,000 bps
    Ethernet rate    50 x 98 x 8 = 39,200 bps
With Ivip: IP-in-IP encapsulation:
    40  IPv6 outer
    40  IPv6
     8  UDP
    12  RTP
    20  VoIP data        100/20 = 5:1 header to data ratio
    Data rate        50 x 120 x 8 = 48,000 bps
    Ethernet rate    50 x 138 x 8 = 55,200 bps
With LISP: IPv6, then UDP, then LISP headers:
    40  IPv6 outer
     8  UDP
     8  LISP
    40  IPv6
     8  UDP
    12  RTP
    20  VoIP data        116/20 = 5.8:1 header to data ratio
    Data rate        50 x 136 x 8 = 54,400 bps
    Ethernet rate    50 x 154 x 8 = 61,600 bps
Your note about overhead being 400% is not unreasonable, but it
might be good to give a specific example, such as the LISP one.
This expands an originally 8:1 compressed datastream almost back to
the original 64,000 bps rate.
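The figures in Note 1 can be reproduced with a short calculation.
This is only a sketch: the header sizes are the ones listed above,
while the names (STACKS, overhead and so on) are made up for the
illustration and come from neither the paper nor any draft.

```python
# Header sizes in bytes, as listed in Note 1.
IPV6, UDP, RTP, LISP_HDR, ETHERNET = 40, 8, 12, 8, 18
PAYLOAD = 20              # VoIP data bytes per packet
PACKETS_PER_SECOND = 50

# Headers carried by each packet, outermost first.
STACKS = {
    "native": [IPV6, UDP, RTP],
    "ivip": [IPV6, IPV6, UDP, RTP],
    "lisp": [IPV6, UDP, LISP_HDR, IPV6, UDP, RTP],
}


def overhead(stack):
    """Return (header bytes, IP-level bits/s, Ethernet-level bits/s)."""
    headers = sum(stack)
    ip_bits = PACKETS_PER_SECOND * (headers + PAYLOAD) * 8
    eth_bits = PACKETS_PER_SECOND * (headers + PAYLOAD + ETHERNET) * 8
    return headers, ip_bits, eth_bits


for name, stack in STACKS.items():
    headers, ip_bits, eth_bits = overhead(stack)
    print(f"{name:7} {headers:3} header bytes, "
          f"ratio {headers / PAYLOAD:.1f}:1, "
          f"{ip_bits} bit/s, {eth_bits} bit/s on Ethernet")
```

Adding another encapsulation to STACKS is enough to compare it on
the same basis.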
Page 2
------
Para 1, line 5:
As another disadvantage, proxies may prolong the path of
packets because they are usually off the shortest path.
"Proxies" in this context means LISP Proxy Tunnel Routers or Ivip
OITRDs (Open ITRs in the DFZ.) I wouldn't say they are "usually"
off the shortest path, since this is not necessarily the case. If
they are at major Internet exchange points, they will often be on
the shortest path.
And finally, yet importantly, the proxy concept
lacks convincing deployment incentives since the cost for
deploying and operating the new infrastructure must be borne
by providers that obtain little benefit from it.
This may be the case for the LISP vision, but not for Ivip, where
the OITRDs are to be deployed by the organisations who rent Ivip
mapped address space to end-user networks. A summary of the LISP
PTR debate and a pointer to my OITRD business case message are in a
recent message:
Business incentives for LISP PTRs and Ivip OITRDs
http://psg.com/lists/rrg/2008/msg02021.html
Col 2 para 5
Practically, however, Six/One Router will likely be used
only with transit addresses from IP version 6: The
one-to-one mapping between edge addresses and the transit
addresses from a given provider consumes a high number of
transit addresses, which will prospectively be unavailable
in IP version 4.
Thanks for clarifying this: Six/One Router is not a contender for
the IPv4 scalable routing solution, unless using IPv6 as its transit
network.
Page 3
------
2.2 Address Rewriting: mapping record
I think it would be helpful if you provided an example mapping
record, showing how the edge prefix is specified, and how the one or
more transit prefixes are specified, with any TE information and
anything concerning the Six/One routers deciding where to send
packets in multihoming failure events. Maybe Six/One Router doesn't
have such things, which are used in LISP, APT and TRRP. If so, it
might be good to state this explicitly.
My best guess is that a mapping record looks like this:
128 bits Edge prefix address.
7 bits Edge prefix length.
8 bits Number of transit prefixes.
128 bits Transit prefix address 0.
128 bits Transit prefix address 1.
etc. etc.
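To make the guess concrete, here it is as a small data structure.
This is purely a sketch of the guessed layout above, not anything
the paper specifies: the class name, the field names and the edge
prefix 2001:db8:aaaa::/48 (a documentation prefix) are all
assumptions. The two transit prefixes are the ones shown in
Figure 3.

```python
from dataclasses import dataclass, field
from ipaddress import IPv6Network


@dataclass
class MappingRecord:
    """Hypothetical mapping record following the guessed layout."""
    edge_prefix: IPv6Network
    transit_prefixes: list = field(default_factory=list)

    def size_bits(self):
        # 128-bit edge address + 7-bit prefix length + 8-bit count,
        # then one 128-bit address per transit prefix.
        return 128 + 7 + 8 + 128 * len(self.transit_prefixes)


record = MappingRecord(
    edge_prefix=IPv6Network("2001:db8:aaaa::/48"),   # invented example
    transit_prefixes=[IPv6Network("1000::/48"), IPv6Network("2000::/48")],
)
print(record.size_bits())
```

Note that nothing here carries TE weights or failure-handling
hints. That absence is exactly what I am asking about.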
All the mapping systems you list are "slow" - in that they do not
attempt to give the end-user real-time control of the behaviour of
the ITRs, or in this case Six/One routers. This means your system
has to figure out for itself how to cope with multihoming failures.
(Ivip is different. The fast push mapping system enables end-user
networks to control the mapping in real-time, so they do their own
failure detection and make their own decisions, completely removing
these things from the map-encap system.)
Page 4
------
Figure 3
The first question which came to my mind with this is whether
Six/One Router could be implemented with a single router, with two
links - one to each provider. Later in the paper it becomes clear
that you rely on this two router arrangement, together with the
internal routing system, to determine which link packets go out on -
and therefore which transit address they are sent from.
If you do rely on two routers, how is this to work if they are both
in the same room, and effectively in the same part of the local network?
How would you respond to end-users who didn't want to buy a router
and locate it somewhere different in the network for every upstream
link they used for multihoming?
2.3.2 Traffic Redirection
This section concerns Traffic Engineering (TE) and Multihoming
Service Restoration (MSR).
Outgoing TE for load balancing, and outgoing MSR (choice of outgoing
link due to failure of another link) is achieved by the internal
routing system somehow adapting to the TE needs and the link failure
conditions to direct packets to one or the other of the Six/One routers.
Incoming TE and MSR requires affecting the behaviour of Six/One
routers in however many provider networks as are currently sending
packets to this edge network.
I found this sentence (middle of left column) impossible to fully parse:
A Mapping Preferences message can be returned to any source
transit address from a packet received from a remote edge
network.
The "froms" were my sticking points.
Maybe:
A Mapping Preferences message can be returned to any source
transit address in response to a special packet received from a
remote edge network.
I am being pernickety, not least because I think that your paper is
generally written with extreme clarity and with excellent expression
sensibilities.
You later explain the three packets of the Mapping Preference
Message exchange. I think this initial explanation would be better
if it was expanded a little.
Mapping Preferences messages list combinations of address
mappings and preference values. Similar to mapping records,
the address mappings in Mapping Preferences messages are
pairs of edge and transit address prefixes. But unlike mapping
records, address mappings can be specified with variable
granularity by scaling the length of their prefixes: Since edge
addresses map one-to-one onto the transit addresses from any
particular provider, the edge and transit address prefixes in an
address mapping have the same length. Scaling this length
facilitates preference feedback at granularities ranging from an
edge network’s entire edge address space – in which case the
edge address prefix is a complete routing prefix – to a single
edge address. Address mappings are allowed to have
overlapping edge address prefixes. To exclude ambiguities,
those with longer edge address prefixes take precedence over
those with shorter edge address prefixes.
Some examples would be really helpful.
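As an illustration of the kind of example I mean, the following
sketch shows one possible reading of the quoted paragraph: entries
with overlapping edge prefixes, the longest edge prefix winning, and
the destination rewritten by swapping the edge prefix for a transit
prefix of the same length. Everything concrete here is an
assumption: the prefixes, the preference values, the rule that a
higher value is preferred, and the function names.

```python
from ipaddress import IPv6Address, IPv6Network

# (edge prefix, transit prefix, preference). All values are invented.
PREFERENCES = [
    (IPv6Network("2001:db8:aaaa::/48"), IPv6Network("1000::/48"), 10),
    (IPv6Network("2001:db8:aaaa::/48"), IPv6Network("2000::/48"), 5),
    (IPv6Network("2001:db8:aaaa:1::/64"), IPv6Network("2000:0:0:1::/64"), 20),
    (IPv6Network("2001:db8:aaaa:1::5/128"), IPv6Network("1000:0:0:1::5/128"), 30),
]


def select_mapping(destination):
    """Longest matching edge prefix first, then highest preference."""
    matches = [p for p in PREFERENCES if destination in p[0]]
    return max(matches, key=lambda p: (p[0].prefixlen, p[2]), default=None)


def rewrite(destination):
    """Swap the edge prefix for the transit prefix, keeping the low bits."""
    entry = select_mapping(destination)
    if entry is None:
        return destination          # not covered by any address mapping
    edge, transit, _ = entry
    low_bits = int(destination) & int(edge.hostmask)
    return IPv6Address(int(transit.network_address) | low_bits)


print(rewrite(IPv6Address("2001:db8:aaaa:2::9")))   # covered by the /48 only
print(rewrite(IPv6Address("2001:db8:aaaa:1::7")))   # the /64 takes precedence
print(rewrite(IPv6Address("2001:db8:aaaa:1::5")))   # the /128 takes precedence
```

The linear scan is only for clarity. A real router would need a
trie or similar structure.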
Do you really want to have Six/One routers fussing over individual
destination edge IPv6 addresses when they decide which transit
address to send the packet to?
128 bits is a lot of bits to chew through with some CPU- and
DRAM-intensive algorithm on a packet-by-packet basis.
My plan with IPv6 Ivip is to limit the granularity of the mapping
system and the ITR functionality to /64. That is bad enough, but
your description above indicates that you want all Six/One routers
to be engineered to potentially match all 128 bits of a destination
address of some packet arriving from its local network, to the
longest matching prefix in a potentially lengthy Mapping Preference
Message.
Presumably a Mapping Preference Message takes precedence over
whatever mapping information a Six/One Router may have received.
I think you need some kind of time-out on these Mapping Preference
Messages. Otherwise, the Six/One router would be required to honour
it forever, no matter what mapping information arrived. Say some
network sent out a spurious Mapping Preference Message. How could
the recipient Six/One router later know this was spurious, or that
some corrective Mapping Preference Message was not received? That
Mapping Preference Message could cause the remote Six/One router to
send packets to some other network, so the Six/One router which
sent the now unwanted Mapping Preference Message wouldn't know there
was a problem.
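A time-out of the kind I am suggesting could be as simple as the
following sketch. It is not part of the Six/One Router design: the
class, its method names and the 300 second default lifetime are all
assumptions made for the illustration.

```python
import time


class PreferenceCache:
    """Holds received preferences and forgets them after `lifetime` seconds."""

    def __init__(self, lifetime=300.0, clock=time.monotonic):
        self.lifetime = lifetime
        self.clock = clock
        self._entries = {}    # edge prefix -> (transit prefix, expiry time)

    def install(self, edge_prefix, transit_prefix):
        """Record a preference taken from a Mapping Preference Message."""
        expiry = self.clock() + self.lifetime
        self._entries[edge_prefix] = (transit_prefix, expiry)

    def lookup(self, edge_prefix):
        """Return the preferred transit prefix, or None once it has expired."""
        entry = self._entries.get(edge_prefix)
        if entry is None:
            return None
        transit_prefix, expiry = entry
        if self.clock() >= expiry:
            # Expired: drop it so the ordinary mapping record applies again.
            del self._entries[edge_prefix]
            return None
        return transit_prefix
```

With something like this, a spurious message does damage only until
its lifetime runs out. The cost is that a sender has to refresh any
preference it wants to stay in force.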
I think this section needs a clear explanation of multihoming
failure detection, decision-making and of how all the sending
networks are told, presumably via Mapping Preference Messages, to
change which transit address they use.
Let's say in Figure 3, the mapping and any currently active Mapping
Preference Messages have the effect of causing all correspondent
networks (all upgraded networks, since non-upgraded networks do not
participate in multihoming) to send all packets to transit address
2000::/48 - via Provider 2.
Here are some fault conditions:
A - The link to Provider 2 fails.
B - The router with the link to Provider 2 fails.
C - Provider 2 itself is cut off from the Net, or
has severe congestion.
Where are these conditions detected?
Where is a decision made about which of potentially multiple other
links and transit addresses should be used?
How is that decision turned into the only possible response: sending
Mapping Preference Message to the Six/One routers in every upgraded
network which is currently sending packets to this network?
In all cases, the messages need to be sent out of a link other than
the one which has failed.
In case A, there needs to be a way the two or more Six/One routers
in the local network can communicate, make a decision and take the
chosen action.
In case B, the surviving one or more Six/One routers need to
recognise the one linking to Provider 2 is dead, and likewise make a
decision and take action.
How would the Six/One routers decide that condition C had occurred?
Assuming a decision was made, how would the routers know which
upgraded networks to send Mapping Preference Messages to? They
can't reasonably be expected to keep a record of recent traffic. If
they were expected to, then how long would they need to keep such
records?
What about really busy sites which are receiving packets from tens
of thousands of upgraded networks? That would require sending
Mapping Preference Messages to every such network.
When you send a Mapping Preference Message from Provider 1, using
1000::/48, you don't address it to a particular upgraded network, but
to the Six/One router which was sending the packets to this edge
network in recent times.
How does the Six/One router know which remote Six/One router to send
the message to?
How does one Six/One router find the address of another?
In the following diagram, there are two upgraded sites, both with
2-way multihoming. They use four separate providers. X and Y are
/48 edge prefixes - allocated permanently to the edge network, and
not in the BGP global routing table. So X and Y are prefixes in
the new, scalable form of space which is Provider Independent, but
perhaps best not referred to as "PI" space, since this already has a
specific meaning.
A, B, C and D are transit prefixes, all /48, which are PA (part of
some shorter prefix allocated PI to each provider) and which
therefore appear in the BGP global routing table.
There are four Six/One Routers: SOR1, SOR2, SOR3 and SOR4
 Edge-1   |  Provider 1     DFZ      Provider 3  |   Edge-2
          |                                      |
      SOR1----------A[---------]C-----------SOR3
      |               \     /                   |
   X[ |                  /                      | ]Y
      |               /     \                   |
      SOR2----------B[---------]D-----------SOR4
          |                                      |
          |  Provider 2              Provider 4  |
Edge-1 is an upgraded network which in this example is the only one
sending packets to Edge-2. There could be tens or hundreds of
thousands of such networks sending packets to one or more hosts in
Edge-2 when Edge-2 has a failure.
In this example, Edge-1 is sending all its packets to Edge-2 from
its A transit address, to the C transit address.
Let's say SOR3 fails, or its link to Provider 3 fails, or Provider 3
fails or becomes very congested.
Somehow, SOR4 has to send a Mapping Preference Message to SOR1.
But how does it know that one or more hosts in Edge-1 have been
sending packets to one or more hosts in Edge-2? The packets didn't
go through SOR4.
Even if SOR3 kept records of traffic - which I think is unworkable -
let's say SOR3 is dead.
I don't see how you can base Multihoming Service Recovery on any
active decisions and messages emanating from Edge-2. It might
be OK for some failure modes, but not for all.
Ivip involves the end-user (whoever operates Edge-2 in this
example) setting up their own multihoming monitoring and decision
making system. They could do this themselves, and they could base
it inside or outside their network, but the most likely arrangement
is for them to hire the services of some company which does this
sort of thing. That company has a 100% robust distributed global
network of servers, and it constantly monitors connectivity to
whatever ETRs Edge-2 relies on, and through them connectivity to
whatever internal routers etc. need to be monitored. This way, the
company can detect any failure, entirely from outside Edge-2's
network, and then change the mapping system for Edge-2's micronet(s)
accordingly.
In the Ivip system, the monitoring company doesn't need to know what
networks have been sending packets to Edge-2. The mapping change
alters the behaviour of all the world's ITRs which are
handling these packets, in a few seconds.
Since multihoming monitoring and decision making is completely
outside the Ivip system, there can be innovation, any number of
techniques used, all sorts of customised arrangements involving
probe packets, secure arrangements etc. to any depth in the networks
being monitored, from multiple vantage points in the outside world.
With the other map-encap systems, the ITRs are expected to do the
failure detection and decision making, based on previously supplied
options in the mapping data. They need to do this individually. It
all needs to be specified as part of the map-encap system, and so
can't be upgraded easily, or customised at all.
Six/One Router is similar to these non-Ivip map-encap systems: you
monolithically build in multihoming failure detection,
decision-making and the actions required to change the path of packets.
The only way any network can change the path of packets is by
sending Mapping Preference Messages to all the particular Six/One
routers which are currently sending packets which need to be redirected.
It is no good sending the Mapping Preference Message to SOR2, since
it is not sending those packets, and since you have no way of SOR2
communicating the contents of such messages to SOR1 or however many
other Six/One routers there are in Edge-1.
Of course, if Edge-2 was getting packets sent to SOR3 from both SOR1
and SOR2, then SOR4 would need to send a separate Mapping Preference
Message to both SOR1 and SOR2.
But again, how can SOR4 know where the packets have been coming from?
It is not good enough to rely on SOR1 getting destination
unreachable messages when the failure occurs. Maybe SOR3 simply
dies in a way that the link stays open, but it can't communicate
with the rest of Edge-2.
The non-Ivip map-encap systems tend to rely on destination
unreachable messages for their ITRs to figure out something is wrong.
Ivip doesn't rely on such things. A properly engineered multihoming
monitoring system will securely probe and get explicit responses
from routers, internal nodes or whatever is desired to show that the
links, routers etc. are working as expected. When these positive
acknowledgements fail to arrive for more than a specified time, the
multihoming monitoring system decides there has been a failure.
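The rule in the last two sentences (no positive acknowledgement for
longer than a specified time means failure) can be sketched as
below. Since the monitoring system is outside Ivip, none of this is
specified anywhere: the class, its method names and the target names
are illustrative assumptions, and the probing itself is left out.

```python
class ProbeMonitor:
    """Declares a target failed when no acknowledgement arrives in time."""

    def __init__(self, timeout):
        self.timeout = timeout      # seconds of silence tolerated
        self.last_ack = {}          # target name -> time of last acknowledgement

    def record_ack(self, target, now):
        """Note that `target` answered a probe at time `now`."""
        self.last_ack[target] = now

    def failed_targets(self, now):
        """Targets whose last acknowledgement is older than the timeout."""
        return sorted(target for target, seen in self.last_ack.items()
                      if now - seen > self.timeout)


monitor = ProbeMonitor(timeout=3.0)
monitor.record_ack("SOR3", now=0.0)
monitor.record_ack("SOR4", now=0.0)
monitor.record_ack("SOR4", now=4.0)     # SOR4 keeps answering, SOR3 does not
print(monitor.failed_targets(now=5.0))
```

The same check covers a router which dies while its link stays up,
since the decision rests on missing acknowledgements rather than on
destination unreachable messages.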
Also, the multihoming monitoring system is in a much better position
than the ITRs in the other map-encap systems to keep probing the
network to detect when the failure has been resolved. Then it can
change the mapping back to what it was. Arbitrarily complex
detection and decision techniques can be used with Ivip, since it is
nothing to do with Ivip itself. The other map-encap systems, and
your own Six/One Router, are monolithic and have to specify every
technique for multihoming monitoring, decision making, recovering to
normal operation. Also, all such functionality needs to be built
into all ITRs and ETRs, or into all Six/One routers and the local
routing systems which largely control their operation.
Ivip doesn't rely on anything being done by the network in question.
A properly engineered multihoming monitoring system will be
entirely independent of that network, and will securely change that
network's micronet's mapping in a way the operators desire.
I have further questions below about the Mapping Preference Message
system.
Page 4 continued
----------------
2.4 Backwards Compatibility
You make the hosts in the edge network reachable from non-upgraded
networks (AKA "legacy" networks - but I dislike this term) via their
one or more transit addresses. However, all such traffic is not
subject to any multihoming service restoration system. That only
works for packets from upgraded networks.
A primary purpose of adopting a map-encap system, or Six/One router
- whatever it is, with its new type of address space which will
solve the routing scaling problem - is to have multihomable,
portable, space, ideally space which can be used for incoming TE as
well.
Yet with Six/One Router the multihoming and TE functions only work
for packets coming from upgraded networks. This means there is very
limited motivation for anyone to adopt Six/One Router space
initially - a situation which is likely to persist indefinitely
unless there are other motivating factors sufficient to cause
widespread adoption.
In contrast, LISP with PTRs and Ivip with its OITRDs provides full
multihoming and incoming TE for traffic from non-upgraded networks -
so the impetus to adopt these is high, right from the start, even
before a single other network has adopted it.
Your backwards compatibility system has to cope with various
scenarios. In one scenario - the correspondent host in the
non-upgraded network initiating the communication, including sending
a single packet - your system relies entirely on the correspondent
host getting the one or more transit addresses for the upgraded
network from the DNS AAAA record, and then finding and using one of
these, after potentially trying one (or more?) edge network
addresses which are not routable in the global BGP system.
Each host in the upgraded edge network has no idea of what its
address would be in the one or more transit prefixes that network is
accessible by. (To do so would involve an impractical involvement
of hosts with the local routing system, how the network organises
its links to providers, which of those links is currently active and
preferred etc.)
So a host in Edge-2 above only knows its address in prefix Y.
In the scenario in which the communication is initiated by the host
in Edge-2, that host can send packets to some host in Net-13, which
hasn't been upgraded to Six/One Router yet. It can tell the host in
Net-13 its address in Y, but that will not enable the host in Net-13
to send packets to it.
In this scenario, the only way a host in Net-13 could send packets
to the host in Edge-2 is by using the source address in the packet
it received from the host in Edge-2. This address would be a
transit address, in C or D.
There are other scenarios:
There is no obvious way some other system (such as a P2P management
system) could tell the correspondent host in Net-13 an address on
which it could send a packet to the host in Edge-2. Edge-2 could
tell that management system its Y address, and perhaps the
management system could be specially crafted to observe the source
address from which packets from the Edge-2 host arrived. But this
is a flaky and irregular way for an external system to figure out
which IP address to tell the correspondent host to use when
sending packets to the Edge-2 host.
Hosts don't know - and shouldn't have to know - whether or not they
are in a Six/One Router edge network. They shouldn't have to know
whether their own address, or the address of other hosts, are "edge"
addresses or "transit" addresses.
Nor is it reasonable to expect any separate management system to
make these distinctions.
Ivip and LISP with PTRs lets the hosts carry on as usual. All hosts
can send packets to each other on their own addresses, no matter
whether one or both hosts are on Ivip/LISP-managed addresses.
Page 5
------
2.4.1 Destination Address Selection
I found dot point 1 hard to understand at first. Perhaps it would
be better to write:
To organise the addressing system so that all edge addresses are
from a clearly identified prefix, such as 1::/1, 11::/2 or
111::/3.
I found the rest of the left column pretty hard to understand.
This was very confusing at first, but it made more sense with later
explanation:
In case of a tie, a candidate destination address is chosen that
has the longest prefix in common with the source address.
I couldn't imagine what the purpose of this was.
The following sentence could be rewritten to change "choose" into
something more informative. I thought it referred to some algorithm
in a host choosing something. In fact it refers to the designers of
the entire system choosing to make a whole section of the IPv6
address space exclusively for edge networks - and not to have edge
network addresses outside this section.
A means to take maximum advantage of the longest-prefix
match for destination address selection in Six/One Router
would be to choose the highest-order bit in addresses so that it
distinguishes edge from transit addresses.
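One reading of how the tie-break interacts with such a top-bit split
is sketched below. It covers only the longest-common-prefix
tie-break, not the other destination address selection rules, and it
assumes a layout in which every edge address has the highest-order
bit set and every transit address does not. The addresses are
invented for the example.

```python
from ipaddress import IPv6Address


def common_prefix_len(a, b):
    """Number of leading bits that two IPv6 addresses share."""
    return 128 - (int(a) ^ int(b)).bit_length()


def choose_destination(source, candidates):
    """Pick the candidate sharing the longest prefix with the source."""
    return max(candidates, key=lambda c: common_prefix_len(source, c))


# Two AAAA records for the same host: one edge, one transit address.
edge_dst = IPv6Address("8000:2::5")         # top bit set: edge (assumed)
transit_dst = IPv6Address("2001:db8::5")    # top bit clear: transit

upgraded_src = IPv6Address("8000:1::1")     # host in an upgraded network
legacy_src = IPv6Address("2001:db8:1::1")   # host in a non-upgraded network

print(choose_destination(upgraded_src, [transit_dst, edge_dst]))
print(choose_destination(legacy_src, [edge_dst, transit_dst]))
```

Under these assumptions a host with an edge source address always
shares at least one leading bit with an edge destination and none
with a transit destination, so the tie-break picks the edge address.
A host with a transit source address ends up with the transit
destination.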
The rest of this column could probably be rewritten to be less
confusing. I won't try to detail my difficulties here, but can talk
about them by phone or write more offlist.
The top (continuing) paragraph on the right column is pretty
confusing to me.
You propose some kind of overall prefix to contain all edge address
space, but admit it is not a reliable approach. I don't fully
understand the previous explanation of how it would work with your
proposed address selection mechanism.
To what extent are you proposing a change to all hosts in the way
they select an address from multiple addresses in an AAAA DNS record?
I foresee major problems with this reliance on DNS to enable hosts
in non-upgraded networks to send packets to hosts in upgraded networks.
Firstly, there are plenty of situations where a host needs to be
told an IP address to use, by some system other than a FQDN and a
DNS lookup.
Even ignoring those instances, let's consider this example which uses
DNS.
Edge-2 is a hosting company. It has a customer xyz.com, who run
their own nameservers. The web server www.xyz.com is on a host in
the Edge-2 network. xyz.com needs to put an AAAA record in their
DNS so hosts all over the world can send packets to their web server.
This is no problem with Ivip, LISP etc. However I see serious and
probably insurmountable problems with Six/One Router.
You need xyz.com to have not just the Y prefix edge address of the
server for www.xyz.com in their AAAA record, but every address on
which that host would appear on each of Edge-2's transit prefixes: C
and D.
Let's say Edge-2 has 10,000 such customers. I know it is traditional
for many hosting companies to do the DNS for their customers'
domains, but this doesn't work for all customers, so I will assume
all 10,000 customers run their own DNS, or have someone else run it.
Whenever Edge-2 changes one or more of its providers, it gains or
loses a transit prefix such as C or D. Each time Edge-2 does this,
it needs to get all its 10,000 customers to change the AAAA record
for their web server in their DNSes!
This is unworkable, and looks to me like a showstopper for Six/One
Router.
I am unlikely to accept arguments about why this is not a realistic
example. For instance, some folks may argue that such a hosting
company is not the sort of edge network which would want, or should
have, Six/One Router managed address space.
We need the new scalable type of address space to be ubiquitously
adopted. It is not good enough to get 50% of end-user networks
using it, with the other 50% (however defined, such as by the total
number of addresses used, the number of prefixes they use etc.) not
using it, since at most that will only cut the scaling problem by a
factor of 2.
In order to provide the millions (some insist billions) of end-user
networks with portable, multihomable address space, we need the new
type of address space to be highly attractive to *all* end-user
networks. This means all networks, except those of providers - all
networks of any organisation except of those organisations who sell
connectivity.
So it's not good enough to say hosting companies won't be using the
new kind of address space.
Nor is it good enough to say that the very large hosting companies,
in which the above problem is most acute, wouldn't need to use the
new kind of space. (Arguably, there would be few enough of these
that we could cope with them using conventionally BGP managed space
in perpetuity.)
We need all hosting companies, large and small, to want to use the
new kind of space. If there is a perception that the new kind of
space is not suitable for the largest hosting companies, then every
start-up company will insist on using conventional space, because
they are sure they are going to grow into a large hosting company
real soon.
It is bad enough Edge-2 having to change all its DNS records every
time it gets a new transit prefix, but I think it is unworkable for
Six/One Router to require such DNS changes in organisations which
are separate from Edge-2, but in some way involved with the hosts in
Edge-2's network.
At the end of 2.4.1, I have a rough idea of what you are suggesting
about using existing NAT traversal approaches to cope with the
problems inherent in the current design for Six/One Router. I think
a fuller explanation, with examples, would be helpful.
Pages 5 and 6
-------------
2.4.2 Source Address Consistency
I found this sentence confusing:
The mode in which a packet is exchanged can consequently be
determined based on the type of remote address as long as the
packet is within the boundary of an edge network: The packet is
exchanged in Bilateral mode if the remote address is an edge
address.
"within the boundary of an edge network" sounded like the physical
location of a packet at some point in its travels. In fact, I think
it means whether its source address can be determined as being
within an edge network.
I had to re-read this whole section carefully. I don't think I
fully understand it, but I surmise:
1 - This section only concerns communications between hosts which
are both in upgraded networks. So the sentence:
With this invariant, every packet exchange between two
hosts must be in either of two modes:
* Bilateral mode — both hosts in the packet exchange use
edge addresses to reach each other.
* Unilateral mode — both hosts in the packet exchange use
a transit address to reach each other.
needs to be understood as not referring to the Unilateral
arrangement used for communicating with a host in a
non-upgraded network (Figure 4), but only to the situation of
both hosts being in upgraded networks: Figure 2 (Bilateral)
and Figure 5 (Unilateral).
BTW, I think these terms are confusing, since Figure 3 is
perfectly symmetrical in terms of both sides doing the same
thing. The Uni/Bi-lateral concept refers to whether incoming
packets from the provider are rewritten or not. Outgoing
packets are always rewritten.
2 - Each Six/One router can't figure out whether or not the source
address of a packet arriving on the provider link is an
edge address or not. (But on the page before, was a plan
to arrange the addressing system so edge addresses always
came from a defined short prefix, and so could be identified
as such by some simple algorithm. I don't understand how
these two concepts relate to each other.)
3 - In order to solve this, there are two approaches.
a - The preferred approach is to use a bit in the IPv6 header
to indicate this. A set state for this
"Bilateral/Unilateral" bit means that this is a
"Bilateral" exchange, so the source address of the
incoming packet should be rewritten (always to an edge
address?) before the packet is sent to the local
destination host.
b - (Note 5) An alternative to the new header bit is to use
an option header, but this raises all sorts of problems
with DFZ routers not handling such packets efficiently,
and with the packet becoming longer, and different.
This would be inefficient, raise PMTUD problems, and
generally nullify Six/One Router's greatest attraction
over map-encap schemes: that the packets do not get any
longer.
I think it would be good to have more discussion of this new bit
being used in the IPv6 header. Without it, Six/One Router is not
going to work at all - since the option header alternative means it
probably can't work with current DFZ routers, and would lose most of
its attractiveness over map-encap.
The most likely place for such a bit is the Flow Label, which
according to Wikipedia and:
http://www.tcpipguide.com/free/t_IPv6DatagramMainHeaderFormat.htm
is not used at present.
Pages 6 and 7
-------------
2.4.3 Multi-homing Support
I had to re-read this section too.
The Unilateral mode referred to here is the Unilateral mode between
hosts in two upgraded networks: Figure 5 - NOT the Unilateral mode
between a host in an upgraded network and a host in a non-upgraded
network (Fig 4.)
I got completely lost here:
Six/One Router achieves this by making the
providers of a multi-homed edge network responsible for
connectivity to disjoint and complementary subsets of the
transit address space, while having all of them provide
connectivity to the complete remote edge address space.
Providers back up each other’s routes to remote transit
addresses.
and the paragraphs which follow.
Even without understanding the foregoing, I have some understanding of:
Finally, to enable fast re-establishment of packet exchanges in
Unilateral mode after a provider failure, Six/One Router must
meet the following two requirements:
* Backup routes for the defunct ones must be available
quickly in the edge-network-internal routing system.
* Backup Six/One routers must be aware of the fail-over so
that they can start accepting incoming packets that used to
be bound to the failed provider.
To achieve the first, providers offer backup service for the
routes to remote transit addresses that other providers are
responsible for.
While I don't understand this, does it mean something to do with
provider 1 advertising in the DFZ a prefix which was until recently
advertised by provider 2? That doesn't sound desirable or
practical, but it is the most sense I could make out of the section
to this point.
Or does it just mean that if the provider 2 link (Figure 6) fails,
provider 1 will accept (and forward to the DFZ) packets from
the Six/One router on the left, which have a source address in the
2000::/48 prefix?
The second last paragraph of this section does explain something
about multihoming monitoring decisions and how those decisions lead
to the desired flow of packets from another Six/One router, link and
transit address. However, this is just for the scenario in which
the link to provider 2 is dead, and the Six/One router on the right
recognises it.
What if that router is dead? Or what if provider 2 is disconnected
from the Net, or is so congested as to make this link incapable of
handling the traffic?
Page 7
------
2.4.4 Avoiding Adverse Effects of Unilateral Mode
on Transport Protocols and Applications
Again, I think this concerns Unilateral mode for hosts both in
upgraded networks, not Unilateral mode with one host in a
non-upgraded network.
Applications are affected if they reference addresses in packet
payloads because unilateral address rewriting in the IP header
of a packet then leads to address inconsistencies between the IP
header and the packet payload. Six/One Router relies on
application functionality for network address translator
traversal [Ro2003, Ro2007] to avoid such address
inconsistencies.
These references are to:
STUN http://tools.ietf.org/html/rfc3489
ICE http://tools.ietf.org/html/draft-ietf-mmusic-ice-19
http://www.rfc-editor.org/queue.html#draft-ietf-mmusic-ice
(Last updated 2007-10-29)
It would be good to discuss how STUN and ICE, which are meant to
work with conventional NAT, would work with Six/One Router.
The following text seems to indicate a showstopper:
Applications that reference addresses in packet
payloads depend on this functionality already today, due to the
existing deployment of network address translators. It is hence
safe to assume that those applications, which use addresses in
packet payloads, also support network address translator
traversal.
This seems to assume the hosts in the upgraded networks are clients.
However, the new address space for the scalable routing solution
absolutely needs to support servers. I don't see how servers can
work if there is any requirement for hosts in the new space to use
STUN or ICE in any way.
Page 7
------
Para 3 and beyond in right column: Header checksums
The system absolutely has to work with existing cryptographic
techniques.
An alternative to re-computing Internet checksums in Six/One
Router is to choose the mapping between edge and transit
addresses such that the checksum does not change during
address rewriting. This technique is applicable to all packets,
even if their payloads are integrity-protected or encrypted. It
is also efficient because the checksum does not have to be
localized within a packet. Mapping edge and transit addresses
such that the checksum does not change during address
rewriting is practical where the routing prefixes of IPv6 edge
and transit addresses are at most 48 bits long: The remaining 16
or more bits in the standard 64-bit subnet prefix of an IPv6
address can then be used to compensate for the checksum
difference that rewriting of the routing prefixes alone would
create.
There is a potentially serious problem here:
Since you must support crypto on all packets, and since this
technique (if it works) is the only one available, it restricts
the granularity of the use of Six/One Router to prefixes of 48 bits
or fewer (/48 or shorter).
That doesn't seem to be a big problem to me, but in other parts
of the paper you discuss granularity down to single IP addresses
(/128) and using finer granularity prefixes (presumably longer
than /48) in Mapping Preference Messages, I think to spread
load over multiple incoming links.
Do all these crypto arrangements involving headers simply treat the
source and destination addresses as checksums modulo 16 bits? I
haven't checked this, but it doesn't sound very secure. It would be
good for you to list the crypto arrangements you have investigated,
and point out why you are sure they would be unaffected by the
technique you propose:
More specifically, the difference (delta) between the
checksum of an edge address routing prefix and the checksum
of a corresponding transit address routing prefix is the value
by which the lower 16 bits in the subnet prefix must be adjusted
during address rewriting to avoid changing the checksum of the
packet. 16 bits are sufficient for this because the checksum,
too, is 16 bits long. And since the routing prefixes are static,
so is (delta).
So does this mean something like the following?
Host A has the edge address 4000::1.
It is in a network using the transit prefix 6000::/48.
So when a packet sent from this host has its source address
rewritten, it would (without the above arrangements to keep the
crypto protocols happy) be rewritten to 6000::1.
Since this bumps up the (assumed) 16 bit checksum by 2000 hex,
the above workaround actually rewrites the address with a new
value in bit positions 71 to 65, to subtract 2000 hex from
the checksum.
(I am using ordinary binary order here.)
So the addresses are (bit positions marked, left to right:
127 ... 72 71 ... 65 64 ... 0):

Edge address           4000 0000 0000 0000 0000 0000 0000 0001

Ordinarily rewritten
address                6000 0000 0000 0000 0000 0000 0000 0001

Rewrite with crypto-
friendly workaround:   6000 0000 0000 E000 0000 0000 0000 0001
This clearly needs to be implemented for all Six/One Router rewrites.
I am not sure how it could work with edge and therefore transit
prefixes longer than /48.
It messes up the conceptually clean idea of simply translating a
linear range of addresses into some other linear range.
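As a rough check on the arithmetic, here is a minimal sketch of
the compensation. It is not taken from the paper. It assumes the
checksum in question is the standard 16-bit one's-complement
Internet checksum (as used by TCP and UDP), that the transit
prefix is exactly a /48, and that the fourth 16-bit word of the
address is free to carry the adjustment. The function names are
my own. Under those assumptions the compensating word for the
example above comes out as DFFF rather than E000, because
one's-complement addition folds the carry back in.

```python
import ipaddress


def fold16(x: int) -> int:
    """Fold carries back in, giving a 16-bit one's-complement sum."""
    while x >> 16:
        x = (x & 0xFFFF) + (x >> 16)
    return x


def words(addr: ipaddress.IPv6Address) -> list:
    """Split an IPv6 address into its eight 16-bit words."""
    b = addr.packed
    return [int.from_bytes(b[i:i + 2], "big") for i in range(0, 16, 2)]


def addr_sum(addr: ipaddress.IPv6Address) -> int:
    """One's-complement sum of the address, its checksum contribution."""
    return fold16(sum(words(addr)))


def rewrite_checksum_neutral(edge_addr: str, transit_prefix: str):
    """Swap in a /48 transit prefix and adjust the fourth word so the
    one's-complement sum of the address is unchanged."""
    edge = ipaddress.IPv6Address(edge_addr)
    prefix = ipaddress.IPv6Network(transit_prefix)
    if prefix.prefixlen != 48:
        raise ValueError("this sketch assumes a /48 transit prefix")
    target = addr_sum(edge)
    new = words(edge)
    new[0:3] = words(prefix.network_address)[0:3]
    new[3] = 0  # the fourth word is overwritten by the adjustment
    others = fold16(sum(new))
    new[3] = fold16(target + (0xFFFF - others))
    return ipaddress.IPv6Address(b"".join(w.to_bytes(2, "big") for w in new))


rewritten = rewrite_checksum_neutral("4000::1", "6000::/48")
```

Here rewritten is 6000:0:0:dfff::1, whose one's-complement sum
matches that of 4000::1. The E000 value above is what ordinary
(two's-complement) subtraction gives.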
This business of always rewriting bits 71 to 65 of the destination
and source addresses in order to adjust the header checksum to keep
crypto protocols happy . . . this algorithm needs to be applied to
all the transit addresses provided in AAAA records.
Unless you can show that all relevant crypto protocols would be
happy with this workaround, I think the header checksum problem is a
showstopper.
Page 8
------
Delays inherent in relying on mapping information
In the first para in 2.5.1:
Six/One Router relies on the trustworthiness of the mapping
system to ensure that remote edge and transit addresses are
rewritten correctly. Six/One routers can rewrite the destination
edge address of a packet that leaves their edge network only
after retrieving the corresponding mapping record from the
mapping system. And they can rewrite the source transit
address in a packet that enters their edge network only after
retrieving the corresponding mapping record from the mapping
system.
Assuming the two hosts are in two upgraded networks, the Six/One
routers are using Bilateral rewriting, and these routers have no
cached mapping information for the relevant prefixes, then this
means that before a packet sent by host A will reach host B, the
following has to occur in sequence:
1 - A's Six/One router needs to request the mapping information
for the edge prefix in which the destination address (B's
edge address) lies. (Request type F below.)
2 - That request needs to be forwarded to a query server of
some kind which can respond authoritatively and in a manner
which the router can authenticate as being secure.
3 - The response needs to be forwarded back to A's Six/One router.
4 - That router rewrites the address and forwards the packet
towards the DFZ.
5 - When it arrives at the border of B's network, B's Six/One
router somehow determines how to request the mapping it needs
to rewrite this packet's destination address to be the desired
outcome: B's edge address - and to rewrite the source address
to the desired outcome, A's edge address.
The first part is easy. B's Six/One router knows the transit
prefix the packet was received within and the edge prefix, so
it can reverse the rewrite done by A's Six/One router,
including reversing the alteration of bits 71 to 65 (crypto
header checksum workaround).
However, to rewrite the source address correctly, it needs
mapping information.
How does B's Six/One router figure out the edge prefix of
A's network? All it has is some source address, which
was rewritten to be in one of the potentially numerous
transit prefixes used by A's network.
So I think the mapping query server has to handle two types of
query:
F - Given an edge address, return the length and base
address of the edge prefix of that network, as
well as the start addresses of the one or more transit
prefixes used by that network.
R - Given a transit address, return the starting address
of that transit prefix, and the edge prefix (starting
address and length) for the network which is using
this transit prefix. (Maybe also return this network's
other transit prefixes?)
For the moment, let's not even think about how the system
handles multihoming outages, Mapping Preference Messages
etc.
Is this correct? I don't recall two types of mapping request
being mentioned in your paper.
So B's Six/One router generates a type R mapping request.
6 - The mapping request reaches the appropriate query server, and
it sends the response.
7 - The response arrives at B's Six/One router. This enables it
to know the edge prefix and transit prefix used by A's
Six/One router. Therefore it can calculate what was added
to A's edge source address to create the transit source
address in the just-arrived packet. This enables it to
reverse that rewrite. The rewritten packet is now
forwarded to host B.
This is two cycles of query and response:
* Query
* Response
Send translated packet
Translated packet arrives
* Query
* Response
Translated packet delivered
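The two query types I am guessing at could be sketched as
follows. This is only my reading, not something described in the
paper: the record layout, the function names query_f and query_r,
and all of the prefixes are hypothetical, and a small in-memory
list stands in for whatever global mapping system (CONS, ALT, DNS
Map) would really answer the queries.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class MappingRecord:
    edge_prefix: ipaddress.IPv6Network
    transit_prefixes: tuple  # one IPv6Network per provider


# Hypothetical mapping data for two upgraded networks, A and B.
RECORDS = [
    MappingRecord(ipaddress.ip_network("4000::/48"),
                  (ipaddress.ip_network("6000::/48"),
                   ipaddress.ip_network("6100::/48"))),
    MappingRecord(ipaddress.ip_network("4800::/48"),
                  (ipaddress.ip_network("6800::/48"),)),
]


def query_f(edge_addr: str):
    """Type F: edge address -> record with edge prefix and transit prefixes."""
    addr = ipaddress.ip_address(edge_addr)
    for rec in RECORDS:
        if addr in rec.edge_prefix:
            return rec
    return None


def query_r(transit_addr: str):
    """Type R: transit address -> (matching transit prefix, record)."""
    addr = ipaddress.ip_address(transit_addr)
    for rec in RECORDS:
        for tp in rec.transit_prefixes:
            if addr in tp:
                return tp, rec
    return None


# A's router, first cycle: where can B's edge address be reached?
to_b = query_f("4800::2")
# B's router, second cycle: which edge prefix is behind this source?
from_a = query_r("6100::1")
```

With no cached mappings, both lookups sit in the path of the
first packet, one at each end.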
If you are using these mapping systems (top right of page 3):
CONS
ALT
DNS Map (I think)
then these all involve a global query server system. This raises
problems with long delays and unreliability in getting a mapping
response.
With Six/One Router, the situation is worse than for the map-encap
systems, since there are two cycles of query and response.
I think these mapping systems would mean that address space relying
on Six/One Router would involve unacceptable delays in delivering
initial packets as discussed here:
http://www.firstpr.com.au/ip/ivip/lisp-links/#long_paths
Host B now wants to send a packet back to host A, and this packet
happens to go through the same Six/One router just mentioned. That
Six/One router retains no state based on the previous packet
rewriting, but it does have cached mapping information of A's edge
prefix.
How does this Six/One router in B's network know which of
potentially multiple transit prefixes used by A's network should be
used for the rewriting of the destination address of the current
outgoing packet?
Assuming the type R response included all of A's transit prefixes,
then this gives the Six/One router a list of such prefixes to choose
between. Maybe, like with the multiple transit addresses in the
AAAA record, the Six/One router simply chooses one. Or does the
mapping information include weights for each transit prefix to
implement incoming load balancing for the remote network? If so,
then this doesn't apply to packets coming in response to finding a
transit address in an AAAA record.
Once the Six/One router has chosen a destination transit prefix to
translate the packet's destination address to, it can do the
rewrites and send the packet on its way.
What of the next packet destined for the same edge prefix which
arrives at this Six/One router? Does it go through the same
procedure, potentially choosing another destination prefix in the
remote network?
There is no state in the router concerning packets handled, so I
guess there is no continuity in Six/One router behaviour from one
packet to the next.
I haven't yet looked at how Six/One Router handles PMTUD and packet
too big messages in the translated portion of the path to the
destination host.