[RRG] Six/One Router revised 2008-07-12

To: Routing Research Group <rrg@psg.com>, Christian Vogt <christian.vogt@nomadiclab.com>
Subject: [RRG] Six/One Router revised 2008-07-12
From: Robin Whittle <rw@firstpr.com.au>
Date: Thu, 31 Jul 2008 03:25:39 +1000
Organization: First Principles
User-agent: Thunderbird 2.0.0.16 (Windows/20080708)
Hi Christian,

Thanks for your message of 12 July:

  Six/One Router Design Clarifications
  http://psg.com/lists/rrg/2008/msg01801.html

and the paper with the revised Six/One Router design:

http://users.piuha.net/chvogt/pub/2008/vogt-2008-six-one-router-design.pdf

Below are some notes and questions on the paper.  I will write more
about this new design and about your replies to my 6 questions once
I have read your reply to my notes and questions.

  - Robin



Short version:   I have several queries or perceived problems with
                 the new version of Six/One Router, including:

                 I have concerns regarding scaling with the number
                 of Mapping Preference Messages to be sent when a
                 multihoming failure occurs in a busy network.

                 When a multihoming failure occurs, I am unsure how
                 the Six/One routers know which remote networks to
                 send Mapping Preference Messages to.  Since these
                 routers are stateless, how do they know which
                 networks have recently been sending packets?

                 How does the Six/One router know which remote
                 Six/One router in another network to send the
                 Mapping Preference Message to?

                 How does one Six/One router find the address of
                 another?

                 I think the DNS AAAA requirements are
                 unworkable in the case of a hosting company with
                 thousands of customers requiring those customers
                 to update the AAAA records in their DNS for the
                 web server FQDNs for servers at the hosting
                 company, every time the hosting company gets
                 a new transit prefix (every time it gets a new
                 provider).

                 Does the bit 71 to 64 adjustment technique really
                 satisfy all the crypto protocols which look at
                 the header?  Do they all use a simple 16 bit
                 checksum?

                 This technique seems to be the only way of making
                 Six/One Router work with crypto protocols, but
                 it seems to be incompatible with prefixes longer
                 than /48.  Yet you mention the ability to use
                 prefixes as long as /128 in Mapping Preference
                 Messages.



Page 1
------


Note 1:

The overhead of map-encap in IPv6 can be pretty frightening when
considering 50 VoIP packets a second, each carrying 20 bytes.  This
is one of the big motivating factors for trying to find a
translation scheme such as Six/One Router for IPv6 rather than
map-encap.

A standard IPv6 VoIP packet, not counting Ethernet headers (18
bytes) is:

    40     IPv6
     8     UDP
    12     RTP
       20  VoIP data    60/20 = 3:1 header to data ratio
                            Data rate 50 x 80 x 8 = 32,000 bps
                        Ethernet rate 50 x 98 x 8 = 39,200 bps


With Ivip: IP-in-IP encapsulation:

    40     IPv6 outer
    40     IPv6
     8     UDP
    12     RTP
       20  VoIP data    100/20 = 5:1 header to data ratio
                            Data rate 50 x 120 x 8 = 48,000 bps
                        Ethernet rate 50 x 138 x 8 = 55,200 bps


With LISP: IPv6, then UDP, then LISP headers:

    40     IPv6 outer
     8     UDP
     8     LISP
    40     IPv6
     8     UDP
    12     RTP
       20  VoIP data    116/20 = 5.8:1 header to data ratio
                            Data rate 50 x 136 x 8 = 54,400 bps
                        Ethernet rate 50 x 154 x 8 = 61,600 bps

Your note about overhead being 400% is not unreasonable, but it
might be good to give a specific example, such as the LISP one.
This expands an originally 8:1 compressed datastream almost back to
the original 64,000 bps rate.
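
The arithmetic in the three tables above can be checked with a short
Python sketch.  The header sizes, the 20 byte payload, the 50 packets
per second and the 18 byte Ethernet figure are taken from the tables;
the names STACKS and rates are only illustrative.

```python
# Sketch: reproduce the header-overhead figures from the tables above.
PAYLOAD = 20      # VoIP data bytes per packet
PPS = 50          # packets per second
ETHERNET = 18     # Ethernet header bytes, as assumed in the text

# Header sizes in bytes, outermost first.
STACKS = {
    "native": [40, 8, 12],              # IPv6, UDP, RTP
    "ivip":   [40, 40, 8, 12],          # outer IPv6, then the native headers
    "lisp":   [40, 8, 8, 40, 8, 12],    # outer IPv6, UDP, LISP, then native
}

def rates(headers):
    """Return (header:data ratio, IP bit rate, Ethernet bit rate)."""
    header_bytes = sum(headers)
    packet_bytes = header_bytes + PAYLOAD
    ip_bps = PPS * packet_bytes * 8
    eth_bps = PPS * (packet_bytes + ETHERNET) * 8
    return header_bytes / PAYLOAD, ip_bps, eth_bps

for name, headers in STACKS.items():
    ratio, ip_bps, eth_bps = rates(headers)
    print(f"{name:7s} {ratio:.1f}:1  {ip_bps:,} bit/s  {eth_bps:,} bit/s")
```

Adding another encapsulation to STACKS is enough to compare it on the
same basis.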



Page 2
------

Para 1, line 5:

    As another disadvantage, proxies may prolong the path of
    packets because they are usually off the shortest path.

"Proxies" in this context means LISP Proxy Tunnel Routers or Ivip
OITRDs (Open ITRs in the DFZ.)  I wouldn't say they are "usually"
off the shortest path, since this is not necessarily the case.  If
they are at major Internet exchange points, they will often be on
the shortest path.

    And finally, yet importantly, the proxy concept
    lacks convincing deployment incentives since the cost for
    deploying and operating the new infrastructure must be borne
    by providers that obtain little benefit from it.

This may be the case for the LISP vision, but not for Ivip, where
the OITRDs are to be deployed by the organisations who rent Ivip
mapped address space to end-user networks.  A summary of the LISP
PTR debate and a pointer to my OITRD business case message are in a
recent message:

  Business incentives for LISP PTRs and Ivip OITRDs
  http://psg.com/lists/rrg/2008/msg02021.html


Col 2 para 5

    Practically, however, Six/One Router will likely be used
    only with transit addresses from IP version 6: The
    one-to-one mapping between edge addresses and the transit
    addresses from a given provider consumes a high number of
    transit addresses, which will prospectively be unavailable
    in IP version 4.

Thanks for clarifying this: Six/One Router is not a contender for
the IPv4 scalable routing solution, unless using IPv6 as its transit
network.


Page 3
------

2.2 Address Rewriting: mapping record

I think it would be helpful if you provided an example mapping
record, showing how the edge prefix is specified, and how the one or
more transit prefixes are specified, with any TE information and
anything concerning the Six/One routers deciding where to send
packets in multihoming failure events.  Maybe Six/One Router doesn't
have such things, which are used in LISP, APT and TRRP.  If so, it
might be good to state this explicitly.

My best guess is that a mapping record looks like this:

     128 bits      Edge prefix address.

       7 bits      Edge prefix length.

       8 bits      Number of transit prefixes.

     128 bits      Transit prefix address 0.
     128 bits      Transit prefix address 1.
     etc.          etc.
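
To make the guess above concrete, here is a minimal Python sketch of
such a record.  It is only a reading of the guessed layout, not the
paper's format.  The class name, the field names and the 2001:db8::
documentation prefixes are all hypothetical, and the equal-length
check assumes that the one-to-one edge-to-transit mapping applies to
mapping records as it does to address mappings.

```python
from dataclasses import dataclass, field
from ipaddress import IPv6Network
from typing import List

@dataclass
class MappingRecord:
    """Guessed layout: one edge prefix plus a list of transit prefixes."""
    edge_prefix: IPv6Network
    transit_prefixes: List[IPv6Network] = field(default_factory=list)

    def validate(self):
        # Assumption: a one-to-one address mapping needs equal prefix lengths.
        for transit in self.transit_prefixes:
            if transit.prefixlen != self.edge_prefix.prefixlen:
                raise ValueError(
                    f"{transit} and {self.edge_prefix} differ in length")
        return self

record = MappingRecord(
    IPv6Network("2001:db8:aaaa::/48"),        # hypothetical edge prefix
    [IPv6Network("2001:db8:1000::/48"),       # hypothetical transit prefix 0
     IPv6Network("2001:db8:2000::/48")],      # hypothetical transit prefix 1
).validate()
```

Nothing here carries TE weights or failover ordering, which is exactly
the information the questions above ask about.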


All the mapping systems you list are "slow" - in that they do not
attempt to give the end-user real-time control of the behaviour of
the ITRs, or in this case Six/One routers.  This means your system
has to figure out for itself how to cope with multihoming failures.
(Ivip is different.  The fast push mapping system enables end-user
networks to control the mapping in real-time, so they do their own
failure detection and make their own decisions, completely removing
these things from the map-encap system.)


Page 4
------

Figure 3

The first question which came to my mind with this is whether
Six/One Router could be implemented with a single router, with two
links - one to each provider.  Later in the paper it becomes clear
that you rely on this two router arrangement, together with the
internal routing system, to determine which link packets go out on -
and therefore which transit address they are sent from.

If you do rely on two routers, how is this to work if they are both
in the same room, and effectively in the same part of the local network?

How would you respond to end-users who didn't want to buy a router
and locate it somewhere different in the network for every upstream
link they used for multihoming?


2.3.2 Traffic Redirection

This section concerns Traffic Engineering (TE) and Multihoming
Service Restoration (MSR).

Outgoing TE for load balancing, and outgoing MSR (choice of outgoing
link due to failure of another link) are achieved by the internal
routing system somehow adapting to the TE needs and the link failure
conditions to direct packets to one or the other of the Six/One routers.

Incoming TE and MSR require affecting the behaviour of the Six/One
routers in however many provider networks are currently sending
packets to this edge network.

I found this sentence (middle of left column) impossible to fully parse:

    A Mapping Preferences message can be returned to any source
    transit address from a packet received from a remote edge
    network.

The "froms" were my sticking points.

Maybe:

    A Mapping Preferences message can be returned to any source
    transit address in response to a special packet received from a
    remote edge network.

I am being pernickety, not least because I think that your paper is
generally written with extreme clarity and with excellent expression
sensibilities.


You later explain the three packets of the Mapping Preference
Message exchange.  I think this initial explanation would be better
if it was expanded a little.


    Mapping Preferences messages list combinations of address
    mappings and preference values. Similar to mapping records,
    the address mappings in Mapping Preferences messages are
    pairs of edge and transit address prefixes. But unlike mapping
    records, address mappings can be specified with variable
    granularity by scaling the length of their prefixes: Since edge
    addresses map one-to-one onto the transit addresses from any
    particular provider, the edge and transit address prefixes in an
    address mapping have the same length. Scaling this length
    facilitates preference feedback at granularities ranging from an
    edge network’s entire edge address space – in which case the
    edge address prefix is a complete routing prefix – to a single
    edge address. Address mappings are allowed to have
    overlapping edge address prefixes. To exclude ambiguities,
    those with longer edge address prefixes take precedence over
    those with shorter edge address prefixes.

Some examples would be really helpful.

Do you really want to have Six/One routers fussing over individual
destination edge IPv6 addresses when they decide which transit
address to send the packet to?

128 bits is a lot of bits to chew through with some CPU- and
DRAM-intensive algorithm on a packet-by-packet basis.

My plan with IPv6 Ivip is to limit the granularity of the mapping
system and the ITR functionality to /64.  That is bad enough, but
your description above indicates that you want all Six/One routers
to be engineered to potentially match all 128 bits of a destination
address of some packet arriving from its local network, to the
longest matching prefix in a potentially lengthy Mapping Preference
Message.
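
The per-packet work being questioned here is a longest-prefix match
of the destination edge address against the address mappings from a
Mapping Preferences message.  A minimal Python sketch follows.  The
entries, preference values and 2001:db8:: documentation prefixes are
hypothetical, and the linear scan only illustrates the precedence
rule; it is not how a router would implement the lookup.

```python
from ipaddress import IPv6Address, IPv6Network

# Hypothetical entries: (edge prefix, transit prefix, preference value).
ENTRIES = [
    (IPv6Network("2001:db8:aaaa::/48"),
     IPv6Network("2001:db8:1000::/48"), 10),
    (IPv6Network("2001:db8:aaaa:5::/64"),
     IPv6Network("2001:db8:2000:5::/64"), 20),
    (IPv6Network("2001:db8:aaaa:5::1/128"),
     IPv6Network("2001:db8:1000:5::1/128"), 30),
]

def lookup(destination, entries=ENTRIES):
    """Return the entry with the longest edge prefix covering destination."""
    destination = IPv6Address(destination)
    matches = [entry for entry in entries if destination in entry[0]]
    # Longer edge prefixes take precedence over shorter ones.
    return max(matches, key=lambda entry: entry[0].prefixlen, default=None)

print(lookup("2001:db8:aaaa:5::1"))   # matched by the /128 entry
print(lookup("2001:db8:aaaa:5::2"))   # matched by the /64 entry
print(lookup("2001:db8:aaaa:9::1"))   # matched by the /48 entry
```

With /128 entries permitted, every bit of the destination address can
affect the result, which is the cost being raised above.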

Presumably a Mapping Preference Message takes precedence over
whatever mapping information a Six/One Router may have received.

I think you need some kind of time-out on these Mapping Preference
Messages.  Otherwise, the Six/One router would be required to honour
it forever, no matter what mapping information arrived.   Say some
network sent out a spurious Mapping Preference Message.  How could
the recipient Six/One router later know this was spurious, or that
some corrective Mapping Preference Message was not received?  That
Mapping Preference Message could cause the remote Six/One router to be
sending packets to some other networks, so the Six/One router which
sent the now unwanted Mapping Preference Message wouldn't know there
was a problem.
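
One way to picture the time-out suggested above is a cache whose
entries expire, after which the router falls back to its ordinary
mapping information.  This Python sketch is an assumption, not
something from the paper: the class name, the 300 second default
lifetime and the keying by edge prefix are all hypothetical.

```python
import time

class PreferenceCache:
    """Holds received preferences only until a lifetime expires."""

    def __init__(self, lifetime_s=300.0, clock=time.monotonic):
        self.lifetime_s = lifetime_s    # hypothetical default lifetime
        self.clock = clock              # injectable so tests can fake time
        self._entries = {}              # edge prefix -> (transit, expiry)

    def store(self, edge_prefix, transit_prefix):
        expiry = self.clock() + self.lifetime_s
        self._entries[edge_prefix] = (transit_prefix, expiry)

    def get(self, edge_prefix):
        """Return the preferred transit prefix, or None once expired."""
        item = self._entries.get(edge_prefix)
        if item is None:
            return None
        transit_prefix, expiry = item
        if self.clock() >= expiry:
            # Stale or spurious preference: drop it so the caller
            # falls back to the mapping record.
            del self._entries[edge_prefix]
            return None
        return transit_prefix
```

With expiry, a spurious message does damage for at most one lifetime,
at the price of the sender having to refresh preferences it still
wants honoured.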


I think this section needs a clear explanation of multihoming
failure detection, decision-making and of how all the sending
networks are told, presumably via Mapping Preference Messages, to
change which transit address they use.

Let's say in Figure 3, the mapping and any currently active Mapping
Preference Messages have the effect of causing all correspondent
networks (all upgraded networks, since non-upgraded networks do not
participate in multihoming) to send all packets to transit address
2000::/48 - via Provider 2.

Here are some fault conditions:

  A - The link to Provider 2 fails.

  B - The router with the link to Provider 2 fails.

  C - Provider 2 itself is cut off from the Net, or
      has severe congestion.

Where are these conditions detected?

Where is a decision made about which of potentially multiple other
links and transit addresses should be used?

How is that decision turned into the only possible response: sending
Mapping Preference Messages to the Six/One routers in every upgraded
network which is currently sending packets to this network?

In all cases, the messages need to be sent out of a link other than
the one which has failed.

In case A, there needs to be a way the two or more Six/One routers
in the local network can communicate, make a decision and take the
chosen action.

In case B, the surviving one or more Six/One routers need to
recognise the one linking to Provider 2 is dead, and likewise make a
decision and take action.

How would the Six/One routers decide that condition C had occurred?


Assuming a decision was made, how would the routers know which
upgraded networks to send Mapping Preference Messages to?  They
can't reasonably be expected to keep a record of recent traffic.  If
they were expected to, then how long would they need to keep such
records?

What about really busy sites which are receiving packets from tens
of thousands of upgraded networks?  That would require sending
Mapping Preference Messages to every such network.

When you send a Mapping Preference Message from Provider 1, using
1000::/48, you don't address it to a particular upgraded network, but
to the Six/One router which was sending the packets to this edge
network in recent times.

How does the Six/One router know which remote Six/One router to send
the message to?

How does one Six/One router find the address of another?

In the following diagram, there are two upgraded sites, both with
2-way multihoming.  They use four separate providers.  X and Y are
/48 edge prefixes - allocated permanently to the edge network, and
not in the BGP global routing table.  So X and Y are prefixes in
the new, scalable form of space which is Provider Independent, but
perhaps best not referred to as "PI" space, since this already has a
specific meaning.

A, B, C and D are transit prefixes, all /48, which are PA (part of
some shorter prefix allocated PI to each provider) and which
therefore appear in the BGP global routing table.

There are four Six/One Routers: SOR1, SOR2, SOR3 and SOR4


Edge-1 |    Provider 1   DFZ   Provider 3    | Edge-2
       |                                     |
      SOR1----------A[---------]C-----------SOR3
       |                 \ /                 |
 X[    |                  /                  |     ]Y
       |                 / \                 |
      SOR2----------B[---------]D-----------SOR4
       |                                     |
       |     Provider 2         Provider 4   |


Edge-1 is an upgraded network which in this example is the only one
sending packets to Edge-2.  There could be tens or hundreds of
thousands of such networks sending packets to one or more hosts in
Edge-2 when Edge-2 has a failure.

In this example, Edge-1 is sending all its packets to Edge-2 from
its A transit address, to the C transit address.

Let's say SOR3 fails, or its link to Provider 3 fails, or Provider 3
fails or becomes very congested.

Somehow, SOR4 has to send a Mapping Preference Message to SOR1.

But how does it know that one or more hosts in Edge-1 have been
sending packets to one or more hosts in Edge-2?  The packets didn't
go through SOR4.

Even if SOR3 kept records of traffic - which I think is unworkable -
let's say SOR3 is dead.

I don't see how you can base Multihoming Service Restoration on any
active decisions and messages emanating from Edge-2.  It might
be OK for some failure modes, but not for all.

Ivip involves the end-user (whoever operates Edge-2 in this
example) setting up their own multihoming monitoring and decision
making system.  They could do this themselves, and they could base
it inside or outside their network, but the most likely arrangement
is for them to hire the services of some company which does this
sort of thing.  That company has a 100% robust distributed global
network of servers, and it constantly monitors connectivity to
whatever ETRs Edge-2 relies on, and through them connectivity to
whatever internal routers etc. need to be monitored.  This way, the
company can detect any failure, entirely from outside Edge-2's
network, and then change the mapping system for Edge-2's micronet(s)
accordingly.

In the Ivip system, the monitoring company doesn't need to know what
networks have been sending packets to Edge-2.  The mapping change
changes the behaviour of all the world's ITRs which are
handling these packets, in a few seconds.

Since multihoming monitoring and decision making is completely
outside the Ivip system, there can be innovation, any number of
techniques used, all sorts of customised arrangements involving
probe packets, secure arrangements etc. to any depth in the networks
being monitored, from multiple vantage points in the outside world.

With the other map-encap systems, the ITRs are expected to do the
failure detection and decision making, based on previously supplied
options in the mapping data.  They need to do this individually.  It
all needs to be specified as part of the map-encap system, and so
can't be upgraded easily, or customised at all.

Six/One Router is similar to these non-Ivip map-encap systems: you
monolithically build in multihoming failure detection,
decision-making and the actions required to change the path of packets.

The only way any network can change the path of packets is by
sending Mapping Preference Messages to all the particular Six/One
routers which are currently sending packets which need to be redirected.

It is no good sending the Mapping Preference Message to SOR2, since
it is not sending those packets, and since you have no way of SOR2
communicating the contents of such messages to SOR1 or however many
other Six/One routers there are in Edge-1.

Of course, if Edge-2 was getting packets sent to SOR3 from both SOR1
and SOR2, then SOR4 would need to send a separate Mapping Preference
Message to both SOR1 and SOR2.

But again, how can SOR4 know where the packets have been coming from?

It is not good enough to rely on SOR1 getting destination
unreachable messages when the failure occurs.  Maybe SOR3 simply
dies in a way that the link stays open, but it can't communicate
with the rest of Edge-2.

The non-Ivip map-encap systems tend to rely on destination
unreachable messages for their ITRs to figure out something is wrong.

Ivip doesn't rely on such things.  A properly engineered multihoming
monitoring system will securely probe and get explicit responses
from routers, internal nodes or whatever is desired to show that the
links, routers etc. are working as expected.  When these positive
acknowledgements fail to arrive for more than a specified time, the
multihoming monitoring system decides there has been a failure.
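
The positive-acknowledgement approach described above can be sketched
in a few lines of Python.  This is only an illustration of the
time-out logic.  The class name, the target names and the time-out
value are hypothetical, and the probing itself (sending secure probes
and receiving replies) is left out.

```python
class ProbeMonitor:
    """Declares a target failed when no acknowledgement arrives in time."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_ack = {}          # target name -> time of last ack

    def record_ack(self, target, now):
        """Call whenever a probe to target is explicitly acknowledged."""
        self.last_ack[target] = now

    def failed_targets(self, now):
        """Targets whose last acknowledgement is older than the time-out."""
        return sorted(target for target, seen in self.last_ack.items()
                      if now - seen > self.timeout_s)

monitor = ProbeMonitor(timeout_s=3)
monitor.record_ack("SOR3", now=0)   # last heard from SOR3 at t = 0
monitor.record_ack("SOR4", now=2)   # last heard from SOR4 at t = 2
print(monitor.failed_targets(now=4))
```

Because the check relies on missing acknowledgements rather than on
destination unreachable messages, it also catches the case where a
router dies while its link stays up.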

Also, the multihoming monitoring system is in a much better position
than the ITRs in the other map-encap systems to keep probing the
network to detect when the failure has been resolved.  Then it can
change the mapping back to what it was.  Arbitrarily complex
detection and decision techniques can be used with Ivip, since it is
nothing to do with Ivip itself.  The other map-encap systems, and
your own Six/One Router, are monolithic and have to specify every
technique for multihoming monitoring, decision making, recovering to
normal operation.  Also, all such functionality needs to be built
into all ITRs and ETRs, or into all Six/One routers and the local
routing systems which largely control their operation.

Ivip doesn't rely on anything being done by the network in question.
A properly engineered multihoming monitoring system will be
entirely independent of that network, and will securely change that
network's micronet's mapping in a way the operators desire.

I have further questions below about the Mapping Preference Message
system.


Page 4 continued
----------------

2.4 Backwards Compatibility

You make the hosts in the edge network reachable from non-upgraded
networks (AKA "legacy" networks - but I dislike this term) via their
one or more transit addresses.  However, none of this traffic is
subject to any multihoming service restoration system.  That only
works for packets from upgraded networks.

A primary purpose of adopting a map-encap system, or Six/One router
- whatever it is, with its new type of address space which will
solve the routing scaling problem - is to have multihomable,
portable, space, ideally space which can be used for incoming TE as
well.

Yet with Six/One Router the multihoming and TE functions only work
for packets coming from upgraded networks.  This means there is very
limited motivation for anyone to adopt Six/One Router space
initially - a situation which is likely to persist indefinitely
unless there are other motivating factors sufficient to cause
widespread adoption.

In contrast, LISP with PTRs and Ivip with its OITRDs provide full
multihoming and incoming TE for traffic from non-upgraded networks -
so the impetus to adopt these is high, right from the start, even
before a single other network has adopted it.


Your backwards compatibility system has to cope with various
scenarios.  In one scenario - the correspondent host in the
non-upgraded network initiating the communication, including sending
a single packet - your system relies entirely on the correspondent
host getting the one or more transit addresses for the upgraded
network from the DNS AAAA record, and then finding and using one of
these, after potentially trying one (or more?) edge network
addresses which are not routable in the global BGP system.

Each host in the upgraded edge network has no idea of what its
address would be in the one or more transit prefixes that network is
accessible by.  (To do so would involve an impractical involvement
of hosts with the local routing system, how the network organises
its links to providers, which of those links is currently active and
preferred etc.)

So a host in Edge-2 above only knows its address in prefix Y.

In the scenario in which the communication is initiated by the host
in Edge-2, that host can send packets to some host in Net-13, which
hasn't been upgraded to Six/One Router yet.  It can tell the host in
Net-13 its address in Y, but that will not enable the host in Net-13
to send packets to it.

In this scenario, the only way a host in Net-13 could send packets
to the host in Edge-2 is by using the source address in the packet
it received from the host in Edge-2.  This address would be a
transit address, in C or D.

There are other scenarios:

There is no obvious way some other system (such as a P2P management
system) could tell the correspondent host in Net-13 an address on
which it could send a packet to the host in Edge-2.  Edge-2 could
tell that management system its Y address, and perhaps the
management system could be specially crafted to observe the source
address from which packets from the Edge-2 host arrived.  But this
is a flaky and irregular way for an external system to figure out
what IP address to tell the correspondent host to use to
send packets to the Edge-2 host.

Hosts don't know - and shouldn't have to know - whether or not they
are in a Six/One Router edge network.  They shouldn't have to know
whether their own address, or the address of other hosts, are "edge"
addresses or "transit" addresses.

Nor is it reasonable to expect any separate management system to
make these distinctions.

Ivip and LISP with PTRs let the hosts carry on as usual.  All hosts
can send packets to each other on their own addresses, no matter
whether one or both hosts are on Ivip/LISP-managed addresses.



Page 5
------

2.4.1 Destination Address Selection

I found dot point 1 hard to understand at first.  Perhaps it would
be better to write:

    To organise the addressing system so that all edge addresses are
    from a clearly identified prefix, such as 1::/1, 11::/2 or
    111::/3.


I found the rest of the left column pretty hard to understand.

This was very confusing at first, but it made more sense with later
explanation:

    In case of a tie, a candidate destination address is chosen that
    has the longest prefix in common with the source address.

I couldn't imagine what the purpose of this was.

The following sentence could be rewritten to change "choose" into
something more informative.  I thought it referred to some algorithm
in a host choosing something.  In fact it refers to the designers of
the entire system choosing to make a whole section of the IPv6
address space exclusively for edge networks - and not to have edge
network addresses outside this section.

    A means to take maximum advantage of the longest-prefix
    match for destination address selection in Six/One Router
    would be to choose the highest-order bit in addresses so that it
    distinguishes edge from transit addresses.

The rest of this column could probably be rewritten to be less
confusing.  I won't try to detail my difficulties here, but can talk
about them by phone or write more offlist.

The top (continuing) paragraph on the right column is pretty
confusing to me.

You propose some kind of overall prefix to contain all edge address
space, but admit it is not a reliable approach.  I don't fully
understand the previous explanation of how it would work with your
proposed address selection mechanism.

To what extent are you proposing a change to all hosts in the way
they select an address from multiple addresses in an AAAA DNS record?


I foresee major problems with this reliance on DNS to enable hosts
in non-upgraded networks to send packets to hosts in upgraded networks.

Firstly, there are plenty of situations where a host needs to be
told an IP address to use, by some system other than a FQDN and a
DNS lookup.

Even ignoring those instances, let's consider this example which uses
DNS.

Edge-2 is a hosting company.  It has a customer xyz.com, who run
their own nameservers.  The web server www.xyz.com is on a host in
the Edge-2 network.  xyz.com needs to put an AAAA record in their
DNS so hosts all over the world can send packets to their web server.

This is no problem with Ivip, LISP etc.  However I see serious and
probably insurmountable problems with Six/One Router.

You need xyz.com to have not just the Y prefix edge address of the
server for www.xyz.com in their AAAA record, but every address on
which that host would appear on each of Edge-2's transit prefixes: C
and D.

Let's say Edge-2 has 10,000 such customers.  I know it is traditional
for many hosting companies to do the DNS for their customers'
domains, but this doesn't work for all customers, so I will assume
all 10,000 customers run their own DNS, or have someone else run it.

Whenever Edge-2 changes one or more of its providers, it gains or
loses a transit prefix such as C or D.  Each time Edge-2 does this,
it needs to get all its 10,000 customers to change the AAAA record
for their web server in their DNSes!
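
A small Python sketch shows why each provider change touches every
customer's zone.  It assumes a plain prefix swap between the edge
prefix Y and the transit prefixes C and D, ignoring any adjustment
of bits 64 to 71.  The 2001:db8:: documentation prefixes, the server
address and the function name are hypothetical.

```python
from ipaddress import IPv6Address, IPv6Network

def rewrite(address, edge_prefix, transit_prefix):
    """Swap the edge prefix bits for the transit prefix, keeping the rest."""
    address = IPv6Address(address)
    if address not in edge_prefix:
        raise ValueError(f"{address} is not in {edge_prefix}")
    if edge_prefix.prefixlen != transit_prefix.prefixlen:
        raise ValueError("edge and transit prefixes differ in length")
    low_bits = int(address) & int(edge_prefix.hostmask)
    return IPv6Address(int(transit_prefix.network_address) | low_bits)

Y = IPv6Network("2001:db8:aaaa::/48")   # hypothetical edge prefix
C = IPv6Network("2001:db8:cccc::/48")   # hypothetical transit prefix
D = IPv6Network("2001:db8:dddd::/48")   # hypothetical transit prefix

server = "2001:db8:aaaa:1::80"          # www.xyz.com, as seen inside Edge-2
aaaa_records = [IPv6Address(server)] + [rewrite(server, Y, p) for p in (C, D)]
print([str(a) for a in aaaa_records])

customers = 10_000
zones_to_edit_per_provider_change = customers   # one AAAA set per customer
print(zones_to_edit_per_provider_change)
```

Replacing D with a new provider's prefix changes one entry in
aaaa_records for every customer, so all 10,000 externally run zones
need editing.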

This is unworkable, and looks to me like a showstopper for Six/One
Router.

I am unlikely to accept arguments about why this is not a realistic
example.  For instance some folks may argue that such a hosting
company is not the sort of edge network which would want, or should
have, Six/One Router managed address space.

We need the new scalable type of address space to be ubiquitously
adopted.  It is not good enough to get 50% of end-user networks
using it, with the other 50% (however defined, such as by the total
number of addresses used, the number of prefixes they use etc.) not
using it, since at most that will only cut the scaling problem by a
factor of 2.

In order to provide the millions (some insist billions) of end-user
networks with portable, multihomable address space, we need the new
type of address space to be highly attractive to *all* end-user
networks.  This means all networks, except those of providers - all
networks of any organisation except of those organisations who sell
connectivity.

So it's not good enough to say hosting companies won't be using the
new kind of address space.

Nor is it good enough to say that the very large hosting companies,
in which the above problem is most acute, wouldn't need to use the
new kind of space. (Arguably, there would be few enough of these
that we could cope with them using conventionally BGP managed space
in perpetuity.)

We need all hosting companies, large and small, to want to use the
new kind of space.  If there is a perception that the new kind of
space is not suitable for the largest hosting companies, then every
start-up company will insist on using conventional space, because
they are sure they are going to grow into a large hosting company
real soon.

It is bad enough Edge-2 having to change all its DNS records every
time it gets a new transit prefix, but I think it is unworkable for
Six/One Router to require such DNS changes in organisations which
are separate from Edge-2, but in some way involved with the hosts in
Edge-2's network.


At the end of 2.4.1, I have a rough idea of what you are suggesting
about using existing NAT traversal approaches to cope with the
problems inherent in the current design for Six/One Router.  I think
a fuller explanation, with examples, would be helpful.


Pages 5 and 6
-------------

2.4.2 Source Address Consistency

I found this sentence confusing:

    The mode in which a packet is exchanged can consequently be
    determined based on the type of remote address as long as the
    packet is within the boundary of an edge network: The packet is
    exchanged in Bilateral mode if the remote address is an edge
    address.

"within the boundary of an edge network" sounded like the physical
location of a packet at some point in its travels.  In fact, I think
it means whether its source address can be determined as being
within an edge network.

I had to re-read this whole section carefully.  I don't think I
fully understand it, but I surmise:

  1 - This section only concerns communications between hosts which
      are both in upgraded networks.  So the sentence:

          With this invariant, every packet exchange between two
          hosts must be in either of two modes:

          *  Bilateral mode — both hosts in the packet exchange use
             edge addresses to reach each other.

          *  Unilateral mode — both hosts in the packet exchange use
             a transit address to reach each other.

      needs to be understood as not referring to the Unilateral
      arrangement used for communicating with a host in an
      a non-upgraded network (Figure 4), but only to the situation of
      both hosts being in upgraded networks: Figure 2 (Bilateral)
      and Figure 5 (Unilateral).

      BTW, I think these terms are confusing, since Figure 3 is
      perfectly symmetrical in terms of both sides doing the same
      thing.  The Uni/Bi-lateral concept refers to whether incoming
      packets from the provider are rewritten or not. Outgoing
      packets are always rewritten.


  2 - Each Six/One router can't figure out whether or not the source
      address of a packet arriving on the provider link is an
      edge address.  (But on the page before, there was a plan
      to arrange the addressing system so edge addresses always
      came from a defined short prefix, and so could be identified
      as such by some simple algorithm.  I don't understand how
      these two concepts relate to each other.)

  3 - In order to solve this, there are two approaches.

      a - The preferred approach is to use a bit in the IPv6 header
          to indicate this.  A set state for this
          "Bilateral/Unilateral" bit means that this is a
          "Bilateral" exchange, so the source address of the
          incoming packet should be rewritten (always to an edge
          address?) before the packet is sent to the local
          destination host.

      b - (Note 5) An alternative to the new header bit is to use
          an option header, but this raises all sorts of problems
          with DFZ routers not handling such packets efficiently,
          and with the packet becoming longer, and different.
          This would be inefficient, raise PMTUD problems, and
          generally nullify Six/One Router's greatest attraction
          over map-encap schemes: that the packets do not get any
          longer.

I think it would be good to have more discussion of this new bit
being used in the IPv6 header.  Without it, Six/One Router is not
going to work at all - since the option header alternative means it
probably can't work with current DFZ routers, and would lose most of
its attractiveness over map-encap.

The most likely place for such a bit is the Flow Label, which
according to Wikipedia and:

  http://www.tcpipguide.com/free/t_IPv6DatagramMainHeaderFormat.htm

is not used at present.
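
As a purely illustrative sketch of what carrying such a flag in the
Flow Label might involve: the first 32 bits of the IPv6 header hold
the Version (4 bits), Traffic Class (8 bits) and Flow Label (20 bits).
The choice of the top Flow Label bit, and the names BILATERAL_BIT,
set_bilateral and is_bilateral, are assumptions made for this example;
the paper does not say which bit would be used.

```python
import struct

# Assumption: the flag is the most significant of the 20 Flow Label bits.
BILATERAL_BIT = 1 << 19


def set_bilateral(header: bytes, bilateral: bool) -> bytes:
    """Return a copy of an IPv6 header with the hypothetical flag set or cleared."""
    (first_word,) = struct.unpack("!I", header[:4])
    if bilateral:
        first_word |= BILATERAL_BIT
    else:
        first_word &= ~BILATERAL_BIT & 0xFFFFFFFF
    return struct.pack("!I", first_word) + header[4:]


def is_bilateral(header: bytes) -> bool:
    """Read the hypothetical flag from the first 32-bit word of the header."""
    (first_word,) = struct.unpack("!I", header[:4])
    return bool(first_word & BILATERAL_BIT)


# A 40-byte IPv6 header with version 6 and every other field zero.
header = bytes([0x60]) + bytes(39)
flagged = set_bilateral(header, True)
```

The header length is the same before and after, which is the property
that matters here: unlike an option header, a flag bit does not make
the packet any longer.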


Pages 6 and 7
-------------

2.4.3 Multi-homing Support

I had to re-read this section too.

The Unilateral mode referred to here is the Unilateral mode between
hosts in two upgraded networks (Figure 5), NOT the Unilateral mode
between a host in an upgraded network and a host in a non-upgraded
network (Figure 4).

I got completely lost here:

   Six/One Router achieves this by making the
   providers of a multi-homed edge network responsible for
   connectivity to disjoint and complementary subsets of the
   transit address space, while having all of them provide
   connectivity to the complete remote edge address space.
   Providers back up each other’s routes to remote transit
   addresses.

and the paragraphs which follow.

Even without understanding the foregoing, I have some understanding of:

   Finally, to enable fast re-establishment of packet exchanges in
   Unilateral mode after a provider failure, Six/One Router must
   meet the following two requirements:

   *  Backup routes for the defunct ones must be available
      quickly in the edge-network-internal routing system.

   *  Backup Six/One routers must be aware of the fail-over so
      that they can start accepting incoming packets that used to
      be bound to the failed provider.

   To achieve the first, providers offer backup service for the
   routes to remote transit addresses that other providers are
   responsible for.

While I don't understand this, does it mean something to do with
provider 1 advertising in the DFZ a prefix which was until recently
advertised by provider 2?  That doesn't sound desirable or
practical, but it is the most sense I could make out of the section
to this point.

Or does it just mean that if the provider 2 link (Figure 6) fails,
that provider 1 will accept (and forward to the DFZ) packets from
the Six/One router on the left, which have a source address in the
2000::/48 prefix?


The second-to-last paragraph of this section does explain something
about multihoming monitoring decisions and how those decisions lead
to the desired flow of packets from another Six/One router, link and
transit address.  However, this is just for the scenario in which
the link to provider 2 is dead, and the Six/One router on the right
recognises it.

What if that router is dead?  Or what if provider 2 is disconnected
from the Net, or is so congested as to make this link incapable of
handling the traffic?


Page 7
------

2.4.4 Avoiding Adverse Effects of Unilateral Mode
on Transport Protocols and Applications

Again, I think this concerns Unilateral mode for hosts both in
upgraded networks, not Unilateral mode with one host in a
non-upgraded network.

    Applications are affected if they reference addresses in packet
    payloads because unilateral address rewriting in the IP header
    of a packet then leads to address inconsistencies between the IP
    header and the packet payload. Six/One Router relies on
    application functionality for network address translator
    traversal [Ro2003, Ro2007] to avoid such address
    inconsistencies.

These references are to:

   STUN  http://tools.ietf.org/html/rfc3489
   ICE   http://tools.ietf.org/html/draft-ietf-mmusic-ice-19
         http://www.rfc-editor.org/queue.html#draft-ietf-mmusic-ice
         (Last updated 2007-10-29)


It would be good to discuss how STUN and ICE, which are meant to
work with conventional NAT, would work with Six/One Router.

The following text seems to indicate a showstopper:

   Applications that reference addresses in packet
   payloads depend on this functionality already today, due to the
   existing deployment of network address translators. It is hence
   safe to assume that those applications, which use addresses in
   packet payloads, also support network address translator
   traversal.

This seems to assume the hosts in the upgraded networks are clients.

However, the new address space for the scalable routing solution
absolutely needs to support servers.  I don't see how servers can
work if there is any requirement for hosts in the new space to use
STUN or ICE in any way.



Page 7
------

Para 3 and beyond in right column: Header checksums

The system absolutely has to work with existing cryptographic
techniques.

   An alternative to re-computing Internet checksums in Six/One
   Router is to choose the mapping between edge and transit
   addresses such that the checksum does not change during
   address rewriting. This technique is applicable to all packets,
   even if their payloads are integrity-protected or encrypted. It
   is also efficient because the checksum does not have to be
   localized within a packet. Mapping edge and transit addresses
   such that the checksum does not change during address
   rewriting is practical where the routing prefixes of IPv6 edge
   and transit addresses are at most 48 bits long: The remaining 16
   or more bits in the standard 64-bit subnet prefix of an IPv6
   address can then be used to compensate for the checksum
   difference that rewriting of the routing prefixes alone would
   create.

There is a potentially serious problem here:

  Since you must support crypto on all packets, and since this
  technique (if it works) is the only one available, it restricts
  the granularity of the use of Six/One Router to prefixes of /48
  or shorter.

  That doesn't seem to be a big problem to me, but in other parts
  of the paper you discuss granularity down to single IP addresses
  (/128) and using finer granularity prefixes (presumably longer
  than /48) in Mapping Preference Messages, I think to spread
  load over multiple incoming links.

Do all these crypto arrangements involving headers simply treat the
source and destination addresses as checksums modulo 16 bits?  I
haven't checked this, but it doesn't sound very secure.  It would be
good for you to list the crypto arrangements you have investigated,
and point out why you are sure they would be unaffected by the
technique you propose:

    More specifically, the difference (delta) between the
    checksum of an edge address routing prefix and the checksum
    of a corresponding transit address routing prefix is the value
    by which the lower 16 bits in the subnet prefix must be adjusted
    during address rewriting to avoid changing the checksum of the
    packet. 16 bits are sufficient for this because the checksum,
    too, is 16 bits long. And since the routing prefixes are static,
    so is (delta).

So does this mean something like the following?

   Host A has the edge address 4000::1.

   It is in a network using a transit prefix 6000::/48

   So when a packet sent from this host has its source address
   rewritten, it would (without the above arrangements to keep the
   crypto protocols happy) be rewritten to 6000::1.

   Since this bumps up the (assumed) 16-bit checksum by 2000 hex,
   the above workaround actually rewrites the address with a new
   value in bits 79 to 64 (the lowest 16 bits of the 64-bit subnet
   prefix), to subtract 2000 hex from the checksum.

   (I am using ordinary binary order here.)

   So the addresses are:

                           1
                           2            8 7  6  6
                           7            0 9  4  3                 0

   Edge address            4000 0000 0000 0000  0000 0000 0000 0001

   Ordinarily rewritten
   address                 6000 0000 0000 0000  0000 0000 0000 0001

   Rewrite with crypto-
   friendly workaround:    6000 0000 0000 E000  0000 0000 0000 0001
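
The arithmetic in this example can be checked with a short sketch.
The example above uses plain binary subtraction, which gives E000.
The Internet checksum used by TCP and UDP is a 16-bit one's-complement
sum, and in that arithmetic the compensating value for the same
4000 -> 6000 prefix change comes out as DFFF.  The sketch assumes /48
prefixes and that the compensation goes into the 16 bits immediately
below the prefix; the function names are made up for illustration and
are not taken from the paper.

```python
import ipaddress
import struct


def ones_add(a: int, b: int) -> int:
    """16-bit one's-complement addition with end-around carry."""
    s = a + b
    return (s & 0xFFFF) + (s >> 16)


def ones_sum(words) -> int:
    """One's-complement sum of a sequence of 16-bit words."""
    total = 0
    for w in words:
        total = ones_add(total, w)
    return total


def to_words(addr: str):
    """Split an IPv6 address into its eight 16-bit words."""
    return list(struct.unpack("!8H", ipaddress.IPv6Address(addr).packed))


def rewrite_neutral(addr: str, new_prefix: str) -> str:
    """Swap the top 48 bits for new_prefix and adjust the next 16 bits so
    that the one's-complement sum over the whole address is unchanged."""
    old = to_words(addr)
    new = to_words(new_prefix)[:3] + old[3:]
    # delta = sum(old prefix) - sum(new prefix), in one's-complement arithmetic
    delta = ones_add(ones_sum(old[:3]), ~ones_sum(new[:3]) & 0xFFFF)
    new[3] = ones_add(old[3], delta)
    return str(ipaddress.IPv6Address(struct.pack("!8H", *new)))


edge = "4000::1"
transit = rewrite_neutral(edge, "6000::")
print(transit)  # 6000:0:0:dfff::1
```

With DFFF in that position the one's-complement sum over the address
is the same before and after the rewrite, so a checksum computed over
a pseudo-header containing the address would not change.  With E000
the sum differs by one.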


This clearly needs to be implemented for all Six/One Router rewrites.

I am not sure how it could work with edge and therefore transit
prefixes longer than /48.

It messes up the conceptually clean idea of simply translating a
linear range of addresses into some other linear range.

This business of always rewriting bits 79 to 64 of the destination
and source addresses, in order to adjust the header checksum to keep
crypto protocols happy, means the same algorithm needs to be applied
to all the transit addresses provided in AAAA records.

Unless you can show that all relevant crypto protocols would be
happy with this workaround, I think the header checksum problem is a
showstopper.



Page 8
------

Delays inherent in relying on mapping information

In the first para in 2.5.1:

   Six/One Router relies on the trustworthiness of the mapping
   system to ensure that remote edge and transit addresses are
   rewritten correctly. Six/One routers can rewrite the destination
   edge address of a packet that leaves their edge network only
   after retrieving the corresponding mapping record from the
   mapping system. And they can rewrite the source transit
   address in a packet that enters their edge network only after
   retrieving the corresponding mapping record from the mapping
   system.


Assuming the two hosts are in two upgraded networks, the Six/One
routers are using Bilateral rewriting, and these routers have no
cached mapping information for the relevant prefixes, then this
means that before a packet sent by host A will reach host B, the
following has to occur in sequence:

  1 - A's Six/One router needs to request the mapping information
      for the edge prefix containing the destination address (B's
      edge address).  (Request type F below.)

  2 - That request needs to be forwarded to a query server of
      some kind which can respond authoritatively and in a manner
      which the router can authenticate as being secure.

  3 - The response needs to be forwarded back to A's Six/One router.

  4 - That router rewrites the address and forwards the packet
      towards the DFZ.

  5 - When it arrives at the border of B's network, B's Six/One
      router must somehow obtain the mapping it needs to rewrite
      this packet's destination address to B's edge address, and
      its source address to A's edge address.

      The first part is easy.  B's Six/One router knows the transit
      prefix the packet was received within and the edge prefix, so
      it can reverse the rewrite done by A's Six/One router,
      including reversing the alteration of bits 79 to 64 (crypto
      header checksum workaround).

      However, to rewrite the source address correctly, it needs
      mapping information.

      How does B's Six/One router figure out the edge prefix of
      A's network?  All it has is some source address, which
      was rewritten to be in one of the potentially numerous
      transit prefixes used by A's network.

      So I think the mapping query server has to handle two types of
      query:

        F - Given an edge address, return the length and base
            address of the edge prefix of that network, as
            well as the start addresses of the one or more transit
            prefixes used by that network.

        R - Given a transit address, return the starting address
            of that transit prefix, and the edge prefix (starting
            address and length) for the network which is using
            this transit prefix.  (Maybe also return this network's
            other transit prefixes?)

      For the moment, let's not even think about how the system
      handles multihoming outages, Mapping Preference Messages
      etc.

      Is this correct?  I don't recall two types of mapping request
      being mentioned in your paper.

      So B's Six/One router generates a type R mapping request.

  6 - The mapping request reaches the appropriate query server, and
      it sends the response.

  7 - The response arrives at B's Six/One router.  This enables it
      to know the edge prefix and transit prefix used by A's
      Six/One router.  Therefore it can calculate what was added
      to A's edge source address to create the transit source
      address in the just-arrived packet.  This enables it to
      reverse that rewrite.  The rewritten packet is now
      forwarded to host B.
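
The two query types suggested in step 5, and the rewrites they enable
in steps 4 and 7, can be sketched roughly as follows.  This is only an
illustration under stated assumptions: the mapping system is modelled
as a local dictionary, the prefixes are invented, the names query_f,
query_r and swap_prefix are hypothetical, and the checksum compensation
discussed earlier is left out for brevity.

```python
import ipaddress

# Hypothetical mapping records: edge prefix -> list of transit prefixes.
MAPPINGS = {
    ipaddress.ip_network("4000::/48"): [ipaddress.ip_network("6000::/48"),
                                        ipaddress.ip_network("6100::/48")],
    ipaddress.ip_network("4001::/48"): [ipaddress.ip_network("6001::/48")],
}


def query_f(edge_addr):
    """Type F: edge address -> (edge prefix, its transit prefixes)."""
    addr = ipaddress.ip_address(edge_addr)
    for edge, transits in MAPPINGS.items():
        if addr in edge:
            return edge, transits
    return None


def query_r(transit_addr):
    """Type R: transit address -> (that transit prefix, edge prefix, all transit prefixes)."""
    addr = ipaddress.ip_address(transit_addr)
    for edge, transits in MAPPINGS.items():
        for transit in transits:
            if addr in transit:
                return transit, edge, transits
    return None


def swap_prefix(addr, old, new):
    """Move addr from prefix old to the same offset within prefix new."""
    offset = int(ipaddress.ip_address(addr)) - int(old.network_address)
    return str(ipaddress.ip_address(int(new.network_address) + offset))


# Steps 1 to 4: A's router handles a packet from 4000::1 to 4001::2.
a_edge, a_transits = query_f("4000::1")      # local knowledge in practice
b_edge, b_transits = query_f("4001::2")      # type F query
src_on_wire = swap_prefix("4000::1", a_edge, a_transits[0])
dst_on_wire = swap_prefix("4001::2", b_edge, b_transits[0])

# Steps 5 to 7: B's router recovers A's edge address with a type R query.
a_transit, a_edge_seen, _ = query_r(src_on_wire)
src_restored = swap_prefix(src_on_wire, a_transit, a_edge_seen)
```

In this toy model both lookups are instantaneous.  In the real system
each of the two queries is a round trip to the mapping system, and the
second cannot start until the translated packet has arrived.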

This is two cycles of query and response:

 * Query
 * Response
   Send translated packet
   Translated packet arrives
 * Query
 * Response
   Translated packet delivered

If you are using these mapping systems (top right of page 3):

  CONS
  ALT
  DNS Map (I think)

then these all involve a global query server system.  This raises
problems with long delays and unreliability in getting a mapping
response.

With Six/One Router, the situation is worse than for the map-encap
systems, since there are two cycles of query and response.

I think these mapping systems would mean that address space relying
on Six/One Router would involve unacceptable delays in delivering
initial packets as discussed here:

   http://www.firstpr.com.au/ip/ivip/lisp-links/#long_paths


Host B now wants to send a packet back to host A, and this packet
happens to go through the same Six/One router just mentioned.  That
Six/One router retains no state based on the previous packet
rewriting, but it does have cached mapping information of A's edge
prefix.

How does this Six/One router in B's network know which of
potentially multiple transit prefixes used by A's network should be
used for the rewriting of the destination address of the current
outgoing packet?

Assuming the type R response included all of A's transit prefixes,
then this gives the Six/One router a list of such prefixes to choose
between.  Maybe, like with the multiple transit addresses in the
AAAA record, the Six/One router simply chooses one.  Or does the
mapping information include weights for each transit prefix to
implement incoming load balancing for the remote network?  If so,
then this doesn't apply to packets coming in response to finding a
transit address in an AAAA record.


Once the Six/One router has chosen a destination transit prefix to
translate the packet's destination address to, it can do the
rewrites and send the packet on its way.

What of the next packet destined for the same edge prefix which
arrives at this Six/One router?  Does it go through the same
procedure, potentially choosing another destination prefix in the
remote network?

There is no state in the router concerning packets handled, so I
guess there is no continuity in Six/One router behaviour from one
packet to the next.
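
To make the question concrete, here is a minimal sketch of what a
per-packet, stateless choice could look like if the mapping
information did carry a weight for each transit prefix.  The weights,
the record layout and the name choose_transit are assumptions made for
this example; the paper does not define them.

```python
import random


def choose_transit(prefixes_with_weights, rng=random):
    """Pick one transit prefix at random, in proportion to its weight.

    No state is kept between calls, so two packets for the same edge
    prefix may be sent towards different transit prefixes.
    """
    prefixes = [prefix for prefix, _ in prefixes_with_weights]
    weights = [weight for _, weight in prefixes_with_weights]
    return rng.choices(prefixes, weights=weights, k=1)[0]


# Hypothetical mapping record for A's network: 3:1 load split.
record = [("6000::/48", 3), ("6100::/48", 1)]
picks = [choose_transit(record, random.Random(seed)) for seed in range(20)]
```

Each call is independent of the last, which is exactly the lack of
continuity from one packet to the next described above.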


I haven't yet looked at how Six/One Router handles PMTUD and packet
too big messages in the translated portion of the path to the
destination host.



