
Re: [RRG] Short explanation: Ivip6's "Core Routing Label Forwarding"



Hi Iljitsch,

Please abandon the thread with the old "FLOWv6" name.

In "Re: [RRG] FLOWv6: IPv6 Flow Label to control DFZ forwarding" you
wrote:

>> My proposal is not related to MPLS.  I will reply to your
>> critique soon in the "Ivip6 (previously FLOWv6), Ivip4 and Ivip"
>> thread.
>
> Well, MPLS is also based on encoding routing decisions in a 20-bit
> field to avoid a full lookup...

This is a coincidence.

> I'm not talking about the details, obviously.

I think you should be if you chide people on the RRG list about what
they write.

We are attempting to do something really challenging and important
here - devise a new architecture for the Internet, in three ways at
once: for IPv4, for IPv6 and potentially for a whole new clean slate
approach.

If we ignore important details or expect everything significant to
fit in a few pages or on the back of a drink coaster, then we won't
produce the excellent results which are needed.


> The way I see it, the main issue is the mapping system. If we can
> figure that part out, I'm not too worried about the encapsulation
> stuff because a good modular mapping system should of course work
> with a variety of encapsulation systems.

But this problem is complex enough that the optimal design will
surely involve certain characteristics of the mapping system fitting
particularly well with however the ITRs send packets to the ETRs.

For instance, Ivip's simple ITR functionality - requiring end-users
to supply the multihoming service restoration stuff - means the
mapping data is simpler, since there is only one ETR address to specify.

The map-encap systems have important distinctions between them - and
encapsulation sucks for a number of reasons, not least of which is
the PMTUD problems which require a complex solution:

  http://www.firstpr.com.au/ip/ivip/pmtud-frag/

Also, the encapsulation overhead sucks even worse for IPv6 than for
IPv4.  For LISP it is a 40 byte IPv6 header, an 8 byte UDP header
and an 8 byte LISP header.
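
To put rough numbers on that overhead, here is a minimal sketch.  The
40 byte IPv6 header and 8 byte LISP header are the figures quoted
above; 8 bytes is the fixed UDP header size; the 20 byte IPv4 header
assumes no IP options.  The function names are made up for
illustration only.

```python
# Header sizes in bytes.  IPV4_HEADER assumes a header without options.
IPV6_HEADER = 40
IPV4_HEADER = 20
UDP_HEADER = 8
LISP_HEADER = 8


def encap_overhead(outer_ip_header):
    """Bytes added to every packet by outer IP + UDP + LISP headers."""
    return outer_ip_header + UDP_HEADER + LISP_HEADER


def overhead_fraction(inner_packet_len, outer_ip_header):
    """Encapsulation overhead relative to the original packet length."""
    return encap_overhead(outer_ip_header) / inner_packet_len


if __name__ == "__main__":
    for name, header in (("IPv4", IPV4_HEADER), ("IPv6", IPV6_HEADER)):
        print(name, "outer header:", encap_overhead(header), "bytes added")
```

The fraction matters most for short packets, where the added headers
can be comparable in size to the packet being carried.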

If you think it doesn't really matter about IPv4 vs. IPv6 and that
the solution should work just the same for both of them, then I
don't think you have looked at my Ivip6 proposal properly, or really
considered the trouble ITRs and ETRs have to go to in order to
handle PMTUD with map-encap.

Ivip6's approach has no PMTUD problems, no encapsulation overhead,
no disruption of Traceroute - and no problem with IPsec etc.

> So we could use LISP as the default

Sure, but IMHO LISP has all sorts of failings compared to Ivip, as I
note in the comparison page and the partial list of critiques of LISP:

  http://www.firstpr.com.au/ip/ivip/comp/
  http://www.firstpr.com.au/ip/ivip/lisp-links/#critiques

> and then define additional ways to do this with MPLS and maybe
> even with light path stuff.

MPLS requires extensive set-up.  There's no time or resources for
that.  The ITR gets a packet it needs to send to an ETR it has never
heard of.  There is no time for sending a packet to the ETR, for
testing the PMTU to the ETR etc.

This is a very tight situation and MPLS or anything elaborate won't
scale at all.


Like a few other folks on this list you seem at times to be happy to
talk at length about high-level architectural concepts while
avoiding important low-level details.  Yet you were the only person
who took a real interest in Fred Templin's or my work on solving the
PMTUD problems inherent in map-encap.

Some folks here seem happy to discuss proposals such as ILMP -
whilst taking little apparent interest in proposals which have a
real chance of being implemented.

All the map-encap proposals, in principle, could be adopted -
because they do not require host changes.

ILMP, GSE etc. may be mathematically or architecturally
attractive - but they will *never* be adopted, because they require
everyone to change their applications.

That will never happen.

The Internet is not so broken that the fix must involve everyone
changing to a new set of applications, or having all the current
ones rewritten.

There are fixes for the routing scaling problem, for IPv4 and for
IPv6.  I am confident Ivip4 and Ivip6 will do the trick.  Perhaps
better solutions will be developed.  Other people think their
map-encap systems are better than Ivip4 - and they think it is
easier and better to upgrade a subset of routers than to change
hosts, applications etc.

(Ivip6 does require modest changes to all core routers.  I think
that is easier than coping with the packet overheads and PMTUD
problems of map-encap for IPv6.)

> (Electronics is sooo last century...)

I have a vision of you elevating yourself further and further away
from this Earth!  Kicking away from Mortal Detail . . . Seeking
Enlightenment and the Ultimate Scalable Routing Solution in a
mystical Vision which can only come from On High!

From far enough up in the sky, one map-encap system probably looks
pretty much like any other map-encap system.

There is good cause to forget about low-level details for a while to
break old patterns of thought - and maybe find a new solution.

However, much of the RRG discussion recently has been directed away
from actual solutions which have a chance of being widely adopted,
in favour of half-developed proposals which definitely will not be
adopted in any realistic version of the foreseeable future.

A complex problem like this requires detailed thinking, including
especially about electronics:

  TCAM sucks - way too low density and too high power consumption.

  48 bits of IPv6 addresses being significant to each core
  router really sucks, because it takes far too much CPU power
  and DRAM accesses to chew through each packet's destination
  address - just to figure out which interface to forward it on.

  Part of the cost of our Internet access is ISPs spending more
  and more on purchasing, maintaining, powering and cooling routers.
  For instance the FIB board on a CRS-1 (CRS-MSC) costs USD$80,000
  and dissipates 375 watts.

    http://www.firstpr.com.au/ip/sram-ip-forwarding/router-fib/

  The original sales material for this device didn't guarantee
  it could handle IPv6 packets at full rate.  I don't know if
  it can do that now, but I think it would have trouble chewing
  through 40Gbps of short incoming IPv6 packets addressed to a
  variety of the /48 prefixes now being assigned for PI end-user
  networks.

  The problems of electronics are the primary cause of the routing
  scaling problem.  If electronics wasn't a problem, we would
  just throw more of it at BGP routers and crank the clock speed
  to 50GHz.
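
A toy contrast between the two kinds of lookup, as a sketch only.
This message does not spell out how an Ivip6 core router would use a
20-bit label, so the flat table below is an assumption of mine for
illustration, and the prefix matcher is deliberately naive (real
routers use tries, TCAM and so on).  All names are hypothetical.

```python
import ipaddress

LABEL_BITS = 20          # width of the IPv6 Flow Label field
SIGNIFICANT_BITS = 48    # prefix length quoted above for PI networks


def build_label_table(next_hops, default):
    """Flat table with one slot per possible 20-bit label value."""
    table = [default] * (1 << LABEL_BITS)
    for label, interface in next_hops.items():
        table[label] = interface
    return table


def forward_by_label(table, label):
    """One indexed read, whatever the destination address is."""
    return table[label]


def prefix_key(prefix):
    """Map a prefix string to the (leading bits, length) key used below."""
    net = ipaddress.IPv6Network(prefix)
    return (int(net.network_address) >> (128 - net.prefixlen), net.prefixlen)


def longest_prefix_match(prefixes, dest, default):
    """Naive longest-prefix match: probe each length from 48 bits down.

    Returns (interface, probes) so the number of lookups is visible.
    """
    dest_int = int(ipaddress.IPv6Address(dest))
    probes = 0
    for length in range(SIGNIFICANT_BITS, -1, -1):
        probes += 1
        key = (dest_int >> (128 - length), length)
        if key in prefixes:
            return prefixes[key], probes
    return default, probes
```

The point is only that the label lookup costs one memory access,
while the address lookup cost depends on how many prefix bits are
significant.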


We need to think of some gutsy, practical solution which is as
elegant as we can make it, and which will really really work - AND
which will be attractive to most people who run end-user networks.

That won't happen by taking a bird's-eye view of the whole scene and
avoiding messy details such as PMTUD, packet bloating encapsulation
overheads, or overly complex ITRs and ETRs trying to do their own
multihoming service testing and restoration decisions.

I am surprised that all the other schemes assume you can't push
mapping data quickly around the world.  It is not that much data
that we should be so scared of doing this.  To assume this is
impossible is to doom the solution to monolithically integrating
multihoming fault detection and service restoration decision-making
into the scalable routing solution itself.  That should never
happen.  So far, only Ivip proposes to make that separate, so the
routing scaling solution becomes a modular part of multihoming and
TE, not a ponderous, limited, boxed up, slow solution which can't be
extended without upgrading every ITR and ETR.


> The part that worries me about reusing the flow label is that the
> flow label is defined to be non-mutable-in-transit.

Sure, but no-one uses it.

RFCs are not written in stone - they are composed entirely of
electrons which have already been recycled many times.

RFC 3697 can be withdrawn when we define a much better use for these
bits, as a Routing Label to be used under carefully defined
conditions.

It is no fault of RFC 3697 that the Flow Label is not used.  RFC
3697 does a great job of defining the semantics of the Flow Label.


> So it would be ok for the source host to use the flow label in
> this way, but not for routers to change it.

No-one seems to have bothered to actually use the IPv6 flow label in
the 14 or so years it has been available for experimentation and
practical use.

The other morning, these 20 bits intimated to me clearly that they
are bored witless being set to 0000 0000 0000 0000 0000 year after
year.

They are keen to be set to Meaningful Values and so to do something
Useful!
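
For anyone who wants to poke at those 20 bits: in the fixed IPv6
header, the first 32-bit word holds the 4-bit version, the 8-bit
traffic class and then the 20-bit Flow Label.  A minimal sketch of
reading and rewriting the field in a raw header follows; the function
names are my own and this is not any particular stack's API.

```python
import struct

FLOW_LABEL_MASK = 0x000FFFFF  # low 20 bits of the first 32-bit word


def get_flow_label(header):
    """Return the 20-bit Flow Label from a raw IPv6 header."""
    (word,) = struct.unpack("!I", header[:4])
    return word & FLOW_LABEL_MASK


def set_flow_label(header, label):
    """Return a copy of the header with the Flow Label replaced.

    Version and traffic class bits are left untouched.
    """
    if not 0 <= label <= FLOW_LABEL_MASK:
        raise ValueError("flow label must fit in 20 bits")
    (word,) = struct.unpack("!I", header[:4])
    word = (word & ~FLOW_LABEL_MASK & 0xFFFFFFFF) | label
    return struct.pack("!I", word) + header[4:]
```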


> Also, we would still need some bits to allow for different
> flows towards the same destinations to be treated differently as
> per the original flow label semantics.

The Flow Label is only of value if three conditions are met:

  1 - The sending host sets it according to some pretty complex
      rules which are spelt out in RFC 3697.  These involve
      coordination with the OS to make sure the same number is
      not used by different apps, and that no number is reused
      for another flow within 120 seconds.

         Do you know if popular operating systems have an API
         for requesting a value for the Flow Label, or for
         setting it when sending packets?

  2 - The deployment of routers - let's assume BGP routers in the
      core of the Internet - which can use flows intelligently
      to spread the traffic over multiple paths, without separating
      flows.  This is a huge project, and so far hardly anyone seems
      to be motivated to research it seriously - yet IPv6 is
      crawling with researchers.  (It seems to be used primarily
      for research.)

  3 - The Flow Label is only used to distinguish between two or more
      flows from one single source IP address to another single IP
      address.  If either address differs, the Flow Label is ignored.
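
To show what the host-side bookkeeping in point 1 amounts to, here is
a minimal sketch of a host-wide label allocator with the 120 second
no-reuse window.  The class and method names are hypothetical; this
is not the API of any real operating system.

```python
import random

REUSE_QUARANTINE_SECONDS = 120   # figure quoted in point 1 above
MAX_LABEL = (1 << 20) - 1        # largest 20-bit value


class FlowLabelAllocator:
    """Hypothetical allocator shared by all applications on a host."""

    def __init__(self, rng=None):
        self._rng = rng or random.Random()
        self._in_use = set()
        self._released_at = {}   # label -> time the flow ended

    def is_available(self, label, now):
        """True if the label is neither in use nor still quarantined."""
        if label in self._in_use:
            return False
        released = self._released_at.get(label)
        return released is None or now - released >= REUSE_QUARANTINE_SECONDS

    def allocate(self, now):
        """Pick a random non-zero label that is free at time `now`.

        Loops until one is found; with about a million labels this is
        only a concern if nearly all of them are busy.
        """
        while True:
            label = self._rng.randint(1, MAX_LABEL)  # 0 means "no label"
            if self.is_available(label, now):
                self._in_use.add(label)
                self._released_at.pop(label, None)
                return label

    def release(self, label, now):
        """Mark a flow as finished and start its quarantine period."""
        self._in_use.discard(label)
        self._released_at[label] = now


def flow_key(src, dst, label):
    """Point 3: a flow is identified by source, destination and label."""
    return (src, dst, label)
```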


A reasonably easy approach to ensuring that any routers doing
flow-aware multipath sharing of resources (point 2, if it ever
happens) can do their work correctly, without relying on the Flow
Label, is this:

  Either the source or the destination host (or both) can use two
  or more separate IP addresses for their end of the communication,
  for each flow.

This is only necessary when two or more flows really are being sent
to another host.

That might require some OS coordination of applications, and of
course it would require that each host have multiple IP addresses.
This wouldn't be practical for IPv4, but there is plenty of space in
every IPv6 /64, so each host can think of 128 random numbers - or
128 sequential numbers from a random starting point - and try to
make them IP addresses.  It will usually succeed.

So the workaround for flow separation between two hosts without
the use of a Flow Label is for one or the other or both hosts to use
a different IP address.
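
A minimal sketch of the "128 sequential numbers from a random
starting point" idea, using Python's ipaddress module.  It only
generates candidate addresses inside a /64; actually claiming them
(checking that no other host on the link already uses one) is left
out, and the function name is made up for illustration.

```python
import ipaddress
import random

INTERFACE_ID_SPACE = 1 << 64   # number of addresses in a /64


def candidate_addresses(prefix, count=128, rng=None):
    """Return `count` sequential addresses from a random start in a /64.

    The sequence wraps around inside the /64 so every candidate stays
    within the prefix.
    """
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen != 64:
        raise ValueError("expected a /64 prefix")
    rng = rng or random.Random()
    start = rng.randrange(INTERFACE_ID_SPACE)
    base = int(net.network_address)
    return [
        ipaddress.IPv6Address(base + (start + i) % INTERFACE_ID_SPACE)
        for i in range(count)
    ]
```

A host could then use one candidate per concurrent flow to the same
peer, so the address pair alone distinguishes the flows.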


> BTW, why is it Ivip and not ivip or IVIP?

Because the Internet deserves its capital I, to distinguish Ivip
from iPod, iPhone etc. etc. and to encourage pronunciation of
Eye-vip, as in "ivy", not as in "skivvy". A full explanation is at:

  http://tools.ietf.org/html/draft-whittle-ivip-arch-01#appendix-B

That explanation will disappear from future revisions.

  - Robin

