Re: shim6 @ NANOG (forwarded note from John Payne)



Iljitsch,

As far as TE relates to IPv6, I have concerns that the TE requirements are
not fully defined.  Based on past discussions in the working group, and
after reading some IPv6 publications that discuss how BGP-based TE is
used, it appears people think the only valid TE concerns are (1) how to
load all links, and (2) how to make failover work.  In reality there are
more requirements!

On Fri, 24 Feb 2006, Iljitsch van Beijnum wrote:

> If you have several links to the internet, you don't want one to sit
> idle while another is congested. So traffic engineering is an
> essential part of multihoming.

> What I said was that I didn't understand why people want to have two  
> links and then have the second one sit idle until the first fails. I  
> know people want this because I used to configure this for customers  
> when I worked at UUNET NL. But my thinking is that if you have  
> multiple links, you'll want to use all of them.

The fact of the matter is that there are various other desired
configurations, including various combinations of primary/backup links,
preferred ("best") links, load sharing across links, and the ability to
dial traffic up or down in all of these scenarios.

We can debate the various merits of why someone would want a backup
link that is idle.  The backup link may cost less: for example, we offer a
cheaper-priced shadow link in addition to a primary link, or the primary
link may be with one ISP at a fixed price while the backup is a metered
service with another ISP.  The backup link may be of lower quality, such as
an oversubscribed network with some packet loss, or may be a high-latency
satellite link.  The primary link may be an OC-3 and the backup link may
be three DS-3s.

All in all it doesn't matter why people want a backup link or how it is
useful.  The only thing that matters is that people desire this
functionality.

In addition, the IPv6 multihoming solution should not remove the tools
transit ASes currently have in IPv4-style multihoming.  This doesn't
mean transit ASes simply override the downstream TE preferences.  A
transit AS attempts to move the packet closer to the destination.  If the
destination is more than one AS away, then it may be reachable through
multiple neighbor ASes.  In this case the transit AS may have some TE
choices.

On Fri, 24 Feb 2006, Iljitsch van Beijnum wrote:

> Now obviously most ISPs are multihomed themselves so they want to do
> their own TE. That's completely legitimate. However, that doesn't mean
> that they get to decide how their customers can do TE.

Not only is it important for transit ASes to be able to traffic engineer
their traffic to and from other ASes, as I have described previously; the
same machinery can also be used to accommodate both the source's outbound
TE and the destination's inbound TE when they conflict.

For example, say cust1 and cust2 are both multihomed to Sprint and
UUNET.  Say cust1 wants to use the UUNET link as the primary inbound and
outbound link because it is higher performance, and cust2 wants to use
the Sprint link as the primary inbound and outbound link due to cost
savings.  In this case cust1 can have a default route to UUNET and a
higher-cost default route to Sprint.  Cust1 can advertise their network to
UUNET with the default local preference, and advertise their network to
Sprint with a community to lower the local preference.  Cust2 will do the
opposite: a default route to Sprint and a higher-cost default route to
UUNET, advertising their network to Sprint with the default local
preference and to UUNET with a community to lower the local preference.

In this case traffic from cust1 will get forwarded to UUNET due to the
more preferred default route.  UUNET will honor the community from cust2
and lower the local preference on the route it learns directly from
cust2.  Traffic will be forwarded to Sprint, which will pass it along to
cust2.

The opposite will happen in the reverse direction.  Cust2 will forward the
traffic to Sprint.  Sprint will honor the community from cust1 and lower
the local preference on the route it learns directly from cust1.  Traffic
will be forwarded to UUNET, which will pass it along to cust1.

Both cust1's and cust2's inbound and outbound TE policies can be honored
even if they conflict with each other.
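
To make this concrete, here is a toy Python model of the decision logic
just described.  The community value, local-preference numbers, prefixes,
and neighbor names are made up for illustration; they are not any ISP's
actual community map or policy.

    # Toy model of the cust1/cust2 example above.  The community value
    # (65000:70) and the local-pref numbers are illustrative only; real
    # provider community maps differ per ISP.
    from dataclasses import dataclass

    DEFAULT_LOCAL_PREF = 100
    LOWER_PREF_COMMUNITY = "65000:70"  # hypothetical "set local-pref 70" community

    @dataclass
    class Route:
        prefix: str
        learned_from: str              # neighbor the route was learned from
        as_path: tuple
        communities: frozenset = frozenset()
        local_pref: int = DEFAULT_LOCAL_PREF

    def apply_ingress_policy(route):
        """Honor the customer's 'lower my local-pref' community on ingress."""
        if LOWER_PREF_COMMUNITY in route.communities:
            route.local_pref = 70
        return route

    def best_path(routes):
        """First two BGP best-path steps: highest local-pref, then shortest AS path."""
        return max(routes, key=lambda r: (r.local_pref, -len(r.as_path)))

    # UUNET's view of cust2's prefix: one route direct from cust2 (tagged
    # with the lower-pref community) and one via Sprint (cust2's preferred
    # inbound path).
    direct = apply_ingress_policy(Route("2001:db8:2::/48", "cust2",
                                        ("cust2",),
                                        frozenset({LOWER_PREF_COMMUNITY})))
    via_sprint = apply_ingress_policy(Route("2001:db8:2::/48", "sprint",
                                            ("sprint", "cust2")))

    # Traffic from cust1 toward cust2 follows the winning route: via Sprint.
    print(best_path([direct, via_sprint]).learned_from)  # -> sprint

The point to notice is that UUNET prefers the Sprint path purely because
cust2 asked it to; cust1 and cust2 never have to coordinate with each
other.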

On Fri, 24 Feb 2006, Iljitsch van Beijnum wrote:

>
> On 24-feb-2006, at 19:47, Jason Schiller (schiller@uu.net) wrote:
>
> > I am baffled by the fact that Service Provider Operators have come out in
> > this forum, at the IAB IPv6 multihoming BOF, and other places, and have
> > explained how they and their customers use traffic engineering, yet up
> > until now, shim6 has not tried to provide their needed functionality.
>
> I think what we have here is a disconnect between what's going on in  
> the wg (and the multi6 design teams) and what's visible from the  
> outside.
>
> I remember MANY conversations, in email and during meetings, about  
> traffic engineering. And for me, there has never been any question  
> that traffic engineering is a must-have for any multihoming solution.  
> Paying for two (or more) links and only being able to use one 99% of  
> the time is simply too cost-ineffective. And just maybe we can  
> convince people that shim6 makes for good multihoming even though it  
> doesn't give you portable address space, but it's never going to fly  
> if the TE is unequivocally worse than what we have today. (And I've  
> said this in the past.)
>
> However, for a number of reasons this isn't all that apparent to an  
> outside observer:
>
> - part of these conversations were on closed design team lists, private  
> email or in (design team/interim) meetings (for instance, only 3% of  
> the messages in multi6 for the last couple of years mention TE)
> - I don't think any of us (at least not me) saw TE as a  
> particularly hard-to-solve problem
> - TE can only happen if the base mechanisms are well understood, so  
> we're focusing on those first

Maybe there is a disconnect.  Maybe my concerns have been addressed by
private off-list communications.  If that is the case, can someone address
the concern that the current "IPv4 style" TE functionality is needed for
IPv6 multihoming?  Can someone define what they think the TE
requirements are (as far as I can tell, a definition is lacking)?  Can
someone explain how each of the current "IPv4 style" functionalities will
be supported in shim6?

Maybe someone can come to the next NANOG and set things straight
on a panel?

> > -First hit is critical
> >   * establishing shim6 after the session starts doesn't help
> >     short lived sessions
>
> I'm not sure where this comes from. Since shim6 doesn't come into  
> play until there is a failure, and failures are too rare to be  
> meaningful in TE, the shim6 failover protocol itself is fairly  
> meaningless for TE. What we need is mechanisms to do source/ 
> destination address selection in a way that can be traffic  
> engineered. Length of individual sessions is meaningless as shim6  
> doesn't work per-session. Most short sessions are part of a longer  
> lived interaction (i.e., a user visiting a WWW server and retrieving  
> dozens or hundreds of resources over the course of a dozen seconds to  
> many minutes).

It is my understanding that DNS may contain an incomplete list of locators,
or may have multiple sets of locators for multiple round-robin hosts.  Shim6
will exchange the complete locator set for a single host.  I assume that if
you want TE to work, then in addition to exchanging locators, you will need
to exchange inbound TE preferences.  This also means you may want to
immediately switch over to using the shim so that your TE works prior to a
failure.
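
As a rough Python sketch of what I mean, assume a locator set that
carries inbound preference values along with the addresses.  The layout
and the preference semantics (lower value = more preferred) are my
assumptions, not anything taken from the shim6 drafts:

    # Hypothetical sketch: a shim6-style locator set that carries inbound
    # TE preferences, and a sender that honors them when picking a
    # destination locator.  "Lower = more preferred" is assumed here.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Locator:
        address: str
        preference: int  # assumed: lower value = more preferred inbound path

    def pick_destination(locators, failed=frozenset()):
        """Choose the most preferred locator not known to have failed."""
        usable = [loc for loc in locators if loc.address not in failed]
        if not usable:
            raise RuntimeError("no reachable locators")
        return min(usable, key=lambda loc: loc.preference)

    # The peer's complete locator set, learned through the shim rather
    # than from DNS (which may list only a subset of the locators).
    peer_locators = [
        Locator("2001:db8:a::1", preference=10),  # via ISP A: preferred inbound
        Locator("2001:db8:b::1", preference=20),  # via ISP B: backup
    ]

    print(pick_destination(peer_locators).address)  # ISP A path
    print(pick_destination(peer_locators,
                           failed={"2001:db8:a::1"}).address)  # falls back to ISP B

Preferences like these only do any TE if they are exchanged and acted on
before a failure, which is why the first hit matters.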

> >   * Keeping shim6 state on the end host doesn't scale for content
> >     providers.  A single server may have 30,000 concurrent TCP  
> > sessions
>
> Right. So there is precedent for storing state for 30000 instances of  
> "something". Servers are getting a lot faster and memory is getting  
> cheaper so adding a modest amount of extra state for longer lived  
> associations shouldn't be problematic.

The impression I get from content providers is that it is non-trivial to
support the added state from shim6.  Yes, computers are getting faster and
memory is getting cheaper, but the added state may negatively impact their
business models as power consumption, cooling requirements, and rack space
consumption increase.  But I won't try to speak authoritatively on this
issue, and will instead defer to some of the larger content providers.

> Let me speak for myself and speculate a bit: what we should do is  
> have multihomed sites publish SRV (or very similar) records with two  
> values: a "strong" value that allows primary/backup mechanisms, and a  
> "weak" value that allows things like 60% of all sessions should go to  
> this address and 40% to that one.
>
> Then, before a host sets up a session it consults a local policy  
> server that adds local preferences to the remote ones and also  
> supplies the appropriate source address that goes with each  
> destination address. New mechanisms to distribute this information  
> have been proposed in the past, but there is already a service that  
> is consulted before the start of most sessions, so it makes sense to  
> reuse that service. (No prizes for guessing what service I'm getting  
> at.)

I think using SRV records for TE is overloading the DNS function.  Again,
I'm not a DNS server operator and cannot speak authoritatively on this, so
I will defer to them.

However, I do have some concerns about TE preferences in DNS.  First, DNS
records tend to be cached for scalability and to reduce traffic, but TE
policy may need to change quickly to reflect a topology change.  This
seems problematic.

Second, I am concerned that DNS may only allow the end point to indicate
its inbound preferences.  What about the cumulative TE preferences of all
of the ASes in each of the different paths between the source and the
destination?  How is this reflected in DNS-based TE?
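
For concreteness, here is a rough Python sketch of the selection I
understand Iljitsch to be describing, assuming his "strong" value maps to
something like the SRV priority field and his "weak" value to something
like the SRV weight field.  The record values are invented:

    # Rough sketch of SRV-style selection: the lowest "strong" (priority)
    # class wins outright; within that class, "weak" (weight) sets the
    # proportional share of sessions.  Values below are made up.
    import random

    # (priority, weight, target)
    records = [
        (10, 60, "2001:db8:a::1"),  # ~60% of sessions to the ISP A address
        (10, 40, "2001:db8:b::1"),  # ~40% to the ISP B address
        (20,  0, "2001:db8:c::1"),  # backup: used only if priority 10 fails
    ]

    def select(records):
        """Pick a target: lowest priority class, weighted-random within it."""
        best = min(priority for priority, _, _ in records)
        candidates = [r for r in records if r[0] == best]
        pick = random.uniform(0, sum(weight for _, weight, _ in candidates))
        for _, weight, target in candidates:
            pick -= weight
            if pick <= 0:
                return target
        return candidates[-1][2]

    print(select(records))  # "2001:db8:a::1" about 60% of the time

Note that even in this sketch the weights only take effect when the
records are freshly fetched, which is exactly the caching concern above.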

>
> This would allow for pretty fine tuned incoming TE, as long as the  
> other end doesn't have a reason to override the receiving site's  
> preferences.

Current "IPv4 style" TE can allow for conflicting source outbound policy
and destination inbound policy as long as an AS lies between them.

> > |Yuck, you should never announce more specifics for this.
>
> > Please believe the DFZ Service Providers when they explain how they,
> > and their customers, do TE.
>
> I believe that they do it, because I see that the global routing  
> table has increased by 16% last year. I have to admit that I've done  
> this myself from time to time, but only if AS path prepending (or  
> changing the origin attribute) wouldn't result in something  
> reasonable. It seems to me that for many people deaggregating is the  
> default these days. And then not just breaking a /20 into two /21s,  
> but go for broke and announce 16 /24s, who cares?

All the ASes that are advertising 55,000 more specifics to the global
Internet routing table will care when you take this TE tool away from them
and the traffic loading on their links changes.
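
For what it's worth, the reason more specifics are such a blunt but
effective TE tool is longest-prefix matching: a more specific wins over
its covering aggregate no matter what the other BGP attributes say.  A
quick illustration in Python (the prefixes and link names are made up):

    # Why deaggregation works as a TE hammer: forwarding follows the
    # longest matching prefix, so a more specific announced over one link
    # overrides any attribute-based preference on the covering aggregate.
    import ipaddress

    rib = {
        ipaddress.ip_network("192.0.2.0/24"): "link-A",    # covering aggregate
        ipaddress.ip_network("192.0.2.128/25"): "link-B",  # more specific
    }

    def lookup(dst):
        """Longest-prefix match: the most specific containing route wins."""
        addr = ipaddress.ip_address(dst)
        matches = [net for net in rib if addr in net]
        return rib[max(matches, key=lambda net: net.prefixlen)]

    print(lookup("192.0.2.1"))    # link-A: only the /24 covers it
    print(lookup("192.0.2.200"))  # link-B: the /25 beats the /24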

> > Transit AS TE is more critical in the case of a moderate sized transit AS
> > that is purchasing transit from multiple upstreams, especially when links
> > are cost prohibitive.  Take a large South American ISP that has 16 STM-1s,
> > where 4xSTM-1 use the Americas 2 oceanic cable system to upstream
> > transit provider1, 4xSTM-1 use the Emergia oceanic cable system to
> > upstream transit provider1, 4xSTM-1 use the Americas 2 oceanic cable
> > system to upstream transit provider2, and 4xSTM-1 use the Emergia oceanic
> > cable system to upstream transit provider2.  Now imagine that your most
> > important customer, who always complains about latency, should always
> > use the Americas 2 oceanic cable system to upstream transit provider1.
> > Also imagine that all other traffic should load all the other links as
> > equally as possible, and that given any one or more link failures, the
> > remaining links should be loaded as equally as possible.  Note: This
> > is just one example of a real world customer.
> Unfortunately this is incompatible with hop-by-hop forwarding for  
> outgoing traffic from the customer. Obviously this can be solved both  
> today and with shim6 using MPLS or similar.

Inter-domain MPLS with customers or peers is not a tool most large
networks are comfortable with.  This needs to be solved inside shim6.

___Jason


==========================================================================
Jason Schiller                                               (703)886.6648
Senior Internet Network Engineer                         fax:(703)886.0512
Public IP Global Network Engineering                       schiller@uu.net
UUNET / Verizon                         jason.schiller@verizonbusiness.com

The good news about having an email address that is twice as long is that
it increases traffic on the Internet.

On Sat, 25 Feb 2006, Mikael Abrahamsson wrote:

> Date: Sat, 25 Feb 2006 10:11:49 +0100 (CET)
> From: Mikael Abrahamsson <swmike@swm.pp.se>
> To: shim6-wg <shim6@psg.com>
> Subject: Re: shim6 @ NANOG   (forwarded note from John Payne)
> 
> On Fri, 24 Feb 2006, Iljitsch van Beijnum wrote:
> 
> > What I said was that I didn't understand why people want to have two links 
> > and then have the second one sit idle until the first fails. I know people 
> > want this because I used to configure this for customers when I worked at 
> > UUNET NL. But my thinking is that if you have multiple links, you'll want to 
> > use all of them.
> 
> Real backup is 1:1. If you buy 50+50 megs and use 60 (30+30), if one fails 
> you do not have full backup.
> 
> I can see your case: we have customers with 100 meg connections and 2M 
> backup, with the reasoning that 2M is better than nothing and they do not 
> want to pay 100+100. They're only interested in backing up a subset of 
> their traffic, so they ACL the backup to only allow certain traffic in the 
> case of a failure on the primary high-bw link.
> 
> Also, we charge less for the backup if it's not used normally, but that's 
> for two connections into the same ISP, not multihoming between two ISPs.
> 
> -- 
> Mikael Abrahamsson    email: swmike@swm.pp.se
>