Re: shim6 @ NANOG (forwarded note from John Payne)

To: Iljitsch van Beijnum <iljitsch@muada.com>
Subject: Re: shim6 @ NANOG (forwarded note from John Payne)
From: "Jason Schiller (schiller@uu.net)" <jason.schiller@mci.com>
Date: Fri, 24 Feb 2006 13:47:11 -0500 (EST)
Cc: Mikael Abrahamsson <swmike@swm.pp.se>, shim6-wg <shim6@psg.com>
In-reply-to: <065499D4-C911-4A42-AC84-BAF7CBF11D20@muada.com>
I am baffled by the fact that Service Provider Operators have come out in
this forum, at the IAB IPv6 multihoming BOF, and other places, and have
explained how they and their customers use traffic engineering, yet up
until now, shim6 has not tried to provide their needed functionality.

This is part of the reason more service providers are not involved in the
IETF.  The other part, as KC Claffy points out, is cost:
http://www.arin.net/meetings/minutes/ARIN_XVI/ppm_minutes_day1.html#anchor_8

So you end up with a working group that is heavily weighted by networks
that provide service to consumer customers.  Maybe there does need to be
two solutions, one for consumers that supports simple fail-over on the end
host.  This makes some sense for consumer networks, as the end host
operator and the network operator are the same person.  Usually the CPE
lacks customer access and usually does not support advanced routing
capabilities.  Nor do consumers want to bother with purchasing an ASN,
or with the difficulties of understanding and configuring complicated
routing policies.

On the other hand, complex TE performed at the customer's network level
is an absolute necessity for business customers.

Some history...    

1. RFC-3582 attempts to document IPv6 multi-homing requirements.
   
I thought the problem was that this RFC was more of a laundry list of some
of the current IPv6 multi-homing/TE usages, and didn't provide the "basic
building blocks" of multi-homing.  It was later pointed out to me that
these are not "requirements" but "goals", meaning they are not needed for
a useful shim6 deployment.

2. I tried to document the basic building blocks for TE.
-Primary / backup
-Load all links as best as possible
-Use best path
-any combination of these basic building blocks
-additional ability to increase or decrease traffic for any of these

The response I get is: do people actually do this?

3. IAB IPv6 multi-homing BOF

It seems to me that Service Provider Operators made a very clear statement
at the BOF.
-Traffic engineering is needed day 1.
  * Traffic engineering should not be an end host decision, but an 
    end site (network level) decision [managing on the end host is 
    the wrong place]
  * Traffic engineering needs to support in-bound and out-bound 
    traffic management
  * Traffic engineering needs to be allowed by transit ASes as well
    as end site ASes [don't leave all ISP TE in the hands of our customers]

-First hit is critical
  * establishing shim6 after the session starts doesn't help 
    short lived sessions
  * Keeping shim6 state on the end host doesn't scale for content
    providers.  A single server may have 30,000 concurrent TCP sessions
    
-Maybe 8+8 / GSE is a better starting point to support transit AS
 TE and to avoid the first hit problem, and still allow for an "easy" 
 multi-homing for consumer customers?

The response sounds to me like the shim6 wg is finally interested in
considering decent TE as a "requirement".  Yay!  But I am concerned about
what Operators and IETF folk think is "decent TE", based on past and
current experience:

|Date: Fri, 23 Sep 2005 00:47:22 +0200
|From: Iljitsch van Beijnum <iljitsch@muada.com>
|To: Jason Schiller <jason.schiller@mci.com>
|Cc: shim6-wg <shim6@psg.com>
|Subject: Re: addition of TLV to locator ID or locator ID set
|
|On 22-sep-2005, at 23:23, Jason Schiller (schiller@uu.net) wrote:
|
|>> These days, 85% of the internet is reachable over either two or
|>> three hops. So "best" is largely meaningless here.
|
|> I'm not sure I agree that shortest AS path is mostly useless because
|> most of the Internet is only a few ASes away.  I think generally
|> people still think it's better to transit two ASes instead of three.
|
|What I meant is that there is a group of people for which 85% of the
|world is 2 ASes away and a group for which 85% is 3 ASes away.
|
|Five years ago or so you could do some traffic engineering with path
|prepending, these days it's largely useless because the AS hierarchy
|is so flat.
|
|> BGP has many mechanisms to alter the loading of links.  By default
|> you get shortest AS path in bound.  You can get primary back-up
|> using local pref to different up stream ISPs,
|
|Do people really buy lines just to use them as backups? I never
|understood this.
|
|> you can get primary back up with a single upstream using MED,
|
|You can still do that with the shim in effect.
|
|> or you can load share by splitting your announcements
|> across your links.
|
|Yuck, you should never announce more specifics for this.

Please believe the DFZ Service Providers when they explain how they and
their customers do TE.

I do agree that IPv4 style TE through advertising multiple more specifics
to the global routing table will likely not scale in the IPv6
world.  Unfortunately, this is the only tool in the tool box to accomplish
the necessary TE in IPv4.  And IPv6 multi-homing is providing no such tool
to obtain similar results.  This is one of the reasons that operators
are saying shim6 is broken, let's just de-aggregate in IPv6.

If you think "you should never announce more specifics for this" then
provide a better tool to accomplish the TE needs.  But don't just say you
shouldn't do this and remove the functionality.  

|On Fri, 24 Feb 2006, Iljitsch van Beijnum wrote:
|
|> Date: Fri, 24 Feb 2006 12:08:42 +0100
|> From: Iljitsch van Beijnum <iljitsch@muada.com>
|> To: Geoff Huston <gih@apnic.net>
|> Cc: shim6@psg.com
|> Subject: Re: shim6 @ NANOG   (forwarded note from John Payne)
|>
|> On 26-okt-2005, at 18:08, Geoff Huston wrote:
|>
|> > Public thanks to Dave, Geoff, Vijay, Ted and Jason for their
|> > involvement in bringing shim6 to the NANOG conference.
|>
|> I just looked at the stream and I can't say that this made me very
|> happy. </understatement>
|>
|> Some of the comments made were nothing short of ridiculous, with
|> people claiming authority over the traffic engineering decisions made
|> by their _customers_.

OK, let me try this end site AS / transit AS TE thing again...

Take the picture below where cust1 has connectivity to UUNET and
at&t.  cust2 has connectivity to Sprint and L(3).  UUNET, at&t, Sprint,
and L(3) all peer with each other.

       UUNET---Sprint
      / |   \  /   | \
     /  |    \/    |  \
cust1   |    /\    |   cust2
     \  |   /  \   |  /
      \ |  /    \  | /
       at&t------L(3)

-cust1 pays a flat rate to at&t and per packet to UUNET.
-cust1 prefers to use the at&t link as primary (in and out bound)
-cust1 sends BGP community 701:80 to UUNET, and UUNET sets a local pref of
 80 on behalf of the customer

-cust2 has more out bound than in bound traffic.  
-cust2 wants to load share all out bound traffic across both links
-cust2 wants traffic delivered to it over the "best" path

Traffic from cust1 to cust2
---------------------------
1. cust1 will send the traffic to at&t
2. at&t will decide if it is better to deliver traffic to cust2
   via the exit point to L(3) or via the exit point to Sprint
3A. If at&t thinks the Sprint exit is more preferred, then 
    Sprint should deliver traffic to its customer over the 
    Sprint-cust2 link
3B. If at&t thinks the L(3) exit is more preferred, then
    L(3) should deliver traffic to its customer over the
    L(3)-cust2 link

*In this case at&t can do some TE.  Sprint may actually be
 closer or further than L(3), or at&t may artificially 
 distance or shorten Sprint, or may force certain prefixes
 to prefer Sprint or L(3) [this is usually only the case for
 purchased transit and not peering]

Traffic from cust2 to cust1
---------------------------
1. cust2 will spray traffic to Sprint and L(3)
2A. UUNET is not advertising cust1 routes to Peers as
    the best path is learned from a Peer and UUNET does
    not provide transit to Peers.
3A. L(3) and Sprint will forward traffic to at&t
4A. at&t will forward traffic to their customer over the
    at&t-cust1 link

2B. at&t is a customer of UUNET instead of a Peer.
    In this case UUNET will advertise the cust1 
    prefix to L(3) and Sprint.
3B. L(3) and Sprint will choose the best exit and
    send the traffic either to at&t or to UUNET
4B. Traffic sent to UUNET will be delivered to at&t as
    UUNET will honor the customer's low local pref community
    Traffic sent to at&t (either from UUNET or L(3) or Sprint)
    will be delivered over the at&t-cust1 link.

*In this second case (B) L(3) and Sprint can choose to send
 the traffic to either UUNET or at&t.  Both L(3) and Sprint
 have some TE capabilities in choosing UUNET or at&t.  UUNET 
 may actually be closer or further than at&t from the L(3)
 or Sprint perspective.  L(3) or Sprint may artificially 
 distance or shorten UUNET or at&t, or may force certain 
 prefixes to prefer UUNET or at&t [this is usually only the
 case for purchased transit and not peering]
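
As a rough illustration of UUNET's choice in the cust2-to-cust1 cases above,
here is a minimal sketch of local-pref-then-AS-path selection.  It is not a
full BGP decision process.  The default local pref of 100, the Route class,
and the best_path helper are assumptions made for the example; only the
701:80 -> local pref 80 mapping comes from the text.

```python
from dataclasses import dataclass

# Assumed default for illustration; real providers publish their own values.
DEFAULT_LOCAL_PREF = 100
# From the example above: community 701:80 asks UUNET for local pref 80.
COMMUNITY_LOCAL_PREF = {"701:80": 80}


@dataclass(frozen=True)
class Route:
    prefix: str
    learned_from: str                      # neighbor that advertised the route
    as_path: tuple                         # AS names, nearest first
    communities: frozenset = frozenset()

    @property
    def local_pref(self) -> int:
        # Use the lowest community-requested local pref, else the default.
        requested = [COMMUNITY_LOCAL_PREF[c] for c in self.communities
                     if c in COMMUNITY_LOCAL_PREF]
        return min(requested) if requested else DEFAULT_LOCAL_PREF


def best_path(routes):
    """Highest local pref wins; ties go to the shortest AS path."""
    return max(routes, key=lambda r: (r.local_pref, -len(r.as_path)))


# UUNET's two candidate routes for the cust1 prefix.
direct = Route("cust1-prefix", "cust1", ("cust1",), frozenset({"701:80"}))
via_att = Route("cust1-prefix", "at&t", ("at&t", "cust1"))

chosen = best_path([direct, via_att])
print(chosen.learned_from)  # the at&t route wins because of the community
```

Without the 701:80 community the direct route would win on AS path length,
which is why the community is what makes the UUNET link a backup.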

In shim6, if cust1 chooses cust2's Sprint-assigned IP address as the
destination, then all transit ASes must deliver the traffic via Sprint.
Transit ASes have no way of knowing that the destination lives behind
both Sprint and L(3), and therefore cannot deliver the traffic to L(3)
even when the L(3) exit point is better.
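
The point above can be sketched as a longest-prefix lookup in a transit AS
that only sees provider aggregates.  This is a minimal sketch: the prefixes
and addresses are hypothetical (documentation address space), and exit_for
is an illustrative helper, not part of any shim6 specification.

```python
import ipaddress

# Hypothetical provider aggregates, as a transit AS such as at&t would see them.
PROVIDER_AGGREGATES = {
    "Sprint": ipaddress.ip_network("2001:db8:1000::/36"),
    "L(3)": ipaddress.ip_network("2001:db8:2000::/36"),
}

# Hypothetical: cust2 holds one provider-assigned locator per upstream.
CUST2_LOCATORS = {
    "Sprint": ipaddress.ip_address("2001:db8:1234::10"),
    "L(3)": ipaddress.ip_address("2001:db8:2234::10"),
}


def exit_for(destination):
    """Return the provider whose aggregate covers the destination (longest match)."""
    matches = [(net.prefixlen, name)
               for name, net in PROVIDER_AGGREGATES.items()
               if destination in net]
    if not matches:
        raise LookupError(f"no route to {destination}")
    return max(matches)[1]


# The exit is fixed by the locator the sending host picked; the transit AS
# has no second route for the same destination to apply its own TE to.
for upstream, locator in CUST2_LOCATORS.items():
    print(locator, "->", exit_for(locator))
```

Nothing in the lookup tells the transit AS that the two locators belong to
the same site, so it cannot substitute one exit for the other.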


Transit AS TE is more critical in the case of a moderate sized transit AS
that is purchasing transit from multiple upstreams, especially when links
are cost prohibitive.  Take a large South American ISP that has 16 STM-1s,
where 4xSTM1 use the Americas 2 oceanic cable system to up stream
transit provider1, 4xSTM1 use the Emergia oceanic cable system to up
stream transit provider1, 4xSTM1 use the Americas 2 oceanic cable system
to up stream transit provider2, and 4xSTM1 use the Americas 2
oceanic cable system to up stream transit provider2.  Now imagine
that your most important customer, who always complains about latency,
should always use the Americas 2 oceanic cable system to up stream
transit provider1.  Also imagine all other traffic should load all the
other links as equally as possible, and that if any one or more links
fail, the remaining links should still be loaded as equally as possible.
Note: This is just one example of a real world customer.
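
The policy in this example can be sketched as a simple placement function.
This is a minimal sketch under stated assumptions: the traffic volumes are
hypothetical, the link groups are copied as listed in the text, "other
links" is read as the twelve links outside the pinned group, and the place
helper is illustrative only.  What should happen if every pinned link fails
is not specified in the text, so the sketch raises an error in that case.

```python
# Link groups as listed in the example: 4 x STM-1 per (cable, provider) group.
GROUPS = [
    ("Americas 2", "provider1"),
    ("Emergia", "provider1"),
    ("Americas 2", "provider2"),
    ("Americas 2", "provider2"),
]
# Sixteen links numbered 0..15; links 0..3 form the pinned group.
LINKS = [(i, cable, provider) for i, (cable, provider) in enumerate(
    group for group in GROUPS for _ in range(4))]
PINNED_GROUP = ("Americas 2", "provider1")


def place(pinned_mbps, other_mbps, failed=frozenset()):
    """Return {link id: Mbit/s} with each traffic class spread equally
    over its surviving links."""
    up = [link for link in LINKS if link[0] not in failed]
    pinned_links = [link for link in up if link[1:] == PINNED_GROUP]
    other_links = [link for link in up if link[1:] != PINNED_GROUP]
    if not pinned_links or not other_links:
        raise ValueError("policy for this failure case is not specified")
    load = {link[0]: 0.0 for link in up}
    for link in pinned_links:
        load[link[0]] += pinned_mbps / len(pinned_links)
    for link in other_links:
        load[link[0]] += other_mbps / len(other_links)
    return load


# Hypothetical volumes: 200 Mbit/s for the pinned customer, 1200 for the rest.
print(place(200, 1200))
print(place(200, 1200, failed={4, 5}))  # two links down, the rest rebalance
```

Even this toy version needs a site-wide view of link state and traffic
classes, which is the kind of decision the text argues belongs at the
network level rather than on individual end hosts.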
    
|On Fri, 24 Feb 2006, Iljitsch van Beijnum wrote:
|> If you have several links to the internet, you don't want one to sit
|> idle while another is congested. So traffic engineering is an
|> essential part of multihoming. Now obviously most ISPs are multihomed
|> themselves so they want to do their own TE. That's completely
|> legitimate. However, that doesn't mean that they get to decide how
|> their customers can do TE. If a customer sends traffic to an ISP and
|> that traffic conforms to SLAs and accepted congestion control
|> methods, the ISP should deliver the packets. That's their business.
|> But some people at the NANOG BOF expressed a different view: "the law
|> of large numbers only works if I control the large numbers". And:
|> "most small businesses can't single home properly". To which a
|> certain ex area director replied: "that doesn't stop them from doing
|> it twice.
|> [...] just like us they're all idiots".

Obviously from my examples above I disagree with your characterization of
how customers and transit ASes actually use inter-AS TE in the real
world.  This is at least as is seen from the perspective of one provider
and its customers.  From what I can tell in talking to the other large
carriers, they mostly do the same, but I certainly can't speak for them.

___Jason
   


==========================================================================
Jason Schiller                                               (703)886.6648
Senior Internet Network Engineer                         fax:(703)886.0512
Public IP Global Network Engineering                       schiller@uu.net
UUNET / Verizon                         jason.schiller@verizonbusiness.com

The good news about having an email address that is twice as long is that
it increases traffic on the Internet.

On Fri, 24 Feb 2006, Iljitsch van Beijnum wrote:

> Date: Fri, 24 Feb 2006 13:33:58 +0100
> From: Iljitsch van Beijnum <iljitsch@muada.com>
> To: Mikael Abrahamsson <swmike@swm.pp.se>
> Cc: shim6@psg.com
> Subject: Re: shim6 @ NANOG   (forwarded note from John Payne)
> 
> On 24-feb-2006, at 12:26, Mikael Abrahamsson wrote:
> 
> >>> 1) Traffic engineering, traffic engineering and traffic engineering
> 
> >> I guess we'll have to take this one to heart. I believe this means  
> >> decent traffic engineering must be in the first version of the specs.
> 
> > I think it should be considered whose problem shim6 is aimed at  
> > solving and whose side shim6 development should be on. ISPs want  
> > to do TE, customers want to work around it (perhaps).
> 
> > In my book, TE is evil and ISPs should be lossless and efficient.  
> > TE is just evil patching.
> 
> In my book, traffic engineering is chapter 6.  :-)
> 
> You can read it online:
> http://www.oreilly.com/catalog/bgp/chapter/ch06.html
> 
> If you have several links to the internet, you don't want one to sit  
> idle while another is congested. So traffic engineering is an  
> essential part of multihoming. Now obviously most ISPs are multihomed  
> themselves so they want to do their own TE. That's completely  
> legitimate. However, that doesn't mean that they get to decide how  
> their customers can do TE. If a customer sends traffic to an ISP and  
> that traffic conforms to SLAs and accepted congestion control  
> methods, the ISP should deliver the packets. That's their business.  
> But some people at the NANOG BOF expressed a different view: "the law  
> of large numbers only works if I control the large numbers". And:  
> "most small businesses can't single home properly". To which a  
> certain ex area director replied: "that doesn't stop them from doing  
> it twice.
> [...] just like us they're all idiots".
> 
> Unfortunately, these days, many people take TE to mean "break up my  
> portable address block in small parts". That is one way to do it, and  
> an effective one, but also the least scalable one.
> 
> > I like the fact that shim6 discourages end users from getting their  
> > own address space and AS number just because of redundancy and  
> > multihoming, and I definitely like the fact that I will be able to  
> > move between IPs due to mobility, and keep my ssh session up during  
> > this move. I really hate shutting down my ssh sessions just because  
> > I go from wireless to wired mode. Without "screen" I would hate it  
> > even more.
> 
> Well, shim6 isn't really mobility, although you could probably do  
> this with it.
> 
> > This has to take precedence over some ISPs desire to do TE. Perhaps  
> > if their routers could be smaller and simpler due to smaller TCAM  
> > space need, they could afford to upgrade their links instead of  
> > doing a lot of TE to work around the bw problem. Their job is to  
> > move the packets the customer sends to them, and shim6 is an  
> > enduser feature and they shouldn't bother about it.
> 
> Amen to that.
> 
> And I hope they're not using TCAMs for the (entire) IP routing table,  
> because these suckers use lots of power and scale linearly.
>