
Re: shim6 @ NANOG (forwarded note from John Payne) (fwd)



Hi Igor,

It is very nice of you to give us feedback about this...

I will try to comment on some of the issues that you mention below...

On 01/03/2006, at 10:10, Igor Gashinsky wrote:


1) Most connections to content providers (with the exception of
long-lived streaming sessions, but those sessions are fairly "few" per
server) are very short-lived HTTP (think about 15 packets in each
direction including the setup/teardown). Since shim6 (as designed right
now) does not initiate from the first packet(s), it might not take effect
for these short-lived sessions, and therefore will not help in case of
failure, so in effect *does not work* at all for fast HTTP transactions.


First of all, I think it is important to remember that the goal of the SHIM6 protocol is to _preserve_ established sessions through outages.

However, there are other tools being discussed that would allow hosts to establish new communications _after_ an outage.

The rationale behind the different tools, as I understand it, is something like the following:

- If two hosts have a long-lived session (that could be a long TCP session, many short TCP sessions, or a long UDP exchange), then it is likely to be important for them to preserve this session through outages. In addition, since the probability of an outage affecting the communication rises with its lifetime, it seems reasonable to try to protect a long-lived session. Moreover, as the session is long-lived, the number of packets will be large enough to amortize the overhead introduced by the shim context establishment.

- However, if two hosts have a short-lived session, like a short TCP connection, then the above conditions do not hold. Since the session is short, the probability of an outage affecting it during its lifetime is reduced. Moreover, since the session has just been established, if an outage does affect it, the assumption is that the host will be willing to retry establishing the session. For this, there are mechanisms being proposed to allow hosts to establish new connections in the case that a failure affects one of the available addresses. In other words, the rationale here is that since the session is short-lived, the host will prefer to take the risk of having to re-establish the session after an outage rather than paying the shim6 overhead on all of its communications (when it is likely that no outage will affect them). It should also be noted that, as you mention, the patience of users is quite limited and they are likely to retry if the connection takes too long, which seems in line with the above case for retrying to establish the connection.
In addition, I would like to point out that, because of the time it may take to reconverge, a BGP-based multihoming solution does not preserve established communications through all outages either, especially when you have anxious users who are willing to hit the reload button.

So the effort for this case, IMHO, is put into enabling the capability of establishing new sessions after an outage rather than into preserving established connections. Does this rationale make sense to you?
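The long-vs-short rationale above can be sketched as a simple per-flow heuristic. To be clear, the threshold and all names below are invented for illustration; the shim6 spec leaves the context-establishment trigger to implementations:

```python
# Sketch of the deferred context-establishment rationale: a flow only pays
# the shim6 context-establishment overhead once it has proven long-lived.
# PACKET_THRESHOLD is an assumed, illustrative value.

PACKET_THRESHOLD = 50

class Flow:
    def __init__(self):
        self.packets = 0
        self.shim_context = False

    def on_packet(self):
        self.packets += 1
        # A short HTTP exchange (~15 packets each way) never crosses the
        # threshold, so it pays no establishment overhead; a long-lived
        # session does cross it, and gains failure protection in return.
        if not self.shim_context and self.packets >= PACKET_THRESHOLD:
            self.shim_context = True  # here the 4-way shim6 exchange would start

short = Flow()
for _ in range(30):        # ~15 packets in each direction
    short.on_packet()

long_lived = Flow()
for _ in range(500):
    long_lived.on_packet()
```

Under this sketch the short flow never establishes a shim context, which is exactly why a separate initial-contact mechanism is needed for short transactions.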

2) In order to "fix" #1, shim6 has the potential to put sizable (over
10%) state penalties on our servers (to service end-sites w/ shim6),
which is arguably the most painful thing for those servers, and
can translate into millions of dollars of additional hardware, and
many more millions of dollars per year to power/cool that hardware.


Well, the good thing about mechanisms for establishing new communications despite outages is that they are located in the client only and have no effect on the server.


3) While TE has been discussed at length already, it is something
which is absolutely required for a content provider to deploy shim6.
There has been quite a bit of talk about what TE is used for, but it
seems that few people recognize it as a way of expressing
"business/financial policies". For example, in the v4 world, the
(multi-homed) end-user may be visible via both a *paid* transit path
(say UUNET) and a *free* peering link (say Cogent), and I would wager
that most content providers would choose the free link (even if
performance on that link is (not hugely) worse). That capability all but
disappears in the v6 world if the client ID was sourced from their UUNET
IP address (since that's whom they chose to use for outbound traffic),
and the (web) server does not know that that locator also corresponds to
a Cogent IP (which it can reach for free).

I fail to understand the example that you are presenting here...

Are you considering the case where both the client and the server are multihomed to Cogent and UUNET?
something like

     UUnet
    /     \
   C       S
    \     /
     Cogent

I mean, in this case the selection of the server's provider is determined by the server's address, not by the client's address, right? The server can influence this decision using SRV records in the DNS, but I am not sure yet if this is the case you are considering.
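For concreteness, here is a hypothetical zone fragment showing how a server operator could steer clients toward one upstream via SRV priority (lower value is preferred). All names and prefixes are invented, and note this only works for clients that actually consult SRV records for the service:

```
; Two targets for the same service, one per upstream.
; Priority 10 < 20, so SRV-aware clients try the Cogent path first.
_http._tcp.example.com.  IN SRV 10 0 80 via-cogent.example.com.
_http._tcp.example.com.  IN SRV 20 0 80 via-uunet.example.com.
via-cogent.example.com.  IN AAAA 2001:db8:1::1
via-uunet.example.com.   IN AAAA 2001:db8:2::1
```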



 This change alone would add millions to the bandwidth bills of said
content providers, and, well, reduce the likelihood of adoption of the
protocol by them. Now, if the shim6 init takes place in the 3-way
handshake process, then the servers "somewhat" know all the possible
paths to reach that locator, but would then need some sort of a policy
server telling them whom to talk to on what IP, and that's something
which will simply not scale for 100K+ machines.


I am not sure I understand the scaling problem here.
Suppose that you are using a DHCP option for distributing the SHIM6 preferences of the RFC 3484 policy table: are you saying that DHCP does not scale for 100K+ machines? Or is there something else, other than DHCP, that you have in mind?
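The policy-table idea can be made concrete with a small sketch. The prefixes, precedence values, and labels below are a hypothetical site policy (e.g. "prefer the free-peering prefix over the paid-transit one") of the kind that could be pushed to hosts via such a DHCP option; the selection rule is a simplification of the full RFC 3484 rule set:

```python
import ipaddress

# Hypothetical RFC 3484-style policy table: (prefix, precedence, label).
# Higher precedence wins; here the "peering" prefix is preferred over
# the "transit" one. These values are invented for illustration.
POLICY = [
    (ipaddress.ip_network("2001:db8:1::/48"), 50, 1),  # free peering
    (ipaddress.ip_network("2001:db8:2::/48"), 40, 2),  # paid transit
    (ipaddress.ip_network("::/0"),            30, 0),  # default
]

def lookup(addr):
    """Longest-prefix match against the policy table -> (precedence, label)."""
    addr = ipaddress.ip_address(addr)
    best = max((e for e in POLICY if addr in e[0]),
               key=lambda e: e[0].prefixlen)
    return best[1], best[2]

def pick_destination(candidates):
    # Simplified rule: prefer the candidate with the highest precedence.
    # (Real RFC 3484 selection applies several more tie-breaking rules.)
    return max(candidates, key=lambda a: lookup(a)[0])

print(pick_destination(["2001:db8:2::10", "2001:db8:1::10"]))
# -> 2001:db8:1::10, the peering prefix
```

The point is that updating one table (via DHCP) changes path preference for every host at once, rather than requiring a per-connection policy server.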



4) As has also been discussed before, the initial connect time has to be
*very* low. If anything takes longer than 4-5 seconds, the end-users have
a funny way of clicking "stop" in their browser, deciding that "X is down,
let me try Y", which is usually not a very acceptable scenario :-) So,
whatever methodology we use to do the initial set-up has to account for
that, and be able to get a connection that is actually starting to do
something in under 2 seconds, along with figuring out which source-IP and
dest-IP pairs can actually talk to each other.

As I mentioned above, we are working on mechanisms other than the shim6 protocol itself that can be used for establishing new communications in the presence of an outage.

you can find some work in this area in

ftp://ftp.rfc-editor.org/in-notes/internet-drafts/draft-bagnulo-ipv6-rfc3484-update-00.txt

If you have comments, and especially improvements on the ideas in this draft, or other ideas on how to tackle this problem of initial contact, that would be really useful.
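One way to meet the under-2-seconds budget is to probe the candidate (source, destination) locator pairs in parallel and take whichever connects first. This is only a sketch of that idea, not the draft's mechanism: `try_connect` is a stand-in for a real non-blocking TCP connect, and the deadline value is illustrative.

```python
import concurrent.futures

def first_working_pair(pairs, try_connect, deadline=1.5):
    """Race all (src, dst) locator pairs; return the first pair for which
    try_connect(src, dst) succeeds, or None if the deadline expires or
    every path is down. deadline is kept well under the ~2 s user budget."""
    with concurrent.futures.ThreadPoolExecutor(len(pairs)) as pool:
        futures = {pool.submit(try_connect, s, d): (s, d) for s, d in pairs}
        try:
            for fut in concurrent.futures.as_completed(futures, timeout=deadline):
                if fut.result():          # this path answered
                    return futures[fut]
        except concurrent.futures.TimeoutError:
            pass                          # hard deadline hit
    return None
```

In a real client, `pairs` would be the cross-product of local and remote locators ordered by the policy table, so a failure on one provider's address simply means another pair wins the race.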



I hope this gives people some visibility as to what some content providers
think about shim6, and why deploying it is, well, not something that
people will scramble (or very possibly choose) to do, unless those issues
are addressed. And, yes, everyone understands that it's all about making
trade-offs, but if you make the wrong trade-offs, and not enough people
deploy the protocol, it's simply not going to fly, and people will just go
back to de-aggregating in v6 and let Moore's Law deal with the issue (and
anyone who thinks that people will prevent paying customers from
de-aggregating has not seen how many hoops ISPs will jump through for that
extra revenue, or how fast customers will jump to other ISPs which will
allow them to do just that). I don't know if more work on shim6 is the
answer, or if GSE/8+8 is a better alternative, but it sure looks like what
we have in shim6 today (and its current direction) isn't going to cut it.

Just my $0.02


Yes, your feedback is very welcome.

thanks, marcelo


Thanks,
-igor