
Re: shim6 @ NANOG (forwarded note from John Payne) (fwd)



Hi Igor,

It is very nice of you to give us feedback about this...

I will try to comment on some of the issues that you mention below...

On 01/03/2006, at 10:10, Igor Gashinsky wrote:


1) Most connections to content providers (with the exception of
long-lived streaming sessions, but those sessions are fairly "few" per
server) are very short-lived HTTP (think about 15 packets in each
direction including the setup/teardown). Since shim6 (as designed right
now) does not initiate from the first packet(s), it might not take effect
for these short-lived sessions, and therefore will not help in case of
failure, so in effect *does not work* at all for fast HTTP transactions.


First of all, I think it is important to remember that the goal of the SHIM6 protocol is to _preserve_ established sessions through outages.

However, there are other tools being discussed that would allow hosts to establish new communications _after_ an outage.

The rationale behind the different tools, as I understand it, is something like the following:

- If two hosts have a long-lived session (that could be a long TCP session, many short TCP sessions, or a long UDP exchange), then it is likely to be important for them to preserve this session through outages. In addition, since the probability of an outage affecting the communication rises with its lifetime, it seems reasonable to try to protect a long-lived session. Moreover, as the session is long-lived, the number of packets will be large enough to amortize the overhead introduced by the shim context establishment.

- However, if two hosts have a short-lived session, like a short TCP connection, then the above conditions do not hold. Since the session is short, the probability of an outage affecting it during its lifetime is reduced. Moreover, since the session has just been established, if an outage does affect it, the assumption is that the host will be willing to retry establishing the session. For this, there are mechanisms being proposed to allow hosts to establish new connections in the case that a failure affects one of the available addresses. In other words, the rationale here is that since the session is short-lived, the host will prefer to take the risk of having to re-establish the session after an outage rather than paying the shim6 overhead on all of its communications (when it is likely that no outage will affect them). It should also be noted that, as you mention, the patience of users is quite limited and they are likely to retry if the connection takes too long, which seems in line with the above case for retrying to establish the connection.
In addition, I would like to point out that, because of the time it may take to reconverge, a BGP-based multihoming solution does not preserve established communications through all outages either, especially when you have anxious users who are willing to hit the reload button.

So the effort for this case, IMHO, is put into enabling the capability of establishing new sessions after an outage rather than into preserving established connections. Does this rationale make sense to you?
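The long-vs-short rationale above can be sketched as a simple per-flow heuristic. To be clear, the threshold and all names below are invented for illustration; the shim6 spec leaves the context-establishment trigger to implementations:

```python
# Sketch of the deferred context-establishment rationale: a flow only pays
# the shim6 context-establishment overhead once it has proven long-lived.
# PACKET_THRESHOLD is an assumed, illustrative value.

PACKET_THRESHOLD = 50

class Flow:
    def __init__(self):
        self.packets = 0
        self.shim_context = False

    def on_packet(self):
        self.packets += 1
        # A short HTTP exchange (~15 packets each way) never crosses the
        # threshold, so it pays no establishment overhead; a long-lived
        # session does cross it, and gains failure protection in return.
        if not self.shim_context and self.packets >= PACKET_THRESHOLD:
            self.shim_context = True  # here the 4-way shim6 exchange would start

short = Flow()
for _ in range(30):        # ~15 packets in each direction
    short.on_packet()

long_lived = Flow()
for _ in range(500):
    long_lived.on_packet()
```

Under this sketch the short flow never establishes a shim context, which is exactly why a separate initial-contact mechanism is needed for short transactions.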

2) In order to "fix" #1, shim6 has the potential to put sizable (over
10%) state penalties on our servers (to service end-sites w/ shim6),
which is arguably the most painful thing for those servers, and
can translate into millions of dollars of additional hardware, and
many more millions of dollars per year to power/cool that hardware.


Well, the good thing about mechanisms for establishing new communications despite outages is that they are located in the client only and have no effect on the server.


3) While TE has been discussed at length already, it is something
which is absolutely required for a content provider to deploy shim6.
There has been quite a bit of talk about what TE is used for, but it
seems that few people recognize it as a way of expressing
"business/financial policies". For example, in the v4 world, the
(multi-homed) end-user may be visible via both a *paid* transit path
(say UUNET) and a *free* peering link (say Cogent), and I would wager
that most content providers would choose the free link (even if
performance on that link is (not hugely) worse). That capability all but
disappears in the v6 world if the client ID was sourced from their UUNET
IP address (since that's whom they chose to use for outbound traffic),
and the (web) server does not know that that locator also corresponds to
a Cogent IP (which it can reach for free).

I fail to understand the example that you are presenting here...

Are you considering the case where both the client and the server are multihomed to Cogent and UUNET?
something like

     UUnet
    /     \
   C       S
    \     /
     Cogent

I mean, in this case the selection of the server's provider is determined by the server's address, not by the client's address, right? The server can influence this decision using SRV records in the DNS, but I am not sure yet if this is the case you are considering.
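For concreteness, here is a hypothetical zone fragment showing how a server operator could steer clients toward one upstream via SRV priority (lower value is preferred). All names and prefixes are invented, and note this only works for clients that actually consult SRV records for the service:

```
; Two targets for the same service, one per upstream.
; Priority 10 < 20, so SRV-aware clients try the Cogent path first.
_http._tcp.example.com.  IN SRV 10 0 80 via-cogent.example.com.
_http._tcp.example.com.  IN SRV 20 0 80 via-uunet.example.com.
via-cogent.example.com.  IN AAAA 2001:db8:1::1
via-uunet.example.com.   IN AAAA 2001:db8:2::1
```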



 This change alone would add millions to the bandwidth bills of said
content providers, and, well, reduce the likelihood of adoption of the
protocol by them. Now, if the shim6 init takes place in the 3-way
handshake process, then the servers "somewhat" know all the possible
paths to reach that locator, but would then need some sort of a policy
server telling them whom to talk to on what IP, and that's something
which will simply not scale for 100K+ machines.


I am not sure I understand the scaling problem here.
Suppose that you are using a DHCP option for distributing the SHIM6 preferences of the RFC 3484 policy table: are you saying that DHCP does not scale for 100K+ machines? Or is there something else, other than DHCP, that you have in mind?
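The policy-table idea can be made concrete with a small sketch. The prefixes, precedence values, and labels below are a hypothetical site policy (e.g. "prefer the free-peering prefix over the paid-transit one") of the kind that could be pushed to hosts via such a DHCP option; the selection rule is a simplification of the full RFC 3484 rule set:

```python
import ipaddress

# Hypothetical RFC 3484-style policy table: (prefix, precedence, label).
# Higher precedence wins; here the "peering" prefix is preferred over
# the "transit" one. These values are invented for illustration.
POLICY = [
    (ipaddress.ip_network("2001:db8:1::/48"), 50, 1),  # free peering
    (ipaddress.ip_network("2001:db8:2::/48"), 40, 2),  # paid transit
    (ipaddress.ip_network("::/0"),            30, 0),  # default
]

def lookup(addr):
    """Longest-prefix match against the policy table -> (precedence, label)."""
    addr = ipaddress.ip_address(addr)
    best = max((e for e in POLICY if addr in e[0]),
               key=lambda e: e[0].prefixlen)
    return best[1], best[2]

def pick_destination(candidates):
    # Simplified rule: prefer the candidate with the highest precedence.
    # (Real RFC 3484 selection applies several more tie-breaking rules.)
    return max(candidates, key=lambda a: lookup(a)[0])

print(pick_destination(["2001:db8:2::10", "2001:db8:1::10"]))
# -> 2001:db8:1::10, the peering prefix
```

The point is that updating one table (via DHCP) changes path preference for every host at once, rather than requiring a per-connection policy server.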



4) As has also been discussed before, the initial connect time has to be
*very* low. If anything takes longer than 4-5 seconds, the end-users have
a funny way of clicking "stop" in their browser, deciding that "X is down,
let me try Y", which is usually not a very acceptable scenario :-) So,
whatever methodology we use to do the initial set-up has to account for
that, and be able to get a connection that is actually starting to do
something in under 2 seconds, along with figuring out which source-IP and
dest-IP pairs can actually talk to each other.

As I mentioned above, we are working on mechanisms other than the shim6 protocol itself that can be used for establishing new communications in the presence of an outage.

you can find some work in this area in

ftp://ftp.rfc-editor.org/in-notes/internet-drafts/draft-bagnulo-ipv6-rfc3484-update-00.txt

If you have comments, and especially improvements on the ideas in this draft, or other ideas on how to tackle this problem of initial contact, that would be really useful.
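One way to meet the under-2-seconds budget is to probe the candidate (source, destination) locator pairs in parallel and take whichever connects first. This is only a sketch of that idea, not the draft's mechanism: `try_connect` is a stand-in for a real non-blocking TCP connect, and the deadline value is illustrative.

```python
import concurrent.futures

def first_working_pair(pairs, try_connect, deadline=1.5):
    """Race all (src, dst) locator pairs; return the first pair for which
    try_connect(src, dst) succeeds, or None if the deadline expires or
    every path is down. deadline is kept well under the ~2 s user budget."""
    with concurrent.futures.ThreadPoolExecutor(len(pairs)) as pool:
        futures = {pool.submit(try_connect, s, d): (s, d) for s, d in pairs}
        try:
            for fut in concurrent.futures.as_completed(futures, timeout=deadline):
                if fut.result():          # this path answered
                    return futures[fut]
        except concurrent.futures.TimeoutError:
            pass                          # hard deadline hit
    return None
```

In a real client, `pairs` would be the cross-product of local and remote locators ordered by the policy table, so a failure on one provider's address simply means another pair wins the race.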



I hope this gives people some visibility as to what some content providers
think about shim6, and why deploying it is, well, not something that
people will scramble (or very possibly choose) to do, unless those issues
are addressed. And, yes, everyone understands that it's all about making
trade-offs, but if you make the wrong trade-offs, and not enough people
deploy the protocol, it's simply not going to fly, and people will just go
back to de-aggregating in v6 and let Moore's Law deal with the issue (and
anyone who thinks that people will prevent paying customers from
de-aggregating has not seen how many hoops ISPs will jump through for that
extra revenue, or how fast customers will jump to other ISPs which will
allow them to do just that). I don't know if more work on shim6 is the
answer, or if GSE/8+8 is a better alternative, but it sure looks like what
we have in shim6 today (and its current direction) isn't going to cut it.

Just my $0.02


Yes, your feedback is very welcome.

thanks, marcelo


Thanks,
-igor