[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: shim6 @ NANOG (forwarded note from John Payne) (fwd)
Hi Igor,
It is very nice of you to give us feedback about this...
I will try to comment on some of the issues that you mention below...
On 01/03/2006, at 10:10, Igor Gashinsky wrote:
1) Most connections to content providers (with the exception of
long-lived streaming sessions, but those sessions are fairly "few" per
server) are very short-lived HTTP (think about 15 packets in each
direction including the setup/teardown). Since shim6 (as designed right
now) does not initiate from the first packet(s), it might not take
effect for these short-lived sessions, and therefore will not help in
case of failure, so in effect, *does not work* at all for fast HTTP
transactions.
First of all, I think it is important to remember that the goal of the
SHIM6 protocol is to _preserve_ established sessions through outages.
However, there are other tools being discussed that would allow hosts
to establish new communications _after_ an outage.
The rationale behind the different tools, as I understand it, would be
something like the following:
- If two hosts have long-lived sessions (that could be a long TCP
session, or many short TCP sessions, or a long UDP exchange), then it
is likely important for them to preserve the session through outages.
In addition, since the probability of an outage affecting the
communication rises with the lifetime of the communication, it seems
reasonable to try to protect a long-lived session. Moreover, as the
session is long-lived, the number of packets will be large enough to
amortize the overhead introduced by the shim context establishment.
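The deferred-setup rationale above can be sketched as follows. This is only an illustration of the heuristic, not the actual shim6 state machine; the packet-count threshold and the flow keying are assumptions made for the example:

```python
# Sketch of a deferred shim6 context-establishment heuristic: only flows
# that outlive a packet-count threshold pay the cost of negotiating a
# shim context. The threshold value and the (src, dst) flow keying are
# illustrative assumptions, not values from the shim6 specification.

SHIM_THRESHOLD = 50  # packets seen before we bother with a shim context

class FlowTracker:
    def __init__(self, threshold=SHIM_THRESHOLD):
        self.threshold = threshold
        self.counts = {}        # (src, dst) -> packets seen so far
        self.shimmed = set()    # flows with an established shim context

    def packet_seen(self, src, dst):
        """Count a packet; set up a context once the flow proves long-lived."""
        key = (src, dst)
        self.counts[key] = self.counts.get(key, 0) + 1
        if key not in self.shimmed and self.counts[key] >= self.threshold:
            self.shimmed.add(key)  # in reality: start the shim6 exchange
        return key in self.shimmed
```

Under this heuristic a short HTTP exchange (roughly 15 packets each way) never crosses the threshold and so never pays the shim overhead, while a long-lived stream does get protected.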
- However, if two hosts have a short-lived session, like a short TCP
connection, then the above conditions do not hold. Since the session is
short, the probability of an outage affecting it during its lifetime is
reduced. Moreover, if an outage affects a session that has just been
established, the assumption is that the host will be willing to retry
establishing the session. For this, there are mechanisms being proposed
to allow hosts to establish new connections in the case that a failure
is affecting one of the available addresses. In other words, the
rationale here is that since the session is short-lived, the host will
prefer to take the risk of having to reestablish the session in the
case of an outage rather than paying the shim6 overhead on all its
communications (when it is likely that no outage will affect them). It
should also be noted that, as you mention, the patience of users is
quite limited and they are likely to retry if the connection takes too
long, which seems in line with the above case for retrying to establish
the connection. In addition, I would like to point out that because of
the time it may take to reconverge, a BGP-based solution for
multihoming does not preserve established communications through all
outages either, especially when you have anxious users who are willing
to hit the reload button.
So the effort for this case, IMHO, is put into enabling the capability
of establishing new sessions after an outage rather than into
preserving established connections. Does this make sense to you?
2) In order to "fix" #1, shim6 has the potential to put a sizable (over
10%) state penalty on our servers (to service end-sites w/ shim6),
something which is arguably the most painful thing for those servers,
and which can translate into millions of dollars of additional
hardware, and many more millions of dollars per year to power/cool that
hardware.
Well, the good thing about mechanisms to establish new communications
through outages is that they are located in the client only and have no
effect on the server.
3) While TE has been discussed at length already, it is something which
is absolutely required for a content provider to deploy shim6. There
has been quite a bit of talk about what TE is used for, but it seems
that few people recognize it as a way of expressing "business/financial
policies". For example, in the v4 world, the (multi-homed) end-user may
be visible via both a *paid* transit path (say UUNET) and a *free*
peering link (say Cogent), and I would wager that most content
providers would choose the free link (even if performance on that link
is (not hugely) worse). That capability all but disappears in the v6
world if the client ID was sourced from their UUNET IP address (since
that's who they chose to use for outbound traffic), and the (web)
server does not know that that locator also corresponds to a Cogent IP
(which they can reach for free).
I fail to understand the example that you are presenting here...
Are you considering the case where both the client and the server are
multihomed to Cogent and UUNET?
Something like:
       UUNET
      /     \
     C       S
      \     /
       Cogent
I mean, in this case, the selection of the server's provider is
determined by the server's address, not by the client's address, right?
The server can influence such a decision using SRV records in the DNS,
but I am not sure yet if this is the case you are considering.
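For what it's worth, the SRV mechanism mentioned above would look something like the following zone fragment. The names, prefixes, and priorities here are purely illustrative: a server multihomed to both providers publishes one target per provider and uses the SRV priority field to steer clients toward the cheaper path:

```
; Hypothetical zone fragment: lower priority is tried first (per the SRV
; rules in RFC 2782), steering clients to the address reached over the
; free peering link. All names and addresses are placeholders.
_http._tcp.example.com.  IN SRV 10 0 80 www-peering.example.com.  ; preferred
_http._tcp.example.com.  IN SRV 20 0 80 www-transit.example.com.  ; backup
www-peering.example.com. IN AAAA 2001:db8:a::1
www-transit.example.com. IN AAAA 2001:db8:b::1
```

Of course, this only helps for application protocols whose clients actually consult SRV records, which plain HTTP browsers generally do not.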
This change alone would add millions to the bandwidth bills of said
content providers, and, well, reduce the likelihood of adoption of the
protocol by them. Now, if the shim6 init takes place in the 3-way
handshake process, then the servers "somewhat" know what all the
possible paths to reach that locator are, but they would then need some
sort of a policy server telling them whom to talk to on what IP, and
that's something which will simply not scale for 100K+ machines.
I am not sure I understand the scaling problem here.
Suppose that you are using a DHCP option for distributing the SHIM6
preferences as RFC 3484 policy table entries; are you saying that DHCP
does not scale for 100K+ machines? Or is there something else other
than DHCP that does not scale?
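To make the DHCP idea concrete, the distributed preferences would essentially be rows of the RFC 3484 policy table, something like the sketch below. The prefixes and values are illustrative placeholders, not a proposed assignment:

```
# RFC 3484-style policy table: for destination address selection, higher
# precedence wins; matching labels pair source prefixes with destination
# prefixes. Prefixes below are documentation-space placeholders.
# Prefix              Precedence   Label
::1/128               50           0     # loopback
2001:db8:a::/48       45           1     # hypothetical: prefer this provider's space
::/0                  40           2     # default
```

Pushing rows like these via DHCP would let an operator bias which locator pairs clients try first, without any per-connection policy-server lookup.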
4) As has also been discussed before, the initial connect time has to
be *very* low. With anything that takes longer than 4-5 seconds, the
end-users have a funny way of clicking "stop" in their browser, deeming
that "X is down, let me try Y", which is usually not a very acceptable
scenario :-) So, whatever methodology we use to do the initial set-up
has to account for that, and be able to get a connection that is
actually starting to do something in under 2 seconds, along with
figuring out which source IP and dest IP pairs actually can talk to
each other.
As I mentioned above, we are working on mechanisms other than the shim6
protocol itself that can be used for establishing new communications
through outages.
You can find some work in this area in
ftp://ftp.rfc-editor.org/in-notes/internet-drafts/draft-bagnulo-ipv6-rfc3484-update-00.txt
If you have comments, and especially improvements on the ideas in this
draft, or other ideas on how to tackle this problem of initial contact,
it would be really useful.
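One way to frame the initial-contact problem above is as iterating over the candidate (source, destination) locator pairs with a short per-attempt timeout and taking the first pair that answers. The sketch below is an illustration of that idea, not the mechanism in the draft; the connect logic is injected as a callable so the pair-iteration policy itself is what's shown:

```python
# Sketch: serially probe candidate (src, dst) address pairs with a short
# timeout each, returning the first pair that connects. A real client
# would likely probe pairs in parallel to stay under a ~2 s total budget.

from itertools import product

def find_working_pair(src_addrs, dst_addrs, try_connect,
                      per_attempt_timeout=0.5):
    """Return the first (src, dst) pair for which try_connect succeeds.

    try_connect(src, dst, timeout) -> bool is injected here so the policy
    can be exercised without real sockets; in practice it would bind the
    source address, connect to the destination, and report success or
    failure within the timeout.
    """
    for src, dst in product(src_addrs, dst_addrs):
        if try_connect(src, dst, per_attempt_timeout):
            return (src, dst)
    return None  # no pair worked: total outage
```

The ordering of the destination list would come from the RFC 3484 policy table, so the preferred provider's pair is tried first and the common (no-outage) case costs a single attempt.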
I hope this gives people some visibility as to what some content
providers think about shim6, and why deploying it is, well, not
something that people will scramble (or very possibly choose) to do,
unless those issues are addressed. And, yes, everyone understands that
it's all about making trade-offs, but if you make the wrong trade-offs
and not enough people deploy the protocol, it's simply not going to
fly, and people will just go back to de-aggregating in v6 and let
Moore's Law deal with the issue (and anyone who thinks that people will
prevent paying customers from de-aggregating has not seen how many
hoops ISPs will jump through for that extra revenue, or how fast
customers will jump to other ISPs which will allow them to do just
that). I don't know if more work on shim6 is the answer, or if GSE/8+8
is a better alternative, but it sure looks like what we have in shim6
today (and its current direction) isn't going to cut it.
Just my $0.02
Yes, your feedback is very welcome.
thanks, marcelo
Thanks,
-igor