
Fwd: Re: Distribution CPG Protocol - Some Thoughts



At 11:43 AM 2000-12-28 -0500, Oliver Spatscheck wrote:
>Stephen Thomas writes:
>  >
>  > On the assumption that the WG goes forward, here are some initial thoughts
>  > on protocols. Some (most?) of this is possibly obvious, or perhaps some
>  > (most?) is brain-damaged. I'm interested to hear either way.
>  >
>  >
>  > Distribution CPG Protocol. This has been likened to BGP several times, so
>  > it seems like a good place to start is looking at what BGP offers (and what
>  > it doesn't) that appear to be relevant to CDNs.
>  >
>  > First, BGP is an advertising protocol. BGP peers advertise autonomous
>  > system paths that reach CIDR IP subnets. In our case (again, thinking only
>  > of distribution), two options are available. Distribution CPGs could
>  > advertise surrogates, or they could advertise content. If DCPGs advertise
>  > surrogates, it would be up to the recipient CPG to arrange to have content
>  > pushed to the surrogates. Alternatively, if DCPGs advertised content, then
>  > it would be up to the recipient to arrange to have its surrogates pull that
>  > content. I suppose there's no technical reason to limit the protocol to one
>  > of these options, but, in the interest of schedule and focus, at least
>  > picking one to start with seems preferable. My own instinct is that
>  > advertising content works better. There's sort of a one-to-many
>  > relationship (one content to many surrogates) that makes it more natural.
>  > (In the reverse, you have to worry about different content providers
>  > contending for the same advertised surrogate space.)
>  >
>  > Proposal 1: The protocol should advertise the availability of content.
>
>
>Actually I prefer the model of advertising surrogates. Advertising content
>alone prevents the owner from making the decision of which surrogates to use

I think you can look at this either way. If a content owner doesn't like 
the surrogates operated by a particular CDN, it just makes sure that none 
of its advertisements are sent to that CDN.

>  and
>breaks the model in which the origin of the content keeps full control of the
>content for accounting, invalidation, QofA and security.... .

This I don't follow. Once content makes its way to a surrogate, the 
potential always exists for the surrogate not to honor the origin's wishes 
with respect to accounting, etc. It doesn't matter which party advertised 
or which initiated the transfer from origin to surrogate. I suppose there's 
some argument that says an origin could maintain a list of "bad surrogates" 
that don't do the right thing (kind of like a bad check list). But again, 
that works whether you advertise content or advertise surrogates.

>And why do you
>have to push content if you select surrogates? You only have to let the
>surrogate know where the content is if need be.

That's exactly what I call advertising content.

>  To pick up your argument there
>is kind of a one-to-many relationship here too.  (one surrogate keeps many
>content) .... .

Again (at least as I read this), you're actually seconding my point. Let me 
try to be more explicit. The scarce resource is the surrogate; it's only 
got so much cache space. If the protocol advertises surrogates, it might 
say something like "cache A has 100 GB available." Now, what happens if 50 
different content providers, with an average content size of 50 GB, all 
decide they'd like to use that surrogate? Clearly there's a resource 
contention issue since all of the content won't fit. The problem isn't 
necessarily trivial, since the providers would probably argue that the 
surrogate should honor requests first-come-first-served. The surrogate, on 
the other hand, would probably prefer some sort of best fit approach. The 
fact that the different parties probably don't agree on the "fairest" 
resolution algorithm, combined with the fact that this is a distributed 
environment, definitely presents some challenges. Now I don't think the 
problem is unsolvable, but consider how much simpler the alternative is.

If providers advertise content, the protocol might say something like 
"origin B has 50 GB of content". Now in the case of resource contention, 
the resolution is strictly local. The surrogate picks and chooses from the 
content that it wishes to cache. Sure, it's probably going to pick some 
best-fit strategy (so providers that want first-come-first-served lose 
anyway), but look at the difference. In the first case (advertising 
surrogates), the provider gets left with a bad taste in its mouth. (Hey, 
that stupid surrogate said it had space available, but when I tried to use 
it I was refused!) In the second case (advertising content), the provider 
probably never even knows why it wasn't chosen. (I advertised, but the 
surrogate decided not to pick up my content; I guess I'd better advertise 
elsewhere.)
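
To make the "strictly local" resolution concrete, here is a minimal sketch in Python of a surrogate choosing among content advertisements. The names (ContentAd, select_content) are hypothetical, and the smallest-first greedy rule is only a stand-in for whatever best-fit policy a real surrogate would use. The numbers come from the example above: 50 providers at 50 GB each against a 100 GB cache.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentAd:
    """Hypothetical advertisement: 'origin X has N GB of content'."""
    origin: str
    size_gb: int


def select_content(ads, capacity_gb):
    """Pick advertisements locally, smallest first, until the cache is full.

    This is an assumed policy for illustration; the surrogate is free to
    use any rule it likes because no other party takes part in the decision.
    """
    chosen = []
    free = capacity_gb
    for ad in sorted(ads, key=lambda a: a.size_gb):
        if ad.size_gb <= free:
            chosen.append(ad)
            free -= ad.size_gb
    return chosen, free


# 50 providers, 50 GB each, one surrogate with 100 GB of cache space.
ads = [ContentAd(f"origin-{i}", 50) for i in range(50)]
chosen, free = select_content(ads, 100)
print(len(chosen), "advertisements accepted,", free, "GB left")
```

The other 48 origins are simply not picked up. No refusal message is needed, which is the point of the comparison.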

Of course, if it turns out that content is the scarce resource, then this 
whole argument breaks down completely. But then if that's the case, a 
significant number of folks on this list have more important things to 
worry about ;^)

>Maybe the protocol should advertise both, but in contrast to BGP the
>advertisements will actually change if somebody uses it. For example,
>if a content provider uses a particular surrogate the surrogate might
>stop advertising. I guess what I'm trying to say is that in the CDN
>peering environment, in contrast to BGP, the protocol has to be load
>sensitive to meet the SLAs a CDN gave to an origin site.

If you're advertising surrogates, then indeed the protocol does have to be 
"load sensitive." Once a provider takes advantage of advertised cache 
space, the advertising surrogate needs to adjust its advertisement. On the 
other hand, if you're advertising content, this complication may disappear. 
(It depends on whether a content provider is actively monitoring how much 
of its content is on surrogates and, if so, whether it has some kind of cut-off. 
"Okay, now I know my content's on 78 cache servers, so I can quit 
advertising.") Since the load sensitive aspect might go away with 
advertising content (but it will never go away with advertising 
surrogates), simplicity argues for advertising content rather than surrogates.
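
The optional cut-off on the content side can be sketched in a few lines of Python. This assumes the provider has some way of learning which surrogates picked up its content; that feedback channel, and the names ContentAdvertiser, record_pickup and should_advertise, are hypothetical and not part of anything proposed so far.

```python
class ContentAdvertiser:
    """Hypothetical provider-side state for one piece of content."""

    def __init__(self, content_id, target_copies):
        self.content_id = content_id
        self.target_copies = target_copies
        self.holders = set()  # surrogates known to hold the content

    def record_pickup(self, surrogate_id):
        # Assumed feedback: the provider learns a surrogate fetched the content.
        self.holders.add(surrogate_id)

    def should_advertise(self):
        # Keep advertising until enough surrogates hold the content.
        return len(self.holders) < self.target_copies


adv = ContentAdvertiser("origin-B/video", target_copies=78)
for n in range(78):
    adv.record_pickup(f"cache-{n}")
print(adv.should_advertise())
```

A provider that does not care how widely it is cached can skip this entirely and advertise forever, whereas a surrogate advertising free space always has to track what has been claimed.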

>  > Third, BGP assigns paths to its advertised objects. When a server learns a
>  > new destination from a peer it adds its own AS to the path on subsequent
>  > re-advertisements. In the simplest case, I don't think paths are relevant
>  > to DCPGs. Say CDN-1 advertises content to CDN-2, who, in turn,
>  > re-advertises it to CDN-3. Now, if CDN-3's surrogates want to retrieve that
>  > content, is there any need to go through CDN-2? Simplest case says "No";
>  > we're on the Internet after all. CDN-3's surrogates can go right to the
>  > content's source. I can think of two complications that might affect this,
>  > though. First is security. Suppose CDN-1 only trusts CDN-2; maybe it's
>  > never even heard of CDN-3. In that case, perhaps it wouldn't want CDN-3's
>  > surrogates grabbing its content. The second complication might be
>  > accounting/billing. Perhaps CDN-2 expects to get paid for relaying the
>  > advertisements from 1 to 3, and maybe it needs to know when that content is
>  > actually retrieved in order to account/bill appropriately. In the interest of
>  > simplicity, I'd propose to reject both of those arguments. For the case of
>  > security, CDN-1 can construct its advertisements in such a way that when a
>  > CDN-3 surrogate retrieves the content, the retrieval request identifies the
>  > advertisement. (For example, CDN-1 could embed the identity of CDN-2 in the
>  > URL.) The billing argument might be more persuasive, but I'm loath to add
>  > complexity to any protocol solely to accommodate billing. And finally, it's
>  > quite possible that CDN-2 could implicitly achieve the same effect as a
>  > path by advertising the content as its own and then proxying to the real
>  > content.
>  >
>  > Proposal 3: The protocol need not explicitly support the notion of paths.
>
>
>I also disagree with this point. One of the main features of paths in BGP is
>the detection of routing loops (routing itself depends more heavily
>on policy ....). We have a similar problem. We have to eliminate advertisement
>loops. It is also a good debugging tool. Debugging problems in peered
>CDNs is one of the main challenges of CDN peering.

Actually, path-vectors a la BGP are one of the least efficient (in terms of 
bandwidth, storage, and computation) ways to detect loops. Both RIP 
(distance-vector) and OSPF (link-state with reliable flooding) avoid loops 
quite nicely without paths. Of course, there are always trade-offs, and we 
might decide that we like path-vector better than the alternatives.
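
As a rough illustration of the trade-off, here is a sketch in Python of two ways a CPG could reject a looping advertisement. The first is the path-vector check, where every advertisement carries the full list of CDNs it has traversed. The second is loosely modeled on the sequence numbers used in link-state flooding: the receiver keeps one number per advertised item and drops anything it has already seen. All names here (accept_path_vector, SeqFilter, the CDN labels) are hypothetical.

```python
def accept_path_vector(my_id, path):
    """Path-vector style: reject if our own identity is already in the path."""
    return my_id not in path


class SeqFilter:
    """Sequence-number style: remember the newest advertisement per item."""

    def __init__(self):
        self.latest = {}  # (origin, content_id) -> highest sequence seen

    def accept(self, origin, content_id, seq):
        key = (origin, content_id)
        if self.latest.get(key, -1) >= seq:
            return False  # duplicate or stale, e.g. it came back around a loop
        self.latest[key] = seq
        return True


# CDN-1 sees its own advertisement come back via CDN-2 and CDN-3.
print(accept_path_vector("CDN-1", ["CDN-1", "CDN-2", "CDN-3"]))

f = SeqFilter()
print(f.accept("CDN-1", "video", 7))  # first arrival
print(f.accept("CDN-1", "video", 7))  # same advertisement looping back
```

The path-vector version needs no state at the receiver but makes every advertisement grow with each hop. The sequence-number version keeps advertisements a fixed size at the cost of a small table per receiver, and it gives up the debugging trail that a path provides.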

Stephen



____________________________________________________________________
Stephen Thomas                                       +1 770 671 1888
TransNexus, Chief Technical Officer    stephen.thomas@transnexus.com