
Re: Comments on draft-ietf-shim6-proto-00.txt



Hi Jari,


My point of view on the comments you made:
On 05/10/2005, at 14:29, Jari Arkko wrote:


Hi Erik,

Thanks for writing this. We are beginning to see
a complete shim6 protocol proposal! This document
answered many questions that I have had at least.
Overall, I think the approach is solid. But I did have
a number of questions and comments, see below:

Technical:

Missing a discussion on the relationship of the shim6
processing wrt other processing that is taking place
at the IP layer, including at least IPsec but probably
also Mobile IPv6. I know from past experience that
it's quite hard to define the relationships and processing
order correctly, and I suspect that it's increasingly hard
for shim6.

Also, what is the relationship of Shim6 processing to
things in the host that depend on literal addresses,
such as IPsec policies?

Another missing discussion: the document refers to
SCTP as if it would be obvious how it can use Shim6. I'm
not sure that's the case. Or at least it's not obvious to me :-)


Basically, as I read it, the point would be to include a section about shim interaction with other potentially affected protocols, such as MIP, IPsec, and SCTP.

I think this makes sense.

Yet another missing discussion: is there some interaction
with this protocol and the protocol defined in Marcelo's
draft that talked about communication with non-shim6
peers? It would appear that some aspects (e.g. input from
RAs) are common.

As I understand it, we have two cases:
- a host within the multihomed site that wants to initiate a new communication with an external legacy host after an outage, and
- a host within the multihomed site that wants to initiate a new communication with an external shim host after an outage.

In the first case, the mechanisms for doing this must reside entirely in the multihomed host, while in the second case the mechanism can involve both hosts (the multihomed one and the external one).

Clearly, the mechanisms used for the first case can be used for the second case as well (and obviously the mechanisms for the second case are not suitable for the first).

The question is whether it would be better to define mechanisms that are specific to the second case, taking advantage of the shim capabilities of the shim peer.

The benefit that I can think of in this case is that retrial would be transparent to the ULP, since it would be handled by the shim. The drawback is that the shim session needs to be established before the communication is initiated, which may not always be the case. An additional drawback is the added complexity, which we need to quantify, since these shim-based mechanisms for dealing with non-working ULIDs come in addition to the mechanisms for dealing with legacy hosts. A further difficulty for these mechanisms is discovering alternative locators for initiating the communication. This is likely to be based on information published in the DNS, but that information is probably not as good as it may seem: the addresses associated with a given FQDN may not belong to a single host, resulting in invalid ULID/locator combinations.

As a side note, these mechanisms for dealing with unreachable ULIDs are needed to support things like ULAs as ULIDs.




And one more: the document is relatively silent on
(un)reachability detection mechanisms beyond shim6-based
probing. We do have ND(NUD), L2, etc. mechanisms that
should be taken into account. If your L2 tells you that it's
lost the connection, there's no point in probing at L3,
we need to find another interface!

multihoming can be provided for IPv6 with failover and load spreading
  properties

I'm a bit concerned that we have not figured out all the details
regarding load spreading. You don't want a particular session
spread around different paths, because doing so would
confuse existing congestion avoidance mechanisms. The obvious
answer appears to be making sure that we keep the same
locator pair for the same session. But can we identify sessions
in all cases? Also, protocol description in Section 4 and beyond
does not talk about when and how loadsharing is initiated and
abandoned.


I am not sure what you mean by load sharing.
If we are talking about distributing the traffic across the multiple exit paths of the multihomed site, I would say that this is naturally provided by the applications' selection of ULIDs. The mere fact that different hosts select different prefixes for initiating communications results in a distribution of the load across the different ISPs. There is no point in distributing the packets of a given communication among the different paths, since what really matters, AFAICT, is the overall load, which is the aggregate of the load of all nodes. That load will be distributed because different communications will use different prefixes and hence different ISPs (as opposed to a single communication distributing its load among multiple ISPs).

However, this is clearly not good enough for achieving some form of traffic engineering or policing where something more than spreading the load evenly across ISPs is required.

In any case, I think it would make sense to explain this in the doc.
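
A minimal sketch of this point, in Python. It assumes each communication picks its ULID prefix at random; the selection rule, the prefix values, and the function names are illustrative assumptions, not something the draft defines. Every packet of a given communication stays on one prefix, yet the aggregate load is still split across the ISPs.

```python
import random
from collections import Counter

# Hypothetical prefixes, one per ISP of the multihomed site.
PREFIXES = ["2001:db8:a::/48", "2001:db8:b::/48"]


def assign_prefixes(communications, prefixes, rng):
    """Pick one ULID prefix per communication (assumed random choice).

    The prefix is chosen once and kept for the whole communication,
    so no single communication is spread over several paths.
    """
    return {comm: rng.choice(prefixes) for comm in communications}


def load_per_prefix(assignment):
    """Count how many communications ended up on each prefix (ISP)."""
    return Counter(assignment.values())


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    assignment = assign_prefixes(range(1000), PREFIXES, rng)
    print(load_per_prefix(assignment))
```

This only spreads the load roughly evenly; it does nothing for traffic engineering or policy, which is the limitation noted above.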

  o  Communication continues without any change for the ULP packets.
     In addition, there might be some messages exchanged between the
     shim sub-layers for (un)reachability detection.

  o  At some point in time something fails.  Depending on the approach
     to reachability detection, there might be some advise from the
     ULP, or the shim (un)reachability detection might discover that
     there is a problem.

     At this point in time one or both ends of the communication need
     to explore the different alternate locator pairs until a working
     pair is found, and rehome to using that pair.

Some additional thinking may be needed here wrt. what
goes on in the alternative paths during the first step and
which end does what in the second step. The reason that
I worry about this is the various middleboxes that we may
have.

In an IPv6-only world we don't need to worry about NATs.
However, there may be stateful firewalls that prevent, for
instance, the peer from contacting our other locator since
the firewall may not have seen any traffic to the peer from
our other locator yet.


  o  The shim (un)reachability detection will monitor the new locator
     pair as it monitored the original locator pair, so that subsequent
     failures can be detected.

There's no consideration here for switch-due-to-policy, such as
someone preferring his LAN connection when it's present
over wireless connections, regardless of whether the wireless
works or not. Personally, I'm fine with avoiding
policy (it'll never get configured anyway) but perhaps this
is a limitation that should be explicitly stated & discussed.


I think it may be good to support this configuration.

I mean, this would also be the case for a primary/backup configuration where the backup link has worse performance than the primary, so the primary is preferred as soon as it is available.

This is also related to the load sharing support.
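
A minimal sketch of such a preference policy, in Python. The preference values, the pair representation, and the function name are hypothetical; the draft does not define a policy mechanism. The idea is only that, among the locator pairs currently believed to work, the most preferred one is chosen, so traffic moves back to the primary as soon as it is reachable again.

```python
def select_pair(pairs, reachable):
    """Return the most preferred reachable locator pair, or None.

    pairs:     list of (preference, pair); a lower number is more preferred
               (assumed convention for this sketch).
    reachable: set of pairs currently believed to work.
    """
    candidates = [(pref, pair) for pref, pair in pairs if pair in reachable]
    if not candidates:
        return None
    return min(candidates, key=lambda c: c[0])[1]


if __name__ == "__main__":
    primary = ("2001:db8:a::1", "2001:db8:f::1")  # hypothetical LAN pair
    backup = ("2001:db8:b::1", "2001:db8:f::1")   # hypothetical wireless pair
    pairs = [(0, primary), (10, backup)]

    print(select_pair(pairs, {backup}))           # primary is down
    print(select_pair(pairs, {primary, backup}))  # primary is back
```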

  For commonly
  used IP protocols this is done by using a different value in the Flow
  Label field, that is, there is no additional header added to the
  packets.  But for other IP protocol types there is an extra 8 byte
  header inserted, which carries the next header value.

This seems a bit surprising, but I'm probably missing
something. (Postscript: after reading the rest of the
document I now understand what's going on. But there
may be other readers who are left wondering at this
point in the document.)

  In addition, the non-shim6 messages, which we call payload packets,
  will not contain the ULIDs after a failure.  This introduces the
  requirement that the <peer locator, local locator, local context tag>
  MUST uniquely identify the context.  Since the peer's set of locators
  might be dynamic the simplest form of unique allocation of the local
  context tag is to pick a number that is unique on the host.  Hosts
  which serve multiple ULIDs using disjoint sets of locators can
  maintain the context tag allocation per such disjoint set.

Not sure if this always needs to be true.
It might be that the shim6 failover protocol signaling
is used to tell the peer what the new locators are. If
that's the case, then the local context tag alone does
not need to be unique, you could rely also on the
addresses.

Also, there seems to be security issue in using just
the context tag to do the demux.

As I understand it, once the context is identified using the context tag, the addresses included in the packet are verified against the locator set available for that context. If the addresses in the packet are not in the locator set of the context, the packet is ignored.

As I see it, the unique context tag is needed not because unknown addresses can be used as locators, but because the same addresses may be included in more than one context. So if both contextA and contextB have Address1 and Address2 as valid locators (for each endpoint), then when a packet using Address1 and Address2 is received, the host cannot tell which context (A or B) to use for the demux.

Both Address1 and Address2 must have been validated and must be included explicitly in the locator set of each context.
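
A minimal sketch of the demux as I describe it here, in Python. The class, field, and function names are hypothetical and the addresses are documentation examples; this is my reading of the check, not text from the draft. The tag selects the context, and the packet is accepted only if its addresses are in that context's validated locator sets.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    tag: int
    local_locators: set = field(default_factory=set)  # validated local addresses
    peer_locators: set = field(default_factory=set)   # validated peer addresses


def demux(contexts, tag, src, dst):
    """Return the context for a received payload packet, or None to ignore it."""
    ctx = contexts.get(tag)
    if ctx is None:
        return None  # unknown context tag
    if src not in ctx.peer_locators or dst not in ctx.local_locators:
        return None  # addresses are not in this context's locator sets
    return ctx


# Two contexts sharing the same locators: only the tag tells them apart.
LOCAL = {"2001:db8:a::1"}
PEER = {"2001:db8:1::1", "2001:db8:2::1"}
contexts = {
    1: Context(1, set(LOCAL), set(PEER)),  # contextA
    2: Context(2, set(LOCAL), set(PEER)),  # contextB
}
```

With this check, guessing a tag is not enough: a packet sent from an address outside the peer's locator set is ignored.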

 (Or is there some
crypto hash somewhere too?) If I learn or guess
your tag, does that mean that I can start sending
traffic that appears to come from you, even if I
use a different source IP and my host is under
ingress filtering restrictions?


I don't think so; see above.


....

  The peers' lists of locators are normally exchanged as part of the
  context establishment exchange.  But the set of locators might be
  dynamic.  For this reason there is a Locator List Update message and
  acknowledgement.

This appears to require (optional?) CGA.

right

 Perhaps this could be stated
earlier on when you talked about HBA.

  The above probe and keepalive messages assume we have an established
  host-pair context.  However, communication might fail during the
  initial context (that is, when the application or transport protocol
  is trying to setup some communication).  If we want the shim to be
  able to optimize discovering a working locator pair in that case, we
  need a mechanism to test the reachability of locators independent of
  some context.  We define a locator pair test message and
  acknowledgement for this purpose, even though it isn't yet clear
  whether we need such a thing.

It isn't clear to me how this would be done. Presumably
it's the application that is going through the IPs retrieved
from DNS, not the IP layer.

right

 Is this something that we
need to handle?


I guess whether it is worthwhile is still under discussion; see above.

....
It might be possible to combine these functions. But let's
do the individual design first for each, and combine later
if possible.

Also, the assistance from payload packets in the explore
phase is not discussed.

I am missing what you mean here. Could you expand?

regards, marcelo