
Comments on draft-ietf-shim6-proto-00.txt




Hi Erik,

Thanks for writing this. We are beginning to see
a complete shim6 protocol proposal! This document
answered many questions, at least ones that I have had.
Overall, I think the approach is solid. But I did have
a number of questions and comments; see below:

Technical:

Missing a discussion on the relationship of the shim6
processing wrt other processing that is taking place
at the IP layer, including at least IPsec but probably
also Mobile IPv6. I know from past experience that
it's quite hard to define the relationships and processing
order correctly, and I suspect that it's increasingly hard
for shim6.

Also, what is the relationship of Shim6 processing to
things in the host that depend on literal addresses,
such as IPsec policies?

Another missing discussion: the document refers to
SCTP as if it would be obvious how it can use Shim6. I'm
not sure that's the case. Or at least it's not obvious to me :-)

Yet another missing discussion: is there some interaction
with this protocol and the protocol defined in Marcelo's
draft that talked about communication with non-shim6
peers? It would appear that some aspects (e.g. input from
RAs) are common.

And one more: the document is relatively silent on
(un)reachability detection mechanisms beyond shim6-based
probing. We do have ND(NUD), L2, etc. mechanisms that
should be taken into account. If your L2 tells you that it's
lost the connection, there's no point in probing at L3;
we need to find another interface!
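
To make the ordering concrete, here is a minimal sketch of the
decision I have in mind. The function name, its inputs, and the
returned action strings are all hypothetical; nothing like this
is defined in the draft.

```python
def next_action(l2_link_up, nud_reachable, other_interface_available):
    """Pick a reaction to a suspected failure, cheapest signal first.

    All three inputs are assumed to come from the host stack: L2 link
    state, the ND/NUD verdict for the first hop, and whether another
    interface exists. The action names are illustrative only.
    """
    if not l2_link_up:
        # The link itself is gone, so L3 probing over it cannot succeed.
        if other_interface_available:
            return "use-other-interface"
        return "wait-for-link"
    if not nud_reachable:
        # Assumption: a failed NUD check on the first hop means this path
        # is unusable, so go straight to exploring other locator pairs.
        return "explore-locator-pairs"
    # Lower layers look healthy; only shim6 probing can test the full path.
    return "shim6-probe"
```

The point is only that shim6 probing is the last resort, used
when the cheaper local signals have nothing to say.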

  multihoming can be provided for IPv6 with failover and load spreading
  properties

I'm a bit concerned that we have not figured out all the details
regarding load spreading. You don't want a particular session
spread around different paths, because doing so would
confuse existing congestion avoidance mechanisms. The obvious
answer appears to be making sure that we keep the same
locator pair for the same session. But can we identify sessions
in all cases? Also, the protocol description in Section 4 and
beyond does not talk about when and how load sharing is
initiated and abandoned.
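
As a sketch of "same locator pair for the same session": if we
assume a session can be identified by the ULID pair, protocol and
ports (which is exactly the part I'm unsure about for protocols
without ports), a host could hash that identifier to choose the
pair. The function and the example addresses are made up for
illustration.

```python
import hashlib

def pick_locator_pair(flow, locator_pairs):
    """Deterministically map one flow to one of the available locator pairs.

    `flow` is an assumed session identifier: (local ULID, peer ULID,
    protocol, local port, peer port). The same flow always hashes to the
    same index, so its packets are never spread across paths.
    """
    key = "|".join(str(part) for part in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(locator_pairs)
    return locator_pairs[index]

# Illustrative values only (documentation prefix 2001:db8::/32).
pairs = [
    ("2001:db8:a::1", "2001:db8:1::9"),
    ("2001:db8:b::1", "2001:db8:2::9"),
]
flow = ("2001:db8:a::1", "2001:db8:1::9", 6, 49152, 80)
chosen = pick_locator_pair(flow, pairs)
```

Note that this mapping changes whenever the list of usable pairs
changes, so it says nothing about when load sharing should start
or stop.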

  o  Communication continues without any change for the ULP packets.
     In addition, there might be some messages exchanged between the
     shim sub-layers for (un)reachability detection.

  o  At some point in time something fails.  Depending on the approach
     to reachability detection, there might be some advise from the
     ULP, or the shim (un)reachability detection might discover that
     there is a problem.

     At this point in time one or both ends of the communication need
     to explore the different alternate locator pairs until a working
     pair is found, and rehome to using that pair.

Some additional thinking may be needed here wrt. what
goes on in the alternative paths during the first step and
which end does what in the second step. The reason that
I worry about this is the various middleboxes that we may
have.

In an IPv6-only world we don't need to worry about NATs.
However, there may be stateful firewalls that prevent, for
instance, the peer from contacting our other locator since
the firewall may not have seen any traffic to the peer from
our other locator yet.
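
The firewall concern can be shown with a toy model. This assumes
a firewall that admits inbound packets only for address pairs on
which it has already seen outbound traffic; real firewalls track
more state (ports, protocol, timers), and the class below is
purely illustrative.

```python
class StatefulFirewall:
    """Toy firewall: inbound is allowed only as the reverse of seen outbound."""

    def __init__(self):
        self.seen_outbound = set()

    def outbound(self, src, dst):
        # Record the address pair of a packet leaving the protected site.
        self.seen_outbound.add((src, dst))

    def allows_inbound(self, src, dst):
        # Admit the packet only if we previously sent from dst to src.
        return (dst, src) in self.seen_outbound

# Illustrative addresses only.
peer = "2001:db8:1::9"
our_first_locator = "2001:db8:a::1"
our_other_locator = "2001:db8:b::1"

fw = StatefulFirewall()
fw.outbound(our_first_locator, peer)
before = fw.allows_inbound(peer, our_other_locator)   # peer tries new locator
fw.outbound(our_other_locator, peer)                  # we send from it first
after = fw.allows_inbound(peer, our_other_locator)
```

In this model the peer cannot reach our other locator until we
have sent something from it, which is why it matters which end
does the exploring.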

  o  The shim (un)reachability detection will monitor the new locator
     pair as it monitored the original locator pair, so that subsequent
     failures can be detected.

There's no consideration here for switch-due-to-policy, such as
someone preferring his LAN connection when it's present
over wireless connections, regardless of whether the wireless
works or not. Personally, I'm fine with avoiding
policy (it'll never get configured anyway) but perhaps this
is a limitation that should be explicitly stated & discussed.

This is also related to the load sharing support.

  For commonly
  used IP protocols this is done by using a different value in the Flow
  Label field, that is, there is no additional header added to the
  packets.  But for other IP protocol types there is an extra 8 byte
  header inserted, which carries the next header value.

This seems a bit surprising, but I'm probably missing
something. (Postscript: after reading the rest of the
document I now understand what's going on. But there
may be other readers who are left wondering at this
point in the document.)

  In addition, the non-shim6 messages, which we call payload packets,
  will not contain the ULIDs after a failure.  This introduces the
  requirement that the <peer locator, local locator, local context tag>
  MUST uniquely identify the context.  Since the peer's set of locators
  might be dynamic the simplest form of unique allocation of the local
  context tag is to pick a number that is unique on the host.  Hosts
  which serve multiple ULIDs using disjoint sets of locators can
  maintain the context tag allocation per such disjoint set.

Not sure if this always needs to be true.
It might be that the shim6 failover protocol signaling
is used to tell the peer what the new locators are. If
that's the case, then the local context tag alone does
not need to be unique; you could also rely on the
addresses.

Also, there seems to be a security issue in using just
the context tag to do the demux. (Or is there some
crypto hash somewhere too?) If I learn or guess
your tag, does that mean that I can start sending
traffic that appears to come from you, even if I
use a different source IP and my host is under
ingress filtering restrictions?
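
A small sketch of the difference I'm asking about, comparing a
lookup keyed on the tag alone with one keyed on the full
<peer locator, local locator, local context tag> triple from the
quoted text. The tables, function names, tag value and addresses
are hypothetical.

```python
# Hypothetical context tables for one established context.
TAG = 0x5A17
PEER = "2001:db8:a::1"
LOCAL = "2001:db8:1::9"

contexts_by_tag = {TAG: "context-with-peer"}
contexts_by_triple = {(PEER, LOCAL, TAG): "context-with-peer"}

def demux_tag_only(peer_locator, local_locator, tag):
    # Ignores the addresses: anyone who knows the tag matches the context.
    return contexts_by_tag.get(tag)

def demux_triple(peer_locator, local_locator, tag):
    # Requires the source and destination locators to match as well.
    return contexts_by_triple.get((peer_locator, local_locator, tag))

# An attacker behind ingress filtering must use its own source address,
# but is assumed to have learned or guessed the tag.
spoofed = ("2001:db8:bad::66", LOCAL, TAG)

hit_tag_only = demux_tag_only(*spoofed)
hit_triple = demux_triple(*spoofed)
```

With the tag-only lookup the spoofed packet is matched to the
context; with the triple it is not, but then the receiver has to
know the peer's current locators.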

  Whether we overload the flow label field to carry the context tag or
  not, any protocol (such as RSVP or NSIS) which signals information
  about flows from the host stack to devices in the path, need to be
  made aware of the locator agility introduced by a layer 3 shim, so
  that the signaling can be performed for the locator pairs that are
  currently being used.

I'd like to understand this better. Is the information flow
always from the host to the middleboxes and routers?
If yes, then the host can keep its choices about the flow
labels. If not, then we may have a problem.

4.2  Protocol type overloading

  The mechanism for detecting a loss of context state at the peer that
  is currently proposed in this document assumes that the receiver can
  tell the packets that need locator rewriting, even after it has lost
  all state (e.g., due to a crash followed by a reboot).  There is an
  alternative to detection of lost state outlined in Section 18.

  The idea is to steal a partial bit from the protocol type fields that
  are used in the Next Header values, so that the common upper layer
  protocols can be identified.

  For example:

  o  TCP has protocol 6; TCP using alternate locators has protocol P.

In a crash followed by reboot it does not help that shim6
recovers the state. TCP and SCTP state is gone anyway.
For UDP and ICMP shim6 recovery would help.

However, there may be other conditions, such as loss
of state due to premature garbage collection, which make
all this necessary. Perhaps you could just use a different
example above. Or would synchronized state removal
be an option?

  Thus with 7 or so additional protocol field values we can do a
  reasonable job of overloading the flow label field and get detection
  of lost context state.

This is primarily an optimization. I wonder if it would make
sense to limit the optimization to the 80% useful case which
to me would be TCP, ESP, UDP, or even less. No sense in
optimizing ICMP messages, I think.

  The peers' lists of locators are normally exchanged as part of the
  context establishment exchange.  But the set of locators might be
  dynamic.  For this reason there is a Locator List Update message and
  acknowledgement.

This appears to require (optional?) CGA. Perhaps this could be stated
earlier on when you talked about HBA.

  The above probe and keepalive messages assume we have an established
  host-pair context.  However, communication might fail during the
  initial context (that is, when the application or transport protocol
  is trying to setup some communication).  If we want the shim to be
  able to optimize discovering a working locator pair in that case, we
  need a mechanism to test the reachability of locators independent of
  some context.  We define a locator pair test message and
  acknowledgement for this purpose, even though it isn't yet clear
  whether we need such a thing.

It isn't clear to me how this would be done. Presumably
it's the application that is going through the IPs retrieved
from DNS, not the IP layer. Is this something that we
need to handle?

  o  The CGA PDS might not need to be included in every LLU message.
     If it is associated with the ULID, it is sufficient to exchange it
     once.  Then a HBA-protected LLU would not need anything (it can
     just change the preferences for the locators in any case), and a
     CGA-protected LLU would just need the signature option.

Right. Let's make it so :-)

 |            |                                                     |
 |     10     |                  Reachability Probe                 |
 |            |                                                     |
 |     11     |               Reachability Probe Reply              |
 |            |                                                     |
 |     13     |                      Keepalive                      |
 |            |                                                     |
 |     14     |                  Locator Pair Test                  |
 |            |                                                     |
 |     15     |               Locator Pair Test Reply               |
 |            |                                                     |
 |     16     |             Context Locator pair explore            |
 +------------+-----------------------------------------------------+

It might be possible to combine these functions. But let's
do the individual design first for each, and combine later
if possible.

Also, the assistance from payload packets in the explore
phase is not discussed.

Finally, I hope everyone is OK with the design that is
extremely tightly integrated with IPv6. I know it's
in our charter, but I still worry about it since I see
so many things that eventually had to work on both
v6 and v4 (people are working on Mobile IP now to make it
run on both, IPsec has done so for a very long time, IP
multimedia systems had to run on v4 too, etc). We won't be
able to do this for Shim6.

Editorial:

      different path through the Internt, hence the path MTU might be

typo

   o  Applications that perform referals, or callbacks using IP

s/referals/referrals/g

   the reponder can verify that the validator it receives back in the I2

typo

   11.  Taredown of the Host Pair Context  . . . . . . . . . . . . .  43

s/taredown/teardown/g

  There is a No Contex error message defined, when a control or payload

s/Contex/Context/g

--Jari