Re: Comments on draft-ietf-shim6-proto-00.txt
Hi Jari,
My point of view on the comments you made is inline below.
On 05/10/2005, at 14:29, Jari Arkko wrote:
Hi Erik,
Thanks for writing this. We are beginning to see
a complete shim6 protocol proposal! This document
answered many questions that I have had at least.
Overall, I think the approach is solid. But I did have
a number of questions and comments, see below:
Technical:
Missing a discussion on the relationship of the shim6
processing wrt other processing that is taking place
at the IP layer, including at least IPsec but probably
also Mobile IPv6. I know from past experience that
it's quite hard to define the relationships and processing
order correctly, and I suspect that it's increasingly hard
for shim6.
Also, what is the relationship of Shim6 processing to
things in the host that depend on literal addresses,
such as IPsec policies?
Another missing discussion: the document refers to
SCTP as if it would be obvious how it can use Shim6. I'm
not sure that's the case. Or at least it's not obvious to me :-)
Basically, as I read it, the point would be to include a section about
shim interaction with other potentially affected protocols, such as
MIP, IPsec, and SCTP.
I think this makes sense.
Yet another missing discussion: is there some interaction
with this protocol and the protocol defined in Marcelo's
draft that talked about communication with non-shim6
peers? It would appear that some aspects (e.g. input from
RAs) are common.
As I understand it, we have two cases:
- a host within the multihomed site that wants to initiate a new
communication with an external legacy host after an outage, and
- a host within the multihomed site that wants to initiate a new
communication with an external shim host after an outage.
In the first case the mechanisms for doing this must reside only in the
multihomed host, while in the second case the mechanism can involve
both the multihomed and the external hosts.
Clearly the mechanisms used for the first case can be used for the
second case as well (obviously the mechanisms for the second are not
suitable for the first case).
The question is whether it wouldn't be better to define mechanisms that
are specific for the second case, taking advantage of the shim
capabilities of the shim peer.
The benefit that I can think of in this case is that retrial
would be transparent to the ULP, since it would be handled by the shim.
The drawback is that the shim session needs to be established before
the communication is initiated, which may not always be the case. An
additional drawback is the added complexity, which we need to quantify,
since these shim-based mechanisms for dealing with non-working ULIDs come
in addition to the mechanisms for dealing with legacy hosts. A further
difficulty for these mechanisms is the discovery of alternative
locators for initiating the communication. That discovery is likely to be
based on information published in the DNS, but that information is probably
not as good as it may seem, since the addresses associated with a given
FQDN may not belong to a single host, resulting in invalid ULID/locator
combinations.
As a side note, these mechanisms for dealing with unreachable ULIDs
are needed to support things like ULAs as ULIDs.
And one more: the document is relatively silent on
(un)reachability detection mechanisms beyond shim6-based
probing. We do have ND(NUD), L2, etc. mechanisms that
should be taken into account. If your L2 tells you that it's
lost the connection, there's no point in probing at L3,
we need to find another interface!
multihoming can be provided for IPv6 with failover and load spreading
properties
I'm a bit concerned that we have not figured out all the details
regarding load spreading. You don't want a particular session
spread around different paths, because doing so would
confuse existing congestion avoidance mechanisms. The obvious
answer appears to be making sure that we keep the same
locator pair for the same session. But can we identify sessions
in all cases? Also, protocol description in Section 4 and beyond
does not talk about when and how loadsharing is initiated and
abandoned.
I am not sure what you mean by load sharing...
If we are talking about distributing the traffic across the multiple
exit paths of the multihomed site, I would say that this is
naturally provided by the selection of ULIDs by the applications. I
mean, the mere fact that different hosts select different prefixes for
initiating communications results in a distribution of the load
across the different ISPs.
There is no point in distributing the packets of a given
communication among the different paths, since what really matters,
AFAICT, is the overall load, i.e. the aggregate load of all
nodes. That load will be distributed because different communications
will use different prefixes and hence different ISPs (as opposed to a
single communication distributing its load among multiple ISPs).
However, this is clearly not good enough for achieving some form of
traffic engineering or policing where something more than spreading the
load evenly across ISPs is required.
In any case, I think it would make sense to explain this in the doc.
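To make the per-communication distribution concrete, here is a minimal sketch. The function pick_prefix, the integer communication ids and the modulo rule are my own assumptions for illustration only; they stand in for whatever address selection the hosts really perform and are not taken from the draft. The prefixes are documentation prefixes.

```python
from collections import Counter


def pick_prefix(comm_id, prefixes):
    """Pin a communication to one prefix (hypothetical selection rule).

    comm_id is any integer identifying the communication. The modulo rule
    is only a stand-in for the host's real source address selection.
    """
    return prefixes[comm_id % len(prefixes)]


# One prefix per ISP of the multihomed site (documentation prefixes).
prefixes = ["2001:db8:a::/48", "2001:db8:b::/48"]

# Each communication keeps a single prefix for its whole lifetime, so the
# packets of one session never alternate between ISPs.
assignment = {comm: pick_prefix(comm, prefixes) for comm in range(6)}

# Aggregate load per ISP, counted in communications.
load = Counter(assignment.values())
```

The aggregate load is spread across both ISPs, while each individual communication stays on one path, so existing congestion avoidance sees a stable path.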
o Communication continues without any change for the ULP packets.
In addition, there might be some messages exchanged between the
shim sub-layers for (un)reachability detection.
o At some point in time something fails. Depending on the approach
to reachability detection, there might be some advice from the
ULP, or the shim (un)reachability detection might discover that
there is a problem.
At this point in time one or both ends of the communication need
to explore the different alternate locator pairs until a working
pair is found, and rehome to using that pair.
Some additional thinking may be needed here wrt. what
goes on in the alternative paths during the first step and
which end does what in the second step. The reason that
I worry about this is the various middleboxes that we may
have.
In an IPv6-only world we don't need to worry about NATs.
However, there may be stateful firewalls that prevent, for
instance, the peer from contacting our other locator since
the firewall may not have seen any traffic to the peer from
our other locator yet.
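The exploration step in the quoted text (try alternate locator pairs until a working one is found) can be sketched roughly as follows. This is only an illustration under my own assumptions: explore and the is_reachable callback are hypothetical names, and the callback stands in for the shim's probe exchange, which the draft does not define in this form. A stateful firewall of the kind described above would simply show up as the callback returning False for that pair.

```python
from itertools import product


def explore(local_locators, peer_locators, is_reachable, failed_pair=None):
    """Return the first working (local, peer) locator pair, or None.

    is_reachable is a hypothetical callback standing in for the shim's
    probe exchange on a given locator pair.
    """
    for pair in product(local_locators, peer_locators):
        if pair == failed_pair:
            continue  # do not retry the pair that just failed
        if is_reachable(*pair):
            return pair
    return None


# Toy reachability oracle: only paths through local locator "L2" work.
working = {("L2", "P1"), ("L2", "P2")}
new_pair = explore(
    ["L1", "L2"],
    ["P1", "P2"],
    lambda local, peer: (local, peer) in working,
    failed_pair=("L1", "P1"),
)
```

Which end runs this loop, and in what order the pairs are tried, is exactly the open question raised above.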
o The shim (un)reachability detection will monitor the new locator
pair as it monitored the original locator pair, so that subsequent
failures can be detected.
There's no consideration here for switch-due-to-policy, such as
someone preferring his LAN connection when it's present
over wireless connections, regardless of whether the wireless
works or not. Personally, I'm fine with avoiding
policy (it'll never get configured anyway) but perhaps this
is a limitation that should be explicitly stated & discussed.
I think it may be good to support this configuration...
This would also cover a primary/backup configuration
where the backup link has worse performance than the primary, so the
primary is preferred as soon as it is available.
This is also related to the load sharing support.
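As a rough illustration of the primary/backup preference, the selection could look like the sketch below. The numeric preference values, select_pair and the is_working callback are all hypothetical; the draft defines no such policy interface.

```python
def select_pair(candidates, is_working):
    """Pick the working candidate with the lowest preference value.

    candidates is a list of (preference, name) tuples; lower is better.
    Both the preference numbers and is_working are hypothetical.
    """
    usable = [c for c in candidates if is_working(c[1])]
    if not usable:
        return None
    return min(usable, key=lambda c: c[0])[1]


# The primary link is preferred whenever it works, even if the backup
# link is also working.
candidates = [(10, "primary"), (20, "backup")]
during_outage = select_pair(candidates, lambda name: name == "backup")
after_recovery = select_pair(candidates, lambda name: True)
```

The point is that the switch back to the primary is triggered by preference and not by a failure of the backup.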
For commonly used IP protocols this is done by using a different value
in the Flow Label field, that is, there is no additional header added to
the packets. But for other IP protocol types there is an extra 8 byte
header inserted, which carries the next header value.
This seems a bit surprising, but I'm probably missing
something. (Postscript: after reading the rest of the
document I now understand what's going on. But there
may be other readers who are left wondering at this
point in the document.)
In addition, the non-shim6 messages, which we call payload packets,
will not contain the ULIDs after a failure. This introduces the
requirement that the <peer locator, local locator, local context tag>
MUST uniquely identify the context. Since the peer's set of locators
might be dynamic the simplest form of unique allocation of the local
context tag is to pick a number that is unique on the host. Hosts
which serve multiple ULIDs using disjoint sets of locators can
maintain the context tag allocation per such disjoint set.
Not sure if this always needs to be true.
It might be that the shim6 failover protocol signaling
is used to tell the peer what the new locators are. If
that's the case, then the local context tag alone does
not need to be unique, you could rely also on the
addresses.
Also, there seems to be a security issue in using just
the context tag to do the demux.
As I understand it, once the context is identified using the
context tag, the addresses included in the packet are verified against
the locator set available for that context. If the addresses included
in the packet are not in the locator set of the context, the
packet will be ignored.
As I see it, the unique context tag is needed not because we have
unknown addresses that can be used as locators, but because those
addresses may be included in more than one context.
So if both contextA and contextB have Address1 and Address2 as valid
locators (for each endpoint), then when a packet using Address1 and
Address2 is received, the host cannot tell which context (A or B) to
use for the demux without the tag.
Both Address1 and Address2 must have been validated and be included
explicitly as locators in the locator set of each context.
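A minimal sketch of the receive-side check as I understand it: the tag selects the context, and the packet's addresses are then verified against that context's locator sets. The Context class, its field names and the demux function are my own illustrative assumptions, not structures from the draft.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    """Hypothetical per-peer shim state; field names are illustrative."""
    tag: int
    local_locators: set = field(default_factory=set)
    peer_locators: set = field(default_factory=set)


def demux(contexts, tag, src, dst):
    """Return the context for a received payload packet, or None to drop it.

    contexts maps a host-unique local context tag to a Context.
    """
    ctx = contexts.get(tag)
    if ctx is None:
        return None  # unknown tag
    # The tag alone is not trusted: both addresses must be validated
    # locators of the context that the tag selected.
    if src not in ctx.peer_locators or dst not in ctx.local_locators:
        return None
    return ctx


# Two contexts that share the same locator pair (Address1, Address2).
context_a = Context(1, {"Address2"}, {"Address1"})
context_b = Context(2, {"Address2"}, {"Address1"})
contexts = {context_a.tag: context_a, context_b.tag: context_b}
```

With identical locator pairs in both contexts, only the tag distinguishes A from B, while a packet carrying a known tag but an address outside the locator set is dropped.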
(Or is there some
crypto hash somewhere too?) If I learn or guess
your tag, does that mean that I can start sending
traffic that appears to come from you, even if I
use a different source IP and my host is under
ingress filtering restrictions?
I don't think so, see above.
....
The peers' lists of locators are normally exchanged as part of the
context establishment exchange. But the set of locators might be
dynamic. For this reason there is a Locator List Update message and
acknowledgement.
This appears to require (optional?) CGA.
right
Perhaps this could be stated
earlier on when you talked about HBA.
The above probe and keepalive messages assume we have an established
host-pair context. However, communication might fail during the
initial context (that is, when the application or transport protocol
is trying to setup some communication). If we want the shim to be
able to optimize discovering a working locator pair in that case, we
need a mechanism to test the reachability of locators independent of
some context. We define a locator pair test message and
acknowledgement for this purpose, even though it isn't yet clear
whether we need such a thing.
It isn't clear to me how this would be done. Presumably
it's the application that is going through the IPs retrieved
from DNS, not the IP layer.
right
Is this something that we
need to handle?
I guess whether this is worthwhile is still under discussion, see above...
....
It might be possible to combine these functions. But let's
do the individual design first for each, and combine later
if possible.
Also, the assistance from payload packets in the explore
phase is not discussed.
I am missing what you mean here... could you expand?
regards, marcelo