shim6-proto-07 review
(Note that I'm behind on the shim6 list so I'm not aware of recently
discussed issues, so these may be duplicated here.)
This is a review of the shim proto 07 draft. It contains both nits
and more fundamental issues, presented in the order of the
text, but let me get two other things out of the way first:
1. Always including the context tag
As I've said many times, it's a very bad idea to unnecessarily
increase overhead. Not only is this a bad thing in and of itself, but
it also doesn't help convince people to forego IPv4+BGP in favor of
IPv6+shim.
Now if we only include the shim header containing the context tag
after a failure, this could be a reasonable tradeoff. However, I'm
quite sure that if and when shim6 is adopted, some people will use it
to manage their traffic engineering by doing a locator change
immediately and proceed in a shimmed state for the rest of the
communication session.
Earlier, I proposed to include a mandatory way for a host to instruct
its peer to not include the shim header, pending mechanisms to do
demultiplexing without the context tag. After reading the draft, I
realize that this imposes some extra ICMP demultiplexing difficulties
on implementations. So if the wg doesn't want to include this
mandatory option, I suggest something different: since the traffic
engineering mechanisms aren't really developed anyway and this
problem is related to that, simply remove the locator preferences
mechanism so that the context tag suppression can be included when
proper traffic engineering is added.
Also see draft-van-beijnum-shim6-suppress-header-00.txt (do it while
you can, it expires next week...). It outlines a more fully formed
mechanism to suppress the context tag header.
I would very much like to see a consensus call on this subject.
2. Congestion
There is no discussion of congestion issues when the shim moves
ongoing communication to another locator pair, which will generally
make the communication flow over a different path. We've had some
discussions about this before, where the suggestion was made to go
into slow start after a rehoming event. The counter argument: but
maybe the new path is just as fast as the old one. My counter counter
argument: suppose a file transfer over a 1 Gbps link, the Gbps link
goes down and the session is rehomed to a low speed link (GPRS,
modem, ADSL with limited uplink capacity). The send window used when
the session went along over the Gbps link will be so large that
massive congestion ensues, and also, all buffers will be filled up
which guarantees that the congestion will persist for a relatively
long time, possibly a handful of seconds.
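To put rough numbers on this scenario, here is a back-of-the-envelope sketch. The 10 ms round-trip time and the 2 Mbps fallback link are assumptions made purely for illustration, and the calculation pretends that the whole window ends up queued in front of the slow link. Real buffers are finite, so part of it would be dropped instead.

```python
def window_bytes(bandwidth_bps: int, rtt_ms: int) -> int:
    """Bandwidth-delay product: bytes in flight when the path is kept full."""
    return bandwidth_bps * rtt_ms // 8000


def drain_seconds(queued_bytes: int, bandwidth_bps: int) -> float:
    """Time needed to push the queued bytes through a link of the given speed."""
    return queued_bytes * 8 / bandwidth_bps


# Assumed figures, not taken from the draft or from any measurement.
FAST_LINK_BPS = 1_000_000_000   # the 1 Gbps link that fails
SLOW_LINK_BPS = 2_000_000       # hypothetical low speed fallback link
RTT_MS = 10                     # hypothetical round-trip time on the fast path

old_window = window_bytes(FAST_LINK_BPS, RTT_MS)
print(f"window built up on the fast path: {old_window} bytes")
print(f"time to drain it over the slow link: "
      f"{drain_seconds(old_window, SLOW_LINK_BPS)} s")
```

Under these assumptions the sender has about 1.25 MB outstanding at the moment of rehoming, which the slow link needs several seconds to carry.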
There are no easy answers here, but congestion control is one of the
core concerns in the development of the internet, so I don't think we
can get away with ignoring this completely.
o Preserve established communications in the presence of certain
classes of failures, for example, TCP connections and UDP
streams.
Shouldn't this be "communication"?
o Have minimal impact on upper layer protocols in general and on
transport protocols in particular.
And applications.
Early in the text, the phrase "site multihoming" is used. There has
been some discussion on this list as to whether shim6 actually is
site multihoming, and readers of the draft may not know that all of
this is the result of wgs chartered to work on "site" multihoming. So
I suggest adding text to clear up any potential confusion, for example:
"The shim protocol is a site multihoming solution in the sense that
it allows existing communication to continue when a site that has
multiple connections to the internet experiences an outage on a
subset of these connections or further upstream. However, shim
processing is performed in individual hosts rather than through site-
wide mechanisms."
Finally, this proposal also does not try to provide a new network
level or transport level identifier name space distinct from the
current IP address name space.
The terms "identifier" and "locator" are used extensively even though
the shim is NOT an actual identifier/locator separation solution...
Suggested text (immediately following the sentence above):
"The shim proposal doesn't fully separate the identifier and locator
functions that have traditionally been overloaded in the IP address.
However, throughout this document the term "identifier", or more
specifically, Upper Layer Identifier (ULID) refers to the identifying
function of an IPv6 address, and "locator" to the network layer
routing and forwarding properties of an IPv6 address."
solution. While this document doesn't specify all aspects of this,
it is believed that the approach can be extended to handle the non-
routable address case..
Extra period. (Note that the quaint custom of inserting an extra
space after a sentence is generally "discouraged" in style manuals.)
the original locators become invalid at the same time and depending
on the time that is required to update the DNS and for those updates
to propagate.
Why is the DNS relevant here?
But IP addresses are also used as ULID,
Addresses is plural, ULID singular... Probably make this "ULIDs".
In the worst case we could end up with two separate hosts using the
same ULID while both of them are communicating with the same host.
This potential source for confusion is avoided requiring that any
communication using a ULID MUST be terminated when the ULID becomes
invalid (due to the underlying prefix becoming invalid).
This makes me uncomfortable. How do you know that an address has
become terminally invalid, rather than accidentally unusable? I
contend that the distinction can't be made in a stack in a meaningful
way, so the above requirement will in practice only serve to disrupt
communication unnecessarily. Rather, I would require some
administrative "cooling off" period to avoid using the same ULID by a
different host (only possible with CGA not HBA anyway). For instance,
there must be 24 hours between decommissioning and recommissioning of
address space, and we garbage collect shim state after 24 hours of
not being used.
I don't see how regular nomadic behavior will result in two hosts
using the same address in quick succession, and they can further
reduce the potential for problems by not using temporary addresses as
ULIDs.
layer map to/from different locators. The shim6 layer maintains
state, called ULID-pair context, per ULID pairs
"Pairs" should probably be singular.
fields, and even though those locators may be changed by the
transmitting shim6 layer. .
Extra .
The result of this consistent mapping is that there is no impact on
the ULPs. In particular, there is no impact on pseudo-header
checksums and connection identification.
The problem here is that some intermediate system, such as a firewall
or a smart NIC, may take it upon itself to check the TCP or UDP
checksum and discard the packet if the checksum fails. For firewalls
and the like, the best thing is probably either to fully monitor the
shim state so they can do this properly, or forego such checking if a
shim header is present.
For NICs a better solution would be to do an incremental checksum
verification and only over the ULP segment, so that the host stack
must complete the calculation by applying the increment from the
pseudo header, which can largely be cached, so the performance
advantages are almost completely preserved.
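A minimal sketch of the split described above, assuming the NIC reports a 16-bit one's complement partial sum over the ULP segment only and the host adds the pseudo-header contribution computed from the ULIDs. The function names and the example addresses are hypothetical. The sample segment is arbitrary bytes, so the sketch only shows that the two partial sums combine to the same value as a single pass over pseudo-header plus segment.

```python
import ipaddress
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's complement sum as used by the Internet checksum."""
    if len(data) % 2:
        data += b"\x00"              # pad odd-length input with a zero byte
    total = sum((data[i] << 8) | data[i + 1] for i in range(0, len(data), 2))
    while total >> 16:               # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


def ones_complement_add(a: int, b: int) -> int:
    """Combine two partial sums, folding the carry."""
    total = a + b
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_header(src_ulid: str, dst_ulid: str, length: int, next_header: int) -> bytes:
    """IPv6 pseudo-header: both addresses, 32-bit length, 3 zero bytes, next header."""
    return (ipaddress.IPv6Address(src_ulid).packed
            + ipaddress.IPv6Address(dst_ulid).packed
            + struct.pack("!I3xB", length, next_header))


segment = b"example ULP segment"                 # stand-in for a UDP datagram
nic_partial = ones_complement_sum(segment)       # what the NIC would hand up
ph = pseudo_header("2001:db8:1::1", "2001:db8:2::1", len(segment), 17)
host_increment = ones_complement_sum(ph)         # address part is cacheable per context
combined = ones_complement_add(nic_partial, host_increment)
```

Because the pseudo-header is built from the ULIDs rather than from the locators in the packet, the NIC never needs to know about the shim state.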
Inherent in a scalable multihoming mechanism that separates locators
from identifiers is that each host ends up with multiple locators.
This says explicitly that we do id/loc...
This means that at least for initial contact, it is the remote peer
that needs to select which peer locator to try first. In the case of
shim6 this is performed by applying RFC 3484 address selection.
This is incorrect: the application (or layer working on its behalf)
needs to select an initial ULID, which automatically becomes the
initial locator.
This document uses the terms MUST, SHOULD, RECOMMENDED, MAY, SHOULD
NOT and MUST NOT defined in RFC 2119 [1]. The terms defined in RFC
2460 [2] are also used.
Please list them.
FQDN Fully Qualified Domain Name
Hm, if you don't know what FQDN is you probably also don't know what
it is when spelled out... How about adding "full DNS name"?
document), such as having the ISPs relax there ingress filters, or
selecting the egress such that it matches the IP source address
prefix.
There -> their
o Some heuristic on A or B (or both) determine that it is
appropriate to pay the shim6 overhead to make this host-to-host
communication robust against locator failures. For instance, this
heuristic might be that more than 50 packets have been sent or
received, or a timer expiration while active packet exchange is in
place. This makes the shim initiate the 4-way context
establishment exchange.
Maybe say something like:
"The purpose of this heuristic is to avoid setting up a shim context
when only a small number of packets is exchanged between two hosts."
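As an illustration only, the packet-count variant of such a heuristic could look like the sketch below. The threshold of 50 is the draft's own example; the class and method names are hypothetical, and the timer-based trigger is left out.

```python
from dataclasses import dataclass

PACKET_THRESHOLD = 50  # example value from the draft text


@dataclass
class PeerTraffic:
    """Per-ULID-pair counter deciding when context establishment is worth it."""
    packets: int = 0                 # packets sent plus packets received
    context_requested: bool = False

    def on_packet(self) -> bool:
        """Count one packet; return True exactly once, when the threshold is passed."""
        self.packets += 1
        if not self.context_requested and self.packets > PACKET_THRESHOLD:
            self.context_requested = True
            return True
        return False


traffic = PeerTraffic()
decisions = [traffic.on_packet() for _ in range(60)]
```

A short exchange, such as a single DNS-style query and response, never reaches the threshold and so never pays for the 4-way exchange.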
If the context establishment exchange fails, the initiator will
then know that the other end does not support shim6, and will
continue with standard unicast behavior for the session.
Unicast? Shouldn't this be "single homed"?
the message allocated. Thus at a minimum the combination of <peer
ULID, local ULID, local context tag> have to uniquely identify one
context.
I'm not sure if I understand this.
More in general, the draft seems to suggest that the content of the
source address field in received packets may be ignored, but also
that this is not the case. This is a very important decision with far
reaching consequences so it should be made carefully. For instance,
if the source address may be rewritten arbitrarily, obviously routers
can easily do this without much or any coordination. But the
potential for security issues is significant in this case.
context. But since the Payload extension headers are demultiplexed
without looking at the locators in the packet, the receiver will need
to allocate context tags that are unique for all its contexts.
See above.
context tag is a 47-bit number (the largest which can fit in an
8-octet extension header).
"while preserving one bit to differentiate the shim signalling
messages from the shim header included in data packets, allowing both
to use the same protocol number."
4.2 context forking
Never been a fan of this, but it doesn't seem to add too much extra
complexity the way it is now.
Such discovery probably requires to be along the path in order to
be sniff the context tag value.
Grammar: clause without subject. Who is required to be along the path?
dynamic. For this reason there is a Update Request and Update
Acknowledgement messages, and a Locator List option.
Grammar. "is a" -> "are" would be better.
Even when the list of locators is fixed, a host might determine that
some preferences might have changed. For instance, it might
determine that there is a locally visible failure that implies that
some locator(s) are no longer usable. This uses a Locator
Preferences option in the Update Request message.
I don't consider reachability status a preference...
Bidirectional Communication (FBD). FBD uses a Keepalive message
which is sent when a host has received packets from its peer but has
not yet sent any packets from its ULP to the peer.
No, this works per address (per locator even, not per ULID, IIRC),
not per ULP.
which precedes a routing header). When tunneling is used, whether
IP-in-IP tunneling or the special form of tunneling that Mobile IPv6
uses (with Home Address Options and Routing header type 2), there is
a choice whether the shim applies inside the tunnel or outside the
tunnel, which affects the location of the shim6 header.
How is this coordinated with the other side? If one side does
tunneling first and shim second and the other side the other way
around, there will be trouble. I don't see an easy way to avoid this.
the control messages; only the payload extension header use the Next
Header field.
uses
Next Header: 8-bit selector. Normally set to NO_NXT_HDR (59).
So what happens when some other header follows the shim header? Could
this be used for attacks?
About the different messages: they are very similar. If I were to
implement all of this, I would rather work with one basic structure
for all of the messages, even if the _meaning_ of some fields is
different as long as their structure is always the same. I think this
can easily be done here, by including fields that nearly all messages
need (simply leave it zero when a particular message doesn't need a
field) and use options for things that a particular message needs
that aren't accommodated in the unified structure.
Did I miss the place where HBA information is exchanged?
update request: why is this a request?
This message is sent in response to a Update Request message. It
implies that the Update Request has been received, and that any new
locators in the Update Request can now be used as the source locators
of packets. But it does not imply that the (new) locators have been
verified to be used as a destination, since the host might defer the
verification of a locator until it sees a need to use a locator as
the destination.
Hm, is it smart to defer verification here? We've already said that
the other end may use them as source addresses. If there is a failure
and we do the verification then, we may find out that it fails and we
have no reasonable course of action.
Also, for CGA verification, don't we need to send the other side a
challenge to avoid replays?
direction. When the ULP is sending bidirectional traffic, no extra
packets need to be inserted.
This works per address pair, not per ULP.
5.13. Probe Message Format
This message and its semantics are defined in [9].
The idea behind that mechanism is to be able to handle the case when
one locator pair works in from A to B, and another locator pair works
from B to A, but there is no locator pair which works in both
directions. The protocol mechanism is that as A is sending probe
messages to B, B will observe which locator pairs it has received
from and report that back in probe messages it is sending to A.
No, this is to test whether locator pairs work or not in the general
case.
All of the TLV parameters have a length (including Type and Length
fields) which is a multiple of 8 bytes.
Ugh, this is certainly enough to make a grown man cry... Why all of
this alignment silliness? BGP works pretty well without it.
Consequently, the Length field indicates the length of the Contents
field (in bytes). The total length of the TLV parameter (including
Type, Length, Contents, and Padding) is related to the Length field
according to the following formula:
Total Length = 11 + Length - (Length + 3) % 8;
This is almost impossible to understand.
First of all, this assumes familiarity with C or a similar language
from the reader to note that % is the modulo operation and that it
binds stronger than subtraction. As such, this would be an improvement:
Total Length = 11 + Length - ((Length + 3) mod 8)
However, the logic that underpins this is never spelled out, apart
from the requirement that all options be a multiple of 8 bytes long.
(Yes, _bytes_, not octets.)
Text:
"The Total Length of the option is the smallest multiple of 8 bytes
that allows for the 4 bytes of option header and the option itself.
The amount of padding required can be calculated as follows:
padding = 7 - ((Length + 3) mod 8)
And:
Total Length = 4 + Length + padding"
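A quick way to check that the suggested wording and the draft's one-line formula agree is to compute both. This is only a sketch: the function names are made up here, and Length is taken to be the length of the Contents field in bytes, as in the quoted text.

```python
def padding(length: int) -> int:
    """Padding bytes needed after a 4-byte option header plus `length` bytes of contents."""
    return 7 - ((length + 3) % 8)


def total_length(length: int) -> int:
    """Total option size using the two-step wording suggested above."""
    return 4 + length + padding(length)


def total_length_draft(length: int) -> int:
    """Total option size using the draft's single formula."""
    return 11 + length - (length + 3) % 8


for length in range(64):
    total = total_length(length)
    assert total == total_length_draft(length)   # both formulas agree
    assert total % 8 == 0                        # always a multiple of 8 bytes
    assert 0 <= padding(length) <= 7             # never a full extra 8-byte unit

print([(n, padding(n), total_length(n)) for n in (0, 4, 5, 12)])
```

For example, a 4-byte option needs no padding and occupies 8 bytes, while a 5-byte option needs 7 bytes of padding and occupies 16.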
I see no discussion of size issues. A single option can be made large
enough to push a packet beyond 1280 bytes. More realistically, this
will happen when multiple options are present. What happens in this
case? What is the largest option size and the largest shim packet
size implementations must be prepared to handle?
C: Critical. One if this parameter is critical, and MUST
be recognized by the recipient, zero otherwise.
You can't force a receiver to recognize something...
o If C=1 then the host SHOULD send back an ICMP parameter problem
(type 4, code 1), with the Pointer referencing the first octet in
the option Type field. When C=1 the message MUST NOT be
processed.
Why use ICMP for errors? Isn't it easier to define a shim error
message? If the correspondent wants to fall back to some other way to
set up the shim having to intercept ICMP messages to make that happen
is pretty messy.
More in general, most error conditions are handled by silently
dropping packets, which is a very bad idea because that way
there is no difference between an error and lost messages. So in some
cases, a host may continue to resend the offending packet because it
doesn't know something went wrong. The main problem with this
approach is that you can't debug it from one end: you need to see
what happens on both ends to determine why something doesn't work.
Silently dropping packets because of errors is the right approach for
security reasons in some cases, but I don't think this applies here.
A short error message with an error code and optionally a human-
readable message would be much better. As long as these error packets
are smaller than the packets that trigger them, there should be
little or no security impact, especially considering that we're
prepared to talk shim with the correspondent in question to begin with.
The responder can choose exactly what input is used to compute the
validator, and what one-way function (MD5, SHA1)
Or something else, I presume? So "(such as MD5 or SHA-1)"
About the locator option: how many locators are allowed?
TEMPORARY: 0x02
The intent of the BROKEN flag is to inform the peer that a given
locator is known to be not working. The intent of TEMPORARY is to
allow the distinction between more stable addresses and less stable
addresses when shim6 is combined with IP mobility, when we might have
more stable home locators, and less stable care-of-locators.
So this has nothing to do with RFC 3041 temporary addresses? In that
case, a different name is probably better.
o For each peer locator, a bit whether it has been verified using
HBA or CGA, and a bit whether the locator has been probed to
verify that the ULID is present at that location.
"Flag" rather than "bit"?
| E-FAILED | Context establishment exchange failed
How do we know this, and is it necessary to explicitly take notice of
this situation?
| E-FAILED | ULID(peer), ULID(local) |
| NO-SUPPORT | ULID(peer), ULID(local)
How is ULID(local) relevant here? We know there is connectivity (ULP
is running) so if we don't get any shim negotiation back or it fails,
then this situation can be attributed to the peer as a whole, not to
the ULID pair.
In all the cases the result is that the peer without state receives a
shim message for which it has to context for the context tag.
To -> no?
case we can not use the recovery mechanisms since there needs to be
separate context tags for the two ULID pairs.
Needs -> need
Regarding section 7.9: shouldn't there be checks to make sure that
seemingly duplicate packets contain the same information as the
earlier packets they are supposedly the duplicate of?
What if validators don't match? Eventually this shouldn't be a
problem but I expect some initial trouble here because you're doing
hashes over a fairly large number of values: a small mistake
somewhere means the hash doesn't work. Some feedback in the form of
an error message would be good.
It occurs to me that there is nothing or very little in the protocol
that precludes shim negotiation using non-ULID addresses. We probably
need a few minor tweaks to the reachability protocol to also allow
this, but then there is no fundamental reason to not allow shim setup
using non-ULID addresses, and by extension, unreachable ULIDs = a
separate identifier space. If it's this easy, we should definitely
make sure there isn't some minor obstacle somewhere, so that we can
add this feature easily in the future when we've worked out the
additional issues such as locator discovery.
o Where Ls(peer) has at least one locator in common with the newly
created or updated context.
Why? I don't see how that buys us anything. Also, it's fairly trivial
to insert a bogus locator to meet the requirement that there is one
in common between the old and new sets.
And why verify whether the source address is in Ls(peer)? The
security mechanisms do all the checking we need.
context. In this case, we are in the Context confusion situation,
and the host MUST NOT use the old context to send any packets. It
MAY just discard the old context (after all, the peer has
discarded it), or it MAY attempt to re-establish the old context
by sending a new I1 message and moving its state to I1-SENT. In
any case, once that this situation is detected, the host MUST NOT
keep two contexts with overlapping Ls(peer) locator sets and the
same context tag in ESTABLISHED state, since this would result in
demultiplexing problems on the peer.
What if an attacker is trying to interfere with legitimate
communication? We must be VERY sure that the new shim messages come
from the same host as the one that created the existing state if
we're going to mess with that existing state.
About the randomness of the context tag: I don't think we have to
require that the entire context tag be random in a cryptographically
strong sense. If this makes implementation easier, why not allow an
implementation to use part of the CT as a lookup key
(which is relatively easy to predict) as long as enough bits are
really random? In my opinion, 20 good random bits is enough here.
Suggested text (but no suggested place to put it):
"It is important that context tags are hard to guess for off-path
attackers. Therefore, if an implementation uses structure in the
context tag to facilitate efficient lookups, at least 20 bits of the
context tag must be unstructured and populated by completely random
bits. For this purpose, bits derived from one of the generally used
one-way hash functions such as SHA-1 may be considered random."
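One way an implementation might follow the suggested text is sketched below. The 27/20 split between lookup key and random bits, the position of the two parts, and the function names are all hypothetical; only the 47-bit tag size and the 20-bit minimum come from the text above.

```python
import secrets

TAG_BITS = 47                         # context tag size from the draft
RANDOM_BITS = 20                      # minimum number of random bits suggested above
INDEX_BITS = TAG_BITS - RANDOM_BITS   # hypothetical: remaining 27 bits as a lookup key


def make_context_tag(table_index: int) -> int:
    """Build a tag whose high bits are a predictable table index and whose
    low bits come from a cryptographically strong random source."""
    if not 0 <= table_index < (1 << INDEX_BITS):
        raise ValueError("table index does not fit in the structured part")
    return (table_index << RANDOM_BITS) | secrets.randbits(RANDOM_BITS)


def table_index_of(tag: int) -> int:
    """Recover the lookup key from a received context tag."""
    return tag >> RANDOM_BITS


tag = make_context_tag(1234)
print(f"tag={tag:#014x} index={table_index_of(tag)}")
```

The receiver can then find the context with a direct table lookup and still reject a guessed tag unless the random part also matches the stored value.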
A host MUST silently discard any received Update Acknowledgement
messages that do not satisfy all of the following validity checks in
addition to those specified in Section 12.2:
o The Hdr Ext Len field is at least 1, i.e., the length is at least
16 octets.
Added bonus when the header structure is unified: no need to repeat
the above over and over throughout the text.
NO_R1_HOLDDOWN_TIME = 1 min
ICMP_HOLDDOWN_TIME = 10 min
This seems rather short, basically a shim host talking to a non-shim
host would retry setting up the shim every minute or every 10 minutes
even though there is good reason to assume this won't be successful.
Something like several hours seems more appropriate. (And only when
packets are actively exchanged.)
network transit path. Second, in case that IPSec is implemented as
Bump-In-The-Wire (BITW) [7] it is expected that the shim6 sub-layer
is also implemnted in the same fashion.
Not strong enough:
"in case that IPSec is implemented as Bump-In-The-Wire (BITW) [7],
either the shim MUST be disabled, or the shim MUST also be
implemented as Bump-In-The-Wire, in order to satisfy the requirement
that IPsec is layered above the shim."
could require a 2-way handshake "did you really loose the state?"
in response to the error message.
lose
o The validator included in the R1 and R1bis packets are generated
as a hash of several input parameters. However, most of the
inputs are actually determined by the sender, and only the secret
value S is unknown to the sender. However, the resulting
protection is deemed to be enough since it would be easier for the
attacker to just obtain a new validator sending a I1 packet than
performing all the computations required to determine the secret
S. However, it is recommended that the host changes the secret S
periodically.
Too many howevers...
o Study whether a host explicitly fail communication when a ULID
becomes invalid (based on RFC 2462 lifetimes or DHCPv6), or should
we let the communication continue using the invalidated ULID (it
can certainly work since other locators will be used).
Some kind of grammar problem, not obvious to me what is meant here.
Appendix B. Simplified State Machine
The states are defined in Section 6.2. The intent is that the
stylized description below be consistent with the textual description
in the specification, but should they conflict, the textual
description is normative.
Haven't looked at this.
that the Flow Label carries context information as proposed in the
now expired NOID draft. .
Extra .
It may happen, that later on, one of the hosts, e.g. Host A looses
the shim context.
loses
Mechanisms for detecting context. loss
Extra word?
There are discussions in the appendixes, maybe make this a separate
document?
The Locator List Option Format only specifies two verification
methods at this time: CGA or HBA. What about the case where a locator
can be verified using either CGA or HBA? Maybe it makes more sense to
have each method be a bit so they can be present or absent
independently.
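To illustrate what "each method be a bit" could mean, here is a sketch using independent flag bits. The bit values and names are hypothetical and are not the encoding used by the Locator List option.

```python
from enum import IntFlag


class VerificationMethod(IntFlag):
    """Hypothetical per-locator flags: each method can be set on its own."""
    HBA = 0x01
    CGA = 0x02


def can_verify_with(methods: VerificationMethod, method: VerificationMethod) -> bool:
    """True if the locator advertises the given verification method."""
    return bool(methods & method)


hba_only = VerificationMethod.HBA
either = VerificationMethod.HBA | VerificationMethod.CGA   # verifiable both ways

print(can_verify_with(either, VerificationMethod.CGA))     # True
print(can_verify_with(hba_only, VerificationMethod.CGA))   # False
```

With a single enumerated method field, the "either" case above has no encoding short of defining a third value for it.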
approach eliminates the possibility of a context confusion situation
because premature garbage collection, but it does not prevents
prevent
[9] Arkko, J. and I. Beijnum, "Failure Detection and Locator Pair
Exploration Protocol for IPv6 Multihoming",
Please make this "I. van Beijnum"
Note to self: look at implications of the fact that keepalive and
probe messages (as defined here) don't trigger R1bis in the
reachability draft.