
shim6 and bit errors in data packet headers




One issue we haven't discussed is what happens when there is a bit error in the shim information, and whether a checksum is needed on the shim information.

For the IPv6 header there was a case analysis for all the header fields
to determine what would happen if there was a bit error in the field.
The result (as I recall it) is that not much bad happens.
Bit errors can cause the packet to fail the ULP checksum and be silently
dropped as a result, cause the routers to send an ICMP error (because the
IPv6 destination with the error in it is unreachable) etc.

We need to do the same type of analysis for the shim6 protocol, but we can
also adjust the design of the protocol to make sure bit errors don't have
a detrimental effect.

For the part of shim6 which performs the state exchange, locator changes, and
testing the paths, presumably it is easy to just add the regular 16 bit
checksum.
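
Assuming "the regular 16 bit checksum" means the one's complement Internet
checksum used by the ULPs, a minimal sketch could look like the following.
Which fields of a shim6 control message it would cover is not decided here;
the functions simply take an opaque byte string, and the names are
placeholders.

```python
def internet_checksum(data: bytes) -> int:
    """One's complement of the one's complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold any carries above 16 bits back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def verify(data_with_checksum: bytes) -> bool:
    """A message that includes its own checksum field sums to zero."""
    return internet_checksum(data_with_checksum) == 0
```

A receiver would run verify() over the covered bytes and drop the message
if it returns False.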


But for the data packets adding one more 16 bit checksum would be more costly,
since if we need it we can't use the flow label, and a destination option
header would either have to be 16 bytes, or be 8 bytes with a context tag
that is probably too small (an 8 byte destination option header can only hold
32 bits of payload, so a 16 bit checksum would leave only a 16 bit context
tag).
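
The size arithmetic can be sketched as below. It assumes a destination
options header carrying exactly one option and no padding, so the fixed
overhead is the 2 byte extension header prefix (Next Header, Hdr Ext Len)
plus the 2 byte option prefix (Option Type, Opt Data Len). The function
names are made up for illustration.

```python
EXT_HDR_FIXED = 2  # Next Header + Hdr Ext Len
OPT_TLV_FIXED = 2  # Option Type + Opt Data Len


def option_payload_bits(header_bytes: int) -> int:
    """Bits of option data in a header holding a single, unpadded option."""
    if header_bytes < 8 or header_bytes % 8:
        raise ValueError("extension headers are a multiple of 8 bytes")
    return (header_bytes - EXT_HDR_FIXED - OPT_TLV_FIXED) * 8


def context_tag_bits(header_bytes: int, checksum_bits: int = 16) -> int:
    """Bits left for the context tag once the checksum is taken out."""
    return option_payload_bits(header_bytes) - checksum_bits
```

With these assumptions an 8 byte header leaves 16 bits for the context tag
and a 16 byte header leaves 80 bits.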


Let's assume the receiver identifies the shim6 state based on the
<source locator, destination locator, context tag>. This is done before the
packet is passed to the ULP, that is, the ICMP/UDP/TCP/SCTP checksums
are not of any use to detect a potentially corrupt packet.

As an example, assume that A and B have established shim6 state for
communication between ULID A1 and B1, using the locator sets
<{A1, A2}, {B1, B2}>, and that the communication using <A1, B1> has failed
so that the communication switches to the locator pair <A1, B2>.

B will now receive packets with the shim identifying information
being <A1, B2, CT>, which will identify the shim6 state and the state
indicates that the ULIDs should be set to <A1, B1> before passing the packet
to the ULP.
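
As a rough sketch of that receive-side step, assuming the state is kept in
a table keyed on <source locator, destination locator, context tag> (the
table layout, the names and the context tag value are all hypothetical):

```python
from typing import Dict, NamedTuple, Optional, Tuple

Key = Tuple[str, str, int]  # (source locator, destination locator, CT)


class Context(NamedTuple):
    ulid_src: str
    ulid_dst: str


def ulids_for(key: Key, table: Dict[Key, Context]) -> Optional[Tuple[str, str]]:
    """Return the ULID pair to restore before handing the packet to the ULP,
    or None when no shim6 context matches the packet."""
    ctx = table.get(key)
    if ctx is None:
        return None
    return (ctx.ulid_src, ctx.ulid_dst)


# State at B after the switch to locator pair <A1, B2>; CT value is made up.
CT = 0x1234
table_at_b = {("A1", "B2", CT): Context("A1", "B1")}
```

Here ulids_for(("A1", "B2", CT), table_at_b) gives ("A1", "B1"), while any
other source locator, destination locator or context tag gives None.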

What happens if there is a bit error in the locators in the packet or the
context tag field? (Doesn't matter whether the context tag is carried in
the flow label field or somewhere else for this analysis).

If there is a bit error in B2 (turning it into B2') the packets wouldn't be
delivered to the host B, but they might be delivered to a different host, which
would presumably (??) not have any context state for <A1, B2', CT>.
If it does have such shim6 state, it would apply the ULID replacement from
that context state and pass the packet to the ULP. The ULP checksum would most
likely fail in this case.
This doesn't seem any different than the case in base IPv6 and a bit error
in the destination address field; we rely on the ULP checksum to discard such
packets.


If there is a bit error in A1 or CT, the packet would be delivered to B,
but most likely B would not have any context state which matches <A1', B2, CT>
or <A1, B2, CT'>. If there is matching context state, the argument above
about the ULP checksum applies.
When there is no matching context state at B, the question is what B
does in such a case.


This is a function of how we do state management in shim6. There has been
little discussion about this. We could do soft state, where there is no
explicit teardown/close message, or we could do something which tries to
tear down the state at both ends in a coordinated fashion.

If we assume that we do soft state management, then a host like B needs to
be prepared to receive shim6 packets where it has no context. This is
because B might have discarded the shim6 state before A did, or B might
have crashed and lost all state while A retained the shim6 state.

This implies that in the bit error case above, since B can't tell the
difference between a bit error and the case when it has lost/discarded
the state, B needs to at least send an error message to A saying "I have
no matching shim6 context".
But B probably shouldn't do any more than this, since it doesn't know whether
the cause was a bit error or it having lost/discarded the shim6 state.
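
The ambiguity can be illustrated with a small sketch, assuming soft state
and no shim checksum. The action names and the table layout are invented
for the example; nothing here is a proposed message format.

```python
from enum import Enum


class Action(Enum):
    DELIVER = "rewrite locators to ULIDs and pass to the ULP"
    SEND_NO_CONTEXT_ERROR = "tell the sender: no matching shim6 context"


def on_data_packet(key, table):
    """key is (source locator, destination locator, context tag)."""
    if key in table:
        return Action.DELIVER
    # A bit error in the key and lost/discarded state look identical here,
    # so the only safe reaction is the one that is right for both causes.
    return Action.SEND_NO_CONTEXT_ERROR


CT = 0x1234                                   # made-up context tag
intact_state = {("A1", "B2", CT): ("A1", "B1")}
lost_state = {}                               # B crashed or timed out
corrupted_key = ("A1", "B2", CT ^ 0x0010)     # one flipped bit in the CT
```

Both on_data_packet(corrupted_key, intact_state) and
on_data_packet(("A1", "B2", CT), lost_state) end up in the same branch.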


[Note that B can't just silently pass up such packets to the ULP, because
if B doesn't do the locator->ULID replacement, the packets will (most likely)
be dropped by the ULP due to a checksum error. Thus there would *not* be e.g.,
a TCP reset which would tell the peer ULP to restart.]


So in conclusion, if we don't have a checksum for the shim6 context tag, then
we are constrained on what the shim can do when receiving a packet for
which it has no matching context.


If we add a checksum (covering at least the source locator, destination
locator, and context tag), then the receiving shim6 layer can silently discard
packets with bit errors, and we have the flexibility to do something
different in order to recover from lost/discarded state at the
receiver.
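
A sketch of how the checksum separates the two cases follows. It assumes a
16 bit one's complement checksum over the two locators and a 32 bit context
tag; the tag width, the coverage, the return strings and the documentation
prefix addresses are all assumptions made for the example.

```python
import ipaddress
import struct


def checksum16(data: bytes) -> int:
    """16-bit one's complement checksum over an even-length byte string."""
    total = sum((data[i] << 8) | data[i + 1] for i in range(0, len(data), 2))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def shim_checksum(src: str, dst: str, ct: int) -> int:
    """Checksum over source locator, destination locator and context tag."""
    covered = (ipaddress.IPv6Address(src).packed
               + ipaddress.IPv6Address(dst).packed
               + struct.pack("!I", ct))  # 16 + 16 + 4 bytes
    return checksum16(covered)


def classify(src, dst, ct, received_checksum, table):
    if shim_checksum(src, dst, ct) != received_checksum:
        return "discard"   # bit error: drop silently
    if (src, dst, ct) in table:
        return "deliver"   # rewrite to ULIDs, pass to the ULP
    return "recover"       # intact packet, so the state really is missing


A1, B1, B2, CT = "2001:db8::a1", "2001:db8::b1", "2001:db8::b2", 0x1234
good = shim_checksum(A1, B2, CT)
table_at_b = {(A1, B2, CT): (A1, B1)}
```

With this, a flipped bit in A1 or CT is classified as "discard", and only a
packet that passes the checksum but matches no context reaches "recover",
where the shim is free to do more than send an error.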


Comments?
    Erik