shim6-proto-07 review

To: shim6-wg <shim6@psg.com>
Subject: shim6-proto-07 review
From: Iljitsch van Beijnum <iljitsch@muada.com>
Date: Mon, 11 Dec 2006 19:54:04 +0100

(Note that I'm behind on the shim6 list so I'm not aware of recently discussed issues, so these may be duplicated here.)

This is a review of the shim proto 07 draft. It contains both nits and more fundamental issues. They are presented in the order of the text, but let me get two other things out of the way first:

1. Always including the context tag

As I've said many times, it's a very bad idea to unnecessarily increase overhead. Not only is this a bad thing in and of itself, but it also doesn't help convince people to forego IPv4+BGP in favor of IPv6+shim.

Now if we only include the shim header containing the context tag after a failure, this could be a reasonable tradeoff. However, I'm quite sure that if and when shim6 is adopted, some people will use it to manage their traffic engineering by doing a locator change immediately and proceed in a shimmed state for the rest of the communication session.

Earlier, I proposed to include a mandatory way for a host to instruct its peer to not include the shim header, pending mechanisms to do demultiplexing without the context tag. After reading the draft, I realize that this imposes some extra ICMP demultiplexing difficulties on implementations. So if the wg doesn't want to include this mandatory option, I suggest something different: since the traffic engineering mechanisms aren't really developed anyway and this problem is related to that, simply remove the locator preferences mechanism so that the context tag suppression can be included when proper traffic engineering is added.

Also see draft-van-beijnum-shim6-suppress-header-00.txt (do it while you can, it expires next week...), which outlines a more fully formed mechanism to suppress the context tag header.

I would very much like to see a consensus call on this subject.


2. Congestion

There is no discussion of congestion issues when the shim moves ongoing communication to another locator pair, which will generally make the communication flow over a different path. We've had some discussions about this before, where the suggestion was made to go into slow start after a rehoming event. The counter argument: but maybe the new path is just as fast as the old one. My counter counter argument: suppose a file transfer over a 1 Gbps link, the Gbps link goes down and the session is rehomed to a low speed link (GPRS, modem, ADSL with limited uplink capacity). The send window used when the session went along over the Gbps link will be so large that massive congestion ensues, and also, all buffers will be filled up, which guarantees that the congestion will persist for a relatively long time, possibly a handful of seconds.

There are no easy answers here, but congestion control is one of the core concerns in the development of the internet, so I don't think we can get away with ignoring this completely.
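
To make the "go into slow start after a rehoming event" suggestion concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the draft defines no signal from the shim to the transport, the names (SenderState, on_locator_pair_change) are hypothetical, and halving the old window into ssthresh simply borrows the usual TCP loss response.

```python
from dataclasses import dataclass

INITIAL_WINDOW_SEGMENTS = 3  # assumption: a small initial window, in segments


@dataclass
class SenderState:
    """Hypothetical per-connection congestion state kept by the transport."""
    cwnd: int      # congestion window, in segments
    ssthresh: int  # slow start threshold, in segments


def on_locator_pair_change(state: SenderState) -> SenderState:
    """Restart slow start when the shim rehomes the session.

    The old cwnd was learned on the old path and says nothing about the
    new one, so remember half of it as ssthresh and probe again from a
    small window.
    """
    new_ssthresh = max(state.cwnd // 2, 2)
    new_cwnd = min(state.cwnd, INITIAL_WINDOW_SEGMENTS)
    return SenderState(cwnd=new_cwnd, ssthresh=new_ssthresh)
```

If the new path turns out to be just as fast, slow start grows the window back exponentially, so the cost of being careful is a few round trips.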


   o  Preserve established communications in the presence of certain
      classes of failures, for example, TCP connections and UDP
      streams.

Shouldn't this be "communication"?


   o  Have minimal impact on upper layer protocols in general and on
      transport protocols in particular.

And applications.

Early in the text, the phrase "site multihoming" is used. There has been some discussion on this list as to whether shim6 actually is site multihoming, and readers of the draft may not know that all of this is the result of wgs chartered to work on "site" multihoming. So I suggest adding text to clear up any potential confusion, for example:

"The shim protocol is a site multihoming solution in the sense that it allows existing communication to continue when a site that has multiple connections to the internet experiences an outage on a subset of these connections or further upstream. However, shim processing is performed in individual hosts rather than through site-wide mechanisms."


   Finally, this proposal also does not try to provide a new network
   level or transport level identifier name space distinct from the
   current IP address name space.

The terms "identifier" and "locator" are used extensively even though the shim is NOT an actual identifier/locator separation solution... Suggested text (immediately following the sentence above):

"The shim proposal doesn't fully separate the identifier and locator functions that have traditionally been overloaded in the IP address. However, throughout this document the term "identifier", or more specifically, Upper Layer Identifier (ULID) refers to the identifying function of an IPv6 address, and "locator" to the network layer routing and forwarding properties of an IPv6 address."


   solution.  While this document doesn't specify all aspects of this,
   it is believed that the approach can be extended to handle the non-
   routable address case..

Extra period. (Note that the quaint custom of inserting an extra space after a sentence is generally "discouraged" in style manuals.)


   the original locators become invalid at the same time and depending
   on the time that is required to update the DNS and for those updates
   to propagate.

Why is the DNS relevant here?


   But IP addresses are also used as ULID,

Addresses is plural, ULID singular... Probably make this "ULIDs".


   In the worst case we could end up with two separate hosts using the
   same ULID while both of them are communicating with the same host.

   This potential source for confusion is avoided requiring that any
   communication using a ULID MUST be terminated when the ULID becomes
   invalid (due to the underlying prefix becoming invalid).

This makes me uncomfortable. How do you know that an address has become terminally invalid, rather than accidentally unusable? I contend that the distinction can't be made in a stack in a meaningful way, so the above requirement will in practice only serve to disrupt communication unnecessarily. Rather, I would require some administrative "cooling off" period to avoid using the same ULID by a different host (only possible with CGA not HBA anyway). For instance, there must be 24 hours between decommissioning and recommissioning of address space, and we garbage collect shim state after 24 hours of not being used.

I don't see how regular nomadic behavior will result in two hosts using the same address in quick succession, and they can further reduce the potential for problems by not using temporary addresses as ULIDs.


   layer map to/from different locators.  The shim6 layer maintains
   state, called ULID-pair context, per ULID pairs

"Pairs" should probably be singular.


   fields, and even though those locators may be changed by the
   transmitting shim6 layer. .

Extra  .

   The result of this consistent mapping is that there is no impact on
   the ULPs.  In particular, there is no impact on pseudo-header
   checksums and connection identification.

The problem here is that some intermediate system, such as a firewall or a smart NIC, may take it upon itself to check the TCP or UDP checksum and discard the packet if the checksum fails. For firewalls and the like, the best thing is probably either to fully monitor the shim state so they can do this properly, or forego such checking if a shim header is present.

For NICs a better solution would be to do an incremental checksum verification and only over the ULP segment, so that the host stack must complete the calculation by applying the increment from the pseudo header, which can largely be cached, so the performance advantages are almost completely preserved.
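
A rough sketch of that split, to show the arithmetic works out. The function names are hypothetical and this is not how any particular NIC exposes partial sums. The "NIC" computes a one's complement sum over the ULP segment only, and the host adds the pseudo-header contribution computed from the ULIDs. The address and next header part is the same for every packet in a context and can be cached, only the length changes per packet.

```python
def ones_complement_sum(data: bytes) -> int:
    """16-bit one's complement sum of data (zero-padded if odd length)."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def combine(a: int, b: int) -> int:
    """One's complement addition of two 16-bit partial sums."""
    total = a + b
    return (total & 0xFFFF) + (total >> 16)


def cached_pseudo_sum(src_ulid: bytes, dst_ulid: bytes, next_header: int) -> int:
    """Per-context part of the IPv6 pseudo-header sum (all but the length)."""
    return ones_complement_sum(
        src_ulid + dst_ulid + b"\x00\x00\x00" + bytes([next_header])
    )


def host_verify(nic_segment_sum: int, cached_sum: int, ulp_len: int) -> bool:
    """Host side: add the cached part and the per-packet length.

    The checksum is good when the grand total is 0xFFFF.
    """
    length_sum = ones_complement_sum(ulp_len.to_bytes(4, "big"))
    total = combine(combine(nic_segment_sum, cached_sum), length_sum)
    return total == 0xFFFF
```

Because the NIC never looks at the addresses, it does not matter that the packet on the wire carries locators rather than ULIDs.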


   Inherent in a scalable multihoming mechanism that separates locators
   from identifiers is that each host ends up with multiple locators.

This says explicitly that we do id/loc...


   This means that at least for initial contact, it is the remote peer
   that needs to select which peer locator to try first.  In the case of
   shim6 this is performed by applying RFC 3484 address selection.

This is incorrect: the application (or layer working on its behalf) needs to select an initial ULID, which automatically becomes the initial locator.


   This document uses the terms MUST, SHOULD, RECOMMENDED, MAY, SHOULD
   NOT and MUST NOT defined in RFC 2119 [1].  The terms defined in RFC
   2460 [2] are also used.

Please list them.


   FQDN                Fully Qualified Domain Name

Hm, if you don't know what FQDN is you probably also don't know what it is when spelled out... How about adding "full DNS name"?


   document), such as having the ISPs relax there ingress filters, or
   selecting the egress such that it matches the IP source address
   prefix.

There -> their


   o  Some heuristic on A or B (or both) determine that it is
      appropriate to pay the shim6 overhead to make this host-to-host
      communication robust against locator failures.  For instance, this
      heuristic might be that more than 50 packets have been sent or
      received, or a timer expiration while active packet exchange is in
      place.  This makes the shim initiate the 4-way context
      establishment exchange.

Maybe say something like:

"The purpose of this heuristic is to avoid setting up a shim context when only a small number of packets is exchanged between two hosts."

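For illustration, the packet-count half of such a heuristic could be as simple as the sketch below. The names are hypothetical, counting sent and received packets together is just one reading of "sent or received", and the timer variant is left out.

```python
from dataclasses import dataclass

PACKET_THRESHOLD = 50  # the example value from the draft text quoted above


@dataclass
class PeerTraffic:
    """Hypothetical per-ULID-pair counters kept before any context exists."""
    packets_sent: int = 0
    packets_received: int = 0
    context_requested: bool = False


def should_start_context(traffic: PeerTraffic) -> bool:
    """True once more than PACKET_THRESHOLD packets have been exchanged."""
    if traffic.context_requested:
        return False  # already tried, don't start a second exchange
    total = traffic.packets_sent + traffic.packets_received
    return total > PACKET_THRESHOLD
```

A short exchange such as a DNS lookup never reaches the threshold and so never pays for the 4-way exchange.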

      If the context establishment exchange fails, the initiator will
      then know that the other end does not support shim6, and will
      continue with standard unicast behavior for the session.

Unicast? Shouldn't this be "single homed"?


   the message allocated.  Thus at a minimum the combination of <peer
   ULID, local ULID, local context tag> have to uniquely identify one
   context.

I'm not sure if I understand this.

More in general, the draft seems to suggest that the content of the source address field in received packets may be ignored, but also that this is not the case. This is a very important decision with far reaching consequences so it should be made carefully. For instance, if the source address may be rewritten arbitrarily, obviously routers can easily do this without much or any coordination. But the potential for security issues is significant in this case.


   context.  But since the Payload extension headers are demultiplexed
   without looking at the locators in the packet, the receiver will need
   to allocate context tags that are unique for all its contexts.

See above.


   context tag is a 47-bit number (the largest which can fit in an
   8-octet extension header).

"while preserving one bit to differentiate the shim signalling messages from the shim header included in data packets, allowing both to use the same protocol number."

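To show where the 47 comes from, here is a sketch of packing the 8-octet header used in data packets. The field order is my assumption from reading the draft (8-bit Next Header, 8-bit Hdr Ext Len set to zero, the one differentiating bit, then the tag), and the function names are made up: 8 + 8 + 1 + 47 = 64 bits.

```python
TAG_BITS = 47
TAG_MASK = (1 << TAG_BITS) - 1
P_BIT = 1 << TAG_BITS  # assumed position of the "this is a data packet" bit


def pack_payload_header(next_header: int, context_tag: int) -> bytes:
    """Pack next header, zero Hdr Ext Len, the flag bit and the tag."""
    if not 0 <= next_header <= 0xFF:
        raise ValueError("next_header must fit in 8 bits")
    if not 0 <= context_tag <= TAG_MASK:
        raise ValueError("context tag must fit in 47 bits")
    word = (next_header << 56) | P_BIT | context_tag  # Hdr Ext Len stays 0
    return word.to_bytes(8, "big")


def unpack_payload_header(header: bytes) -> tuple[int, bool, int]:
    """Return (next_header, flag_bit_set, context_tag)."""
    word = int.from_bytes(header, "big")
    return word >> 56, bool(word & P_BIT), word & TAG_MASK
```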

4.2 context forking

Never been a fan of this, but it doesn't seem to add too much extra complexity the way it is now.


      Such discovery probably requires to be along the path in order to
      be sniff the context tag value.

Grammar: clause without subject. Who is required to be along the path?


   dynamic.  For this reason there is a Update Request and Update
   Acknowledgement messages, and a Locator List option.

Grammar. "is a" -> "are" would be better.


   Even when the list of locators is fixed, a host might determine that
   some preferences might have changed.  For instance, it might
   determine that there is a locally visible failure that implies that
   some locator(s) are no longer usable.  This uses a Locator
   Preferences option in the Update Request message.

I don't consider reachability status a preference...


   Bidirectional Communication (FBD).  FBD uses a Keepalive message
   which is sent when a host has received packets from its peer but has
   not yet sent any packets from its ULP to the peer.

No, this works per address (per locator even, not per ULID, IIRC), not per ULP.


   which precedes a routing header).  When tunneling is used, whether
   IP-in-IP tunneling or the special form of tunneling that Mobile IPv6
   uses (with Home Address Options and Routing header type 2), there is
   a choice whether the shim applies inside the tunnel or outside the
   tunnel, which affects the location of the shim6 header.

How is this coordinated with the other side? If one side does tunneling first and shim second and the other side the other way around, there will be trouble. I don't see an easy way to avoid this.


   the control messages; only the payload extension header use the Next
   Header field.

uses


   Next Header:   8-bit selector.  Normally set to NO_NXT_HDR (59).

So what happens when some other header follows the shim header? Could this be used for attacks?

About the different messages: they are very similar. If I were to implement all of this, I would rather work with one basic structure for all of the messages, even if the _meaning_ of some fields is different as long as their structure is always the same. I think this can easily be done here, by including fields that nearly all messages need (simply leave it zero when a particular message doesn't need a field) and use options for things that a particular message needs that aren't accommodated in the unified structure.

Did I miss the place where HBA information is exchanged?

update request: why is this a request?


   This message is sent in response to a Update Request message.  It
   implies that the Update Request has been received, and that any new
   locators in the Update Request can now be used as the source locators
   of packets.  But it does not imply that the (new) locators have been
   verified to be used as a destination, since the host might defer the
   verification of a locator until it sees a need to use a locator as
   the destination.

Hm, is it smart to defer verification here? We've already said that the other end may use them as source addresses. If there is a failure and we do the verification then, we may find out that it fails and we have no reasonable course of action.

Also, for CGA verification, don't we need to send the other side a challenge to avoid replays?


   direction.  When the ULP is sending bidirectional traffic, no extra
   packets need to be inserted.

This works per address pair, not per ULP.


5.13.  Probe Message Format

   This message and its semantics are defined in [9].

   The idea behind that mechanism is to be able to handle the case when
   one locator pair works in from A to B, and another locator pair works
   from B to A, but there is no locator pair which works in both
   directions.  The protocol mechanism is that as A is sending probe
   messages to B, B will observe which locator pairs it has received
   from and report that back in probe messages it is sending to A.

No, this is to test whether locator pairs work or not in the general case.


   All of the TLV parameters have a length (including Type and Length
   fields) which is a multiple of 8 bytes.

Ugh, this is certainly enough to make a grown man cry... Why all of this alignment silliness? BGP works pretty well without it.


   Consequently, the Length field indicates the length of the Contents
   field (in bytes).  The total length of the TLV parameter (including
   Type, Length, Contents, and Padding) is related to the Length field
   according to the following formula:

   Total Length = 11 + Length - (Length + 3) % 8;

This is almost impossible to understand.

First of all, this assumes familiarity with C or a similar language from the reader to note that % is the modulo operation and that it binds stronger than subtraction. As such, this would be an improvement:

Total Length = 11 + Length - ((Length + 3) mod 8)

However, the logic that underpins this is never spelled out, apart from the requirement that all options be a multiple of 8 bytes long. (Yes, _bytes_, not octets.)

Text:

"The Total Length of the option is the smallest multiple of 8 bytes that allows for the 4 bytes of option header and the option itself. The amount of padding required can be calculated as follows:

padding = 7 - ((Length + 3) mod 8)

And:

Total Length = 4 + Length + padding"
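
As a sanity check that the suggested wording says the same thing as the draft's formula, here is a small sketch. The function names are mine, and the 4-byte option header (Type plus Length fields) is taken from the suggested text above.

```python
OPTION_HEADER_BYTES = 4  # Type and Length fields, per the suggested text


def padding(length: int) -> int:
    """Padding bytes needed after `length` bytes of option contents."""
    return 7 - ((length + 3) % 8)


def total_length(length: int) -> int:
    """Total option size: header, contents and padding."""
    return OPTION_HEADER_BYTES + length + padding(length)


def draft_total_length(length: int) -> int:
    """The formula as written in the draft (% binds tighter than -)."""
    return 11 + length - (length + 3) % 8


if __name__ == "__main__":
    for n in (0, 4, 5, 12):
        print(n, padding(n), total_length(n), draft_total_length(n))
```

Both versions agree for every Length, and the result is always the smallest multiple of 8 that fits the 4-byte header plus the contents: a Length of 4 needs no padding, a Length of 5 needs 7 bytes of it.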

I see no discussion of size issues. A single option can be made large enough to push a packet beyond 1280 bytes. More realistically, this will happen when multiple options are present. What happens in this case? What is the largest option size and the largest shim packet size implementations must be prepared to handle?

   C:             Critical.  One if this parameter is critical, and MUST
                  be recognized by the recipient, zero otherwise.

You can't force a receiver to recognize something...


   o  If C=1 then the host SHOULD send back an ICMP parameter problem
      (type 4, code 1), with the Pointer referencing the first octet in
      the option Type field.  When C=1 the message MUST NOT be
      processed.

Why use ICMP for errors? Isn't it easier to define a shim error message? If the correspondent wants to fall back to some other way to set up the shim, having to intercept ICMP messages to make that happen is pretty messy.

More in general, however, most error conditions are handled by silently dropping packets, which is a very bad idea because that way there is no difference between an error and lost messages. So in some cases, a host may continue to resend the offending packet because it doesn't know something went wrong. The main problem with this approach is that you can't debug it from one end: you need to see what happens on both ends to determine why something doesn't work.

Silently dropping packets because of errors is the right approach for security reasons in some cases, but I don't think this applies here. A short error message with an error code and optionally a human-readable message would be much better. As long as these error packets are smaller than the packets that trigger them, there should be little or no security impact, especially considering that we're prepared to talk shim with the correspondent in question to begin with.


   The responder can choose exactly what input is used to compute the
   validator, and what one-way function (MD5, SHA1)

Or something else, I presume? So "(such as MD5 or SHA-1)"


About the locator option: how many locators are allowed?


      TEMPORARY: 0x02

   The intent of the BROKEN flag is to inform the peer that a given
   locator is known to be not working.  The intent of TEMPORARY is to
   allow the distinction between more stable addresses and less stable
   addresses when shim6 is combined with IP mobility, when we might have
   more stable home locators, and less stable care-of-locators.

So this has nothing to do with RFC 3041 temporary addresses? In that case, a different name is probably better.


   o  For each peer locator, a bit whether it has been verified using
      HBA or CGA, and a bit whether the locator has been probed to
      verify that the ULID is present at that location.

"Flag" rather than "bit"?


   | E-FAILED            | Context establishment exchange failed

How do we know this, and is it necessary to explicitly take notice of this situation?

   | E-FAILED            | ULID(peer), ULID(local)

   | NO-SUPPORT          | ULID(peer), ULID(local)

How is ULID(local) relevant here? We know there is connectivity (ULP is running) so if we don't get any shim negotiation back or it fails, then this situation can be attributed to the peer as a whole, not to the ULID pair.

   In all the cases the result is that the peer without state receives a
   shim message for which it has to context for the context tag.

To -> no?


   case we can not use the recovery mechanisms since there needs to be
   separate context tags for the two ULID pairs.

Needs -> need

Regarding section 7.9: shouldn't there be checks to make sure that seemingly duplicate packets contain the same information as the earlier packets they are supposedly the duplicate of?

What if validators don't match? Eventually this shouldn't be a problem but I expect some initial trouble here because you're doing hashes over a fairly large number of values. A small mistake somewhere means the hash doesn't work, so some feedback in the form of an error message would be good.

It occurs to me that there is nothing or very little in the protocol that precludes shim negotiation using non-ULID addresses. We probably need a few minor tweaks to the reachability protocol to also allow this, but then there is no fundamental reason to not allow shim setup using non-ULID addresses, and by extension, unreachable ULIDs = a separate identifier space. If it's this easy, we should definitely make sure there isn't some minor obstacle somewhere, so that we can add this feature easily in the future when we've worked out the additional issues such as locator discovery.



   o  Where Ls(peer) has at least one locator in common with the newly
      created or updated context.

Why? I don't see how that buys us anything. Also, it's fairly trivial to insert a bogus locator to meet the requirement that there is one in common between the old and new sets.

And why verify whether the source address is in Ls(peer)? The security mechanisms do all the checking we need.

      context.  In this case, we are in the Context confusion situation,
      and the host MUST NOT use the old context to send any packets.  It
      MAY just discard the old context (after all, the peer has
      discarded it), or it MAY attempt to re-establish the old context
      by sending a new I1 message and moving its state to I1-SENT.  In
      any case, once that this situation is detected, the host MUST NOT
      keep two contexts with overlapping Ls(peer) locator sets and the
      same context tag in ESTABLISHED state, since this would result in
      demultiplexing problems on the peer.

What if an attacker is trying to interfere with legitimate communication? We must be VERY sure that the new shim messages come from the same host as the one that created the existing state if we're going to mess with that existing state.

About the randomness of the context tag: I don't think we have to require that the entire context tag be random in a cryptographically strong sense. If this makes implementation easier, why not allow an implementation to use part of the CT as a lookup key (which is relatively easy to predict) as long as enough bits are really random? In my opinion, 20 good random bits is enough here. Suggested text (but no suggested place to put it):

"It is important that context tags are hard to guess for off-path attackers. Therefore, if an implementation uses structure in the context tag to facilitate efficient lookups, at least 20 bits of the context tag must be unstructured and populated by completely random bits. For this purpose, bits derived from one of the generally used one-way hash functions such as SHA-1 may be considered random."

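A minimal sketch of what such a structured tag could look like. The 27/20 split and the function names are assumptions for illustration only, and the random bits here come from the OS random source rather than from a hash.

```python
import secrets

TAG_BITS = 47
RANDOM_BITS = 20                     # the minimum suggested above
INDEX_BITS = TAG_BITS - RANDOM_BITS  # 27 bits left over for a lookup key


def make_context_tag(table_index: int) -> int:
    """Build a tag whose high bits are a table index, low bits random."""
    if not 0 <= table_index < (1 << INDEX_BITS):
        raise ValueError("table index does not fit in the structured part")
    return (table_index << RANDOM_BITS) | secrets.randbits(RANDOM_BITS)


def table_index_of(tag: int) -> int:
    """Recover the lookup key from a received context tag."""
    return tag >> RANDOM_BITS
```

The receiver indexes straight into its context table and then compares the full tag, so an off-path attacker who guesses the index still has to guess the 20 random bits.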

   A host MUST silently discard any received Update Acknowledgement
   messages that do not satisfy all of the following validity checks in
   addition to those specified in Section 12.2:

   o  The Hdr Ext Len field is at least 1, i.e., the length is at least
      16 octets.

Added bonus when the header structure is unified: no need to repeat the above over and over throughout the text.


   NO_R1_HOLDDOWN_TIME = 1 min

   ICMP_HOLDDOWN_TIME = 10 min

This seems rather short, basically a shim host talking to a non-shim host would retry setting up the shim every minute or every 10 minutes even though there is good reason to assume this won't be successful. Something like several hours seems more appropriate. (And only when packets are actively exchanged.)


   network transit path.  Second, in case that IPSec is implemented as
   Bump-In-The-Wire (BITW) [7] it is expected that the shim6 sub-layer
   is also implemnted in the same fashion.

Not strong enough:

"in case that IPSec is implemented as Bump-In-The-Wire (BITW) [7], either the shim MUST be disabled, or the shim MUST also be implemented as Bump-In-The-Wire, in order to satisfy the requirement that IPsec is layered above the shim."


      could require a 2-way handshake "did you really loose the state?"
      in response to the error message.

lose


   o  The validator included in the R1 and R1bis packets are generated
      as a hash of several input parameters.  However, most of the
      inputs are actually determined by the sender, and only the secret
      value S is unknown to the sender.  However, the resulting
      protection is deemed to be enough since it would be easier for the
      attacker to just obtain a new validator sending a I1 packet than
      performing all the computations required to determine the secret
      S. However, it is recommended that the host changes the secret S
      periodically.

Too many howevers...


   o  Study whether a host explicitly fail communication when a ULID
      becomes invalid (based on RFC 2462 lifetimes or DHCPv6), or should
      we let the communication continue using the invalidated ULID (it
      can certainly work since other locators will be used).

Some kind of grammar problem, not obvious to me what is meant here.


Appendix B.  Simplified State Machine

   The states are defined in Section 6.2.  The intent is that the
   stylized description below be consistent with the textual description
   in the specification, but should they conflict, the textual
   description is normative.

Haven't looked at this.


   that the Flow Label carries context information as proposed in the
   now expired NOID draft. .

Extra  .


   It may happen, that later on, one of the hosts, e.g.  Host A looses
   the shim context.

loses


   Mechanisms for detecting context. loss

Extra word?

There are discussions in the appendixes, maybe make this a separate document?

The Locator List Option Format only specifies two verification methods at this time: CGA or HBA. What about the case where a locator can be verified using either CGA or HBA? Maybe it makes more sense to have each method be a bit so they can be present or absent independently.
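
Roughly what I mean, as a sketch. The bit values and names are hypothetical, not the encoding in the draft.

```python
from enum import IntFlag


class VerificationMethods(IntFlag):
    """Hypothetical per-locator bitmask, one bit per verification method."""
    NONE = 0
    HBA = 0x01
    CGA = 0x02


def usable_methods(advertised: VerificationMethods,
                   supported: VerificationMethods) -> VerificationMethods:
    """Methods the receiver can actually use for this locator."""
    return advertised & supported
```

A locator advertised with HBA | CGA can then be verified by a peer that only implements one of the two, and a new method later just takes the next free bit.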


   approach eliminates the possibility of a context confusion situation
   because premature garbage collection, but it does not prevents

prevent


   [9]  Arkko, J. and I. Beijnum, "Failure Detection and Locator Pair
        Exploration Protocol for IPv6  Multihoming",

Please make this "I. van Beijnum"

Note to self: look at implications of the fact that keepalive and probe messages (as defined here) don't trigger R1bis in the reachability draft.

Follow-Ups:
- Changes in version 8 resulting from: Re: shim6-proto-07 review
  - From: marcelo bagnulo braun <marcelo@it.uc3m.es>
- Re: shim6-proto-07 review
  - From: marcelo bagnulo braun <marcelo@it.uc3m.es>
- Congestion issue [Re: shim6-proto-07 review]
  - From: Brian E Carpenter <brc@zurich.ibm.com>
