Complexity and Evaluation

Notes from some readings on complexity and evaluation.

A Review of Three Recent Books on Complexity and Evaluation

Gerrits and Verweij (2015) reviewed three books that explored complexity and evaluation:

  • Forss et al’s Evaluating the Complex: Attribution, Contribution, and Beyond (2011)
  • Patton’s Developmental Evaluation: Applying Complexity Concepts to Enhance Innovation and Use (2011)
  • Wolf-Branigin’s Using Complexity Theory for Research and Program Evaluation (2013)

They note that all three of these books raise a similar concern (“that the complexity of social reality is often ignored, leading to misguided evaluation and policy recommendations, and that the current methodological toolbox is not particularly well-suited to deal with complexity” (p. 485)), but that they deal with this concern in different ways.

How they define complexity (all quotes from p. 485):

  • Forss et al: note that “there is a difference between complexity as an experience and complexity as a precise quality of social processes and structures”; the chapters give multiple definitions, mentioning “a system state somewhere between order and chaos” and “a focus on the non-linear and situated nature of complex systems”
  • Patton: “describes rather than defines complexity”, working with core principles of non-linearity, emergence, adaptive behavior, uncertainty, dynamics, and co-evolution, and “bolts on Holling’s adaptive cycle and panarchy”
  • Wolf-Branigin: “settles on Mitchell’s (2009) definition which focuses on the self-organizing aspect of complex systems, out of which collective behavior emerges”; “emergent behavior […] is a process that is embedded in complex systems”; complex systems –> complex adaptive systems “when the constituent elements show mutual adaptation”

They note that Wolf-Branigin offers a “complexity-friendly set of evaluation methods” and that Forss et al, being an edited volume with chapters by different authors, presents a variety of ways of dealing with complexity (and possibly some conflation of complexity and complicatedness, which suggests that some of the contributors may not have had a clear understanding of complexity).

In contrast to this focus on methods, they note that Patton views complexity as a “heuristic and sense-making device” (p. 487) and thus treats Developmental Evaluation as an approach rather than a set of methods:

Developmental Evaluation “is a dynamic kind of evaluation that does not only seek to identify causal relationships and to serve accountability, but that also offers an approach that interacts with the programs it evaluates, preferably feeding results back into the program on the fly, so as to develop it”  (Gerrits & Verweij, 2015, p. 486)

A few other points of interest:

  • “Whereas complicated interventions can be evaluated by asking “what works for whom in what contexts” […] in complex programs, ‘it is not possible to report on these in terms of “what works”… because what “it” is constantly changes'” (Gerrits & Verweij, 2015, p. 486)
  • When the “object of evaluation is complex (i.e., changes over time, etc.), it challenges the evaluation methods that do not account for that complexity” (Gerrits & Verweij, 2015, p. 488)
  • “Complexity features a language that is relatively foreign to evaluators and that is difficult to operationalize” (Gerrits & Verweij, 2015, p. 488)

A Paper on “Evaluating Complex and Unfolding Interventions in Real Time” (Ling, 2012)

  • simple interventions “rely upon a single (a coherent set of) known mechanisms with a single (a coherent set of) output whose benefits are understood to lead to measurable and widely anticipated outcomes” – e.g., a drug to treat a disease
  • “complicated interventions involve a number of interrelated parts, all of which are required to function in a predictable way if the whole intervention is to succeed. The processes are broadly predictable and outputs arrive at outcomes in well-understood ways” – e.g., a rocket ship is complicated – lots of interrelated parts, but it functions as expected (e.g., “it does not transform itself over time into a toaster”)
  • “complex interventions are characterized by:
    • feedback loops
    • adaptation and learning by both those delivering and those receiving the intervention
    • portfolio of activities and desired outcomes which may be re-prioritized or changed
    • sensitive to starting conditions
    • outcomes tend to change, possibly significantly, over time
    • have multiple components which may act independently and interdependently” (Ling, p. 80)
  • when delivering (or receiving) complex interventions, people:
    • “learn and adapt
    • reflexively seek to make sense of the systems in which they act and where possible to change how they work
    • adapt behaviour based on a changing understanding of the consequences of their actions”
    • of course, they (and the evaluators) only have an “incomplete understanding of these systems and their actions based on this limited understanding may be unpredictable” (p. 81)
  • RCTs can be used for simple and even complicated interventions, but are not appropriate for evaluating complex interventions because they are “inherently unable to deal with complexity” (p. 80)
  • also, it is important to remember that “interventions interact with complex systems in ways that cannot be predicted. The evaluation challenge lies in understanding this interaction” (emphasis mine, p. 80)

“While we need to challenge the expectation that evaluations of the complex will lead to more precise predictions and greater control, we should not abandon the belief that appropriately structured evaluations can contribute positively to reflexivity while simultaneously fulfilling the evaluators’ mission to strengthen both learning and accountability. To do so we will need to trade our search for universal generalizability in favour of more modest, more contingent, claims. In evaluating complex interventions we should settle for constantly improving understanding and practice by focusing on reducing key uncertainties.” (p. 81)

  • problem with “more conventional approaches” to program evaluation when used in situations of complexity:
    • expect to understand the whole by looking at a combination of its parts
    • evaluations “therefore […try to…] build up detailed pieces of evidence into an accurate account of the costs (or efforts) and the consequences, […] add up all the inputs, describe the processes, list the outputs and (possibly) weight outcomes and put this together to form judgements about and draw evaluative conclusions” (p. 81)
    • this can work for simple or complicated interventions “where we can make highly plausible assumptions that we know enough about both the intervention and the context” (p. 81)
  • for complexity, however, this is not the case:
    • need to “start with an understanding of the systems within which the parts operate”
    • “it is not simply the presence of [factors] (and the more the better), [but] rather it is how these parts are combined and balanced […] and how they are shaped to address local circumstances or resonate with national agendas. In other words, how they form a system of improvement and how this system interacts with other systems in and around healthcare services. From an evaluator’s point of view, ‘What matters is making sense of what is relevant, i.e., how a particular intervention works in the dynamics of particular settings and contexts.'” (emphasis mine, p. 81-2)
  • “conceptualizing complex interventions is made more difficult still by the fact that we rarely find an intervention that can adequately be described as a single system. More often there are systems nested within systems.” (p. 82)
    • e.g., systems “operating [at] individual, organization, and whole-system levels (or micro, meso, and macro)” (p. 82)
    • “when we talk about an intervention being context-dependent, or context-rich, we are describing how the processes and outcomes in each case are shaped by the particular ways in which these systems and subsystems uniquely interact” (p. 82)
  • “most economic evaluations are still primarily quantitative evaluations of “black box” interventions – that is, with little or no explicit interest in how and why they generate different effects or place different demands on the use of resources” (p. 83)
  • we need to recognize that the context in which an intervention is conducted is important, but “this approach to contextualization could lead to the conclusion that every context is different and unique and so we cannot use the lessons from one evaluation to inform decisions elsewhere […] To address this challenge, we can use complexity thinking to go beyond simply arguing that each context is different by showing how particular systems function and how systems interact. If this were successful it would provide a way of contextualizing and then allowing ‘mid-range generalization’. This could deliver sufficiently thick description of the workings of systems and subsystems to support reflexive learning within the intervention and more informed decision making elsewhere. It establishes mid-ground between the uniqueness of everything and universal generalizability.” (emphasis mine, p. 83-4)
  • “evaluations should more often be conducted in real time and support reflexive learning and informed adaptation. Rather than seeing an intervention as a fixed sequence of activities, organized in linear form, capable of being duplicated and repeated, we see an intervention as including a process of reflection and adaptation as the characteristics of the complex system become more apparent to practitioners. The evaluation aims in real time to understand these and support more informed adaptation by practitioners. It also provides an account of if and how effectively practitioners have adapted their activities in the light of intended goals. They can be held to account for their intelligent adaptation rather than slavishly adhering to a set of instructions. Furthermore, the evaluation should say something about how the approach might be applied elsewhere.” (emphasis mine, p. 84-5)
  • Ling cites Stirling’s Uncertainty Matrix as a useful way to think about the “different kinds and causes of uncertainty” (p. 85)

Uncertainty Matrix (adapted from Stirling, 2010)

  • probabilities – i.e., the chance of something happening
  • possibilities – i.e., the range of things that can happen
  • our knowledge of probabilities and possibilities can each be either non-problematic (i.e., we know the chance of something happening and we know the range of things that can happen, respectively) or problematic
  • risk – when we know the range of possibilities and each of their probabilities – we can engage in risk assessments, expert consensus, optimizing models
  • uncertainty – limited number of possibilities but we don’t know the probabilities of them occurring – we can use scenarios, sensitivity testing, etc.
  • ignorance – both the range of possibilities and their probabilities are not known – we need to monitor, be flexible and adaptive
  • ambiguity – range of possibilities is problematic, but probabilities not problematic – we can use participatory deliberation, multicriteria mapping, etc. (a small code sketch of this 2×2 classification appears at the end of this list)
  • for simple interventions, evaluations aim for certainty
  • for complicated interventions, evaluations aim to reduce (known) uncertainty
  • for complex interventions, evaluations aim to support a self-improving system
    • first aim to expose uncertainties, then to reduce them
    • “need to understand both activities and contexts, important to identify how learning and feed back happens, understand both system dynamics but also what makes change ‘sticky’, real-time evaluation necessary, requires a counterfactual space or matrix” (p. 86)
  • Ling recommends “an evaluation approach based […on] understanding the unfolding ‘Contribution Stories’ that those involved in delivering and adapting interventions work with to describe their activities and anticipated events” (p. 86-7)
    • Contribution Stories “aim to surface and outline how those involved in the intervention understand the causal pathways connecting the intervention to intended outcomes” and “provide an opportunity to explore their thinking about how the different aspects of the intervention interact with each other and with other systems” (p. 87)
    • from the Contribution Stories, “more abstract Theories of Change can be developed which trace the causal pathway linking resource use to outcomes achieved. These Theories of Change will be contingent and context-dependent and should be expressed as ‘mid-range theories’; not so specific that they amount to nothing more than a listing of micro-level descriptions of the causal pathway of the specific intervention but also not so abstract that it cannot be tested or informed by the evidence from the evaluation.” (emphasis mine, p. 87)
    • next, evaluators: (1) “identify key uncertainties associated with the intervention – those anticipated causal linkages for which there is limited evidence or inherent ambiguities or ignorance” and (2) “Data collection & analysis would then aim to reduce these uncertainties, hopefully producing evidence that would be both relevant and timely.” (p. 87)
  • 6 stages at which evaluators should “reflect on the consequences of complexity” (p. 87):
    1. Understand the intervention’s Theory of Change and its related uncertainties
      • include the “importance of learning and adapt[ation]” (p. 87)
      • “identify key dependencies upon systems and subsystems which lie outside the formal structures of the intervention” (p. 87)
    2. Collect and analyse data focused on key uncertainties
      • “identify where key uncertainties exist”
      • identify “what sort of uncertainty it is” (ignorance, risk, ambiguity, uncertainty)
      • “data collection alone may not address all of the key uncertainties” (p. 87)
    3. Identify how reflexive learning takes place through the project and plan data collection and analysis to support this, strengthening the formative role of evaluation
      • there is a “creation of evidence by the project itself as it learns and adapts”
      • the “evaluation can support this learning as part of a formative role at the same time as building a data base for its own summative evaluation” (p. 87) (with a shift in the balance towards a more formative role) [this sounds like what I’ll be doing with my project]
    4. Building a portfolio of activities and costs
      • “identifying boundaries around the cost base is made difficult when the success of a project may depend more on harnessing synergies from outside the intervention itself.” (p. 88)
      • “a major cost in conditions of complexity is equipping projects to be adaptable and responsive to a changing environment. Essentially, part of what is being ‘bought’ is flexibility and, by definition, this means that some resources might not need to be used. It could be regarded as the cost of uncertainty” (p. 88)
    5. Understanding what would have happened in the absence of the intervention
      • it is “often much harder to identify the counterfactual” for a complex intervention than for simple/complicated ones, but it is still “crucial to pose the core question in an evaluation which is ‘did it make a difference?'” which of course requires us to ask “compared to what?”
      • rather than the counterfactual being a single thing, think of it more as “a counterfactual space of more or less likely alternative states. This might be produced by scenarios, modelling, simulation, or even expert judgement depending upon the nature of the uncertainty” (p. 88)
    6. “The evaluation judgment should not aim to identify attribution (what proportion of the outcome was produced by the intervention?) but rather to clarify contribution (how reasonable is it to believe that the intervention contributes to the intended goals effectively and might there be better ways of doing this?)” (p. 88)
  • the above is a general outline – still needs to be fleshed out
  • important to remember that “interventions change as they unfold” and “this adaptation is both necessary and unpredictable” (p. 89)
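
To keep Stirling’s four knowledge states and the example methods above straight, here is a minimal Python sketch. It is my own illustration, not code from Ling or Stirling; the names `UncertaintyKind`, `classify`, and `SUGGESTED_METHODS` are invented for this note, and the method lists simply echo the bullets above.

```python
from enum import Enum


class UncertaintyKind(Enum):
    """Stirling's four knowledge states (adapted from Stirling, 2010)."""
    RISK = "risk"                # possibilities and probabilities both unproblematic
    UNCERTAINTY = "uncertainty"  # possibilities known, probabilities problematic
    AMBIGUITY = "ambiguity"      # probabilities unproblematic, possibilities problematic
    IGNORANCE = "ignorance"      # neither possibilities nor probabilities known


def classify(possibilities_problematic: bool, probabilities_problematic: bool) -> UncertaintyKind:
    """Place a situation in the 2x2 matrix according to which kind of knowledge is problematic."""
    if possibilities_problematic and probabilities_problematic:
        return UncertaintyKind.IGNORANCE
    if possibilities_problematic:
        return UncertaintyKind.AMBIGUITY
    if probabilities_problematic:
        return UncertaintyKind.UNCERTAINTY
    return UncertaintyKind.RISK


# Example responses drawn from the notes above (illustrative, not exhaustive).
SUGGESTED_METHODS = {
    UncertaintyKind.RISK: ["risk assessment", "expert consensus", "optimizing models"],
    UncertaintyKind.UNCERTAINTY: ["scenarios", "sensitivity testing"],
    UncertaintyKind.AMBIGUITY: ["participatory deliberation", "multicriteria mapping"],
    UncertaintyKind.IGNORANCE: ["monitoring", "flexible and adaptive management"],
}

if __name__ == "__main__":
    kind = classify(possibilities_problematic=True, probabilities_problematic=True)
    print(kind.value, "->", SUGGESTED_METHODS[kind])  # ignorance -> ['monitoring', ...]
```

The only point of the mapping is that the appropriate evaluation tools shift as knowledge about possibilities and probabilities becomes more problematic, which is exactly why Ling argues that evaluations of complex interventions should first expose and then reduce key uncertainties.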

A Few Points from the Stirling Paper

  • I looked up the Stirling paper that Ling had cited to read more about the uncertainty matrix. This paper made the point that “when knowledge is uncertain, experts should avoid pressures to simplify their advice. Render decision-makers accountable for decisions.” (p. 1029).
  • Also: “An overly narrow focus on risk is an inadequate response to incomplete knowledge.” (p. 1029)

A Paper on “Using Programme Theory to Evaluate Complicated and Complex Aspects of Interventions” (Rogers, 2008)

  • It’s not about “creating messier logic models with everything connected to everything. Indeed, the art of dealing with the complicated and complex real world lies in knowing when to simplify and when, and how, to complicate” (p. 30)
  • various names for “program theory”:
    • programme logic
    • theory-based evaluation
    • theory of change
    • theory-driven evaluation
    • theory-of-action
    • intervention logic
    • impact pathway analysis
    • programme theory-driven evaluation science
      they all refer to “a variety of ways of developing a causal model linking programme inputs and activities to a chain of intended or observed outcomes, and then using this model to guide the evaluation” (p. 30)
  • Glouberman and Zimmerman’s (2002) analogy re: complexity:
    • simple = following a recipe (very predictable)
    • complicated = sending a rocket ship to the moon (need a lot of expertise, but there is high certainty about the outcome; doing it once increases your likelihood of doing it again with the same result)
    • complex = raising a child (every child is unique and needs to be understood as such; what works well with one child will not necessarily work well with another; uncertainty about outcome)
  • Rogers suggests using this distinction to think about different aspects of an intervention (as some aspects of an intervention could be simple, while others are complicated or complex)
  • simple linear logic models (inputs –> activities –> outputs –> outcomes –> impact):
    • lack information about other things that can affect program outcomes, such as “implementation context, concurrent programmes and the characteristics of clients” (p. 34)
    • risk “overstating the causal contribution of the intervention” (p. 34)
    • best to reserve simple logic models for “aspects of interventions that are in fact tightly controlled, well-understood and homogeneous or for situations where only an overall orientation about the causal intent of the intervention is required, and they are clearly understood to be heuristic simplifications and not accurate models” (p. 35)
  • complicated logic models:
    • multi-site, multi-governance – can be challenging to get multiple groups to agree on evaluation questions/plans, but if there is a clear understanding of the “causal pathway” (e.g., a parasite causes a known problem, program is working to reduce the spread of that parasite), you can use a single logic model, report data separately for each site and in aggregate for the whole
    • simultaneous causal strands – all of which are required for the program to work (“not optional alternatives but each essential” (p. 37); must show them in the logic model (and indicate they are all required) and collect data on them
    • alternative causal strands – where the “programme can work through one or the other of the causal pathways” (p. 37); often, different “causal strands are effective in particular contexts”; difficult to denote visually on a logic model
      • can conduct “evaluation[s] that involve ‘comparative analysis over time of carefully selected instances of similar policy initiatives implemented in different contextual circumstances’” (Sanderson, 2000 cited in Rogers, 2008, p. 37)
      • it’s important to document the alternative causal strands in an “evaluation to guide appropriate replication into other locations and times” (p. 38)
  • complex logic models:
    • two aspects of complexity that Rogers talks about as having been addressed in published evaluations:
      • recursive causality & tipping points – rather than program logic being a simple “linear progression from  initial outcomes to subsequent outcomes” (p. 38), the links are “likely to be recursive rather than unidirectional” and have “feedback mechanisms [and] interactive configurations” – it’s “mutual, multidirectional, and multilateral” (Patton, 1997 cited in Rogers, 2008, p. 38)”
      • “many interventions depend on activating a ‘virtuous circle’ where an initial success creates the conditions for further success,” so, “evaluation needs to get early evidence of these small changes, and track changes throughout implementation” (p. 38)
      • ‘tipping points’ – “where a small additional effort can have a disproportionately large effect, can be created through virtuous circles, or a result of achieving certain critical levels” (p. 38)
      • can be hard to show virtuous circles/tipping points on logic model diagrams, so may require notes on diagrams [I wonder if we can do anything with technology to better illustrate this?] (a rough sketch of one way to represent feedback appears at the end of these notes on Rogers’ paper)
      • emergence of outcomes
        • what outcomes there will be, and how they will be achieved, “emerge during implementation of an intervention”
        • this may be appropriate:
          • “when dealing with a ‘wicked problem’
          • where partnerships and network governance are involved, so activities and specific objectives emerge through negotiation and through developing and using opportunities
          • where the focus is on building community capacity, leadership, etc., which can then be used for various specific purposes” (p. 39)
        • could develop a “series of logic models […] alongside development of the intervention, reflecting changes in understanding. Data collection, then, must be similarly flexible.” (p. 39)
          • may have a clear idea of the overall goals, but “specific activities and causal pathways are expected to evolve during implementation, to take advantage of emerging opportunities and to learn from difficulties” (p. 40) – so could develop an initial model that is “both used to guide planning and implementation, but [is] also revised as plans change” (p. 40) [this is what we are doing on my current project]
  • interventions that have both complicated and complex aspects
    •  e.g., multi-level/multi-site (complicated) and emergent outcomes (complex)
    • could have a logic model that “provide[s] a common framework that can accommodate local adaptation and change” (p. 40)
    • “a different approach is not to present a causal model at all, but to articulate the common principles or rules that will be used to guide emergent and responsive strategy and action” (p. 42-3)
  • how to use program theory/logic models for complicated & complex program models
    • with simple logic models, we use program theory/logic models to create performance measures that we use to monitor program implementation and make improvements
    • with complicated & complex models, we cannot do this so formulaically
    • one of the important uses of program theory/logic models in these situations is in having “discussions based around the logic models” (p. 44)
    • evaluation methods tend to be more “qualitative, communicative, iterative, and participative” (p. 44)
    • “the use of ’emergent evaluation’ – engaging stakeholders in “highly participative” processes that “recognize difference instead of seeking consensus that might reflect power differences rather than agreement” (p. 44) – and then these “multi-stakeholder dialogues [are] used simultaneously in the roles of data collection, hypothesis testing and intervention, rather than evaluators going away with the model and returning at the end with result” (p. 44) – and stakeholders can then “start to use the emerging program theories […] to guide planning, management and evaluation of their specific activities.” (p. 44)
    • having “participatory monitoring and evaluation to build better understanding and better implementation of the intervention” (p. 45)
    • citing Douthwaite et al, 2003: “Self-evaluation, and the learning it engenders, is necessary for successful project management in complex environments” (p. 45)
  • final thoughts:
    • “The anxiety provoked by uncertainty and ambiguity can lead managers and evaluators to seek the reassurance of a simple logic model, even when this is not appropriate[, but…] a better way to contain this anxiety might be to identify instead the particular elements of complication or complexity that need to be addressed, and to address them in ways that are useful” (p. 45)
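
Rogers’ contrast between simple linear logic models and models with recursive causality (the virtuous circles and feedback loops above) can also be seen by treating a logic model as a small directed graph. The sketch below is my own illustration, not anything from the paper; `LogicModel` and `has_feedback` are invented names, and detecting a cycle is just one crude stand-in for links that are “recursive rather than unidirectional”.

```python
from collections import defaultdict


class LogicModel:
    """A logic model as a directed graph; feedback edges make it non-linear."""

    def __init__(self):
        self.edges = defaultdict(set)

    def link(self, cause: str, effect: str) -> None:
        self.edges[cause].add(effect)

    def has_feedback(self) -> bool:
        """True if any causal path loops back on itself (recursive causality)."""
        visiting, done = set(), set()

        def dfs(node: str) -> bool:
            visiting.add(node)
            for nxt in self.edges[node]:
                if nxt in visiting or (nxt not in done and dfs(nxt)):
                    return True
            visiting.discard(node)
            done.add(node)
            return False

        return any(dfs(n) for n in list(self.edges) if n not in done)


# A simple, linear model: inputs -> activities -> outputs -> outcomes -> impact
simple = LogicModel()
for cause, effect in [("inputs", "activities"), ("activities", "outputs"),
                      ("outputs", "outcomes"), ("outcomes", "impact")]:
    simple.link(cause, effect)

# A complex variant: early outcomes feed back into activities (a 'virtuous circle')
complex_model = LogicModel()
for cause, effect in [("inputs", "activities"), ("activities", "outputs"),
                      ("outputs", "early outcomes"), ("early outcomes", "activities"),
                      ("early outcomes", "impact")]:
    complex_model.link(cause, effect)

print(simple.has_feedback())         # False – a linear chain
print(complex_model.has_feedback())  # True – recursive causality
```

A diagramming tool built on something like this could flag the feedback edges explicitly, which might be one answer to my note above about using technology to better illustrate virtuous circles and tipping points.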

I have a lot more articles to read on this topic, but this blog posting is getting very long, so I’m going to publish this now and start a new posting for more notes from other papers.

References
Gerrits, L. & Verweij, S. (2015). Taking stock of complexity in evaluation: A discussion of three recent publications. Evaluation. 21(4): 481-91.
Ling, T. (2012). Evaluating complex and unfolding interventions in real time. Evaluation. 18(1): 79-91.
Rogers, P. (2008). Using Programme Theory to Evaluate Complicated and Complex Aspects of Interventions. Evaluation 14(1): 29-48.
Stirling, A. (2010). Keep it complex. Nature. 468. p. 1029-1031.