On Flexibility in Evaluation Design

Been doing some reading as I work on developing an evaluation plan for a complex program that will be implemented at many sites. Here are some notes from a few papers that I’ve read – I think if anything links these three together, it is the notion of the need to be flexible when designing an evaluation – but you also need to think about how you’ll maintain the rigour of your work.

Wandersman et al (2016)’s paper on using an evaluation approach called “Getting to Outcomes (GTO)” discussed the notion that just because an intervention has been shown to be effective in one setting does not necessarily mean it will work in other settings. While I wasn’t interesting in the GTO approach per se, I found their introduction insightful.

Some notes I took from the paper:

  • the rationale for using evidence-based interventions is that since research studies show that a given intervention leads to positive outcomes, then if we take that intervention and implement it in the same way it was implemented in the research studies (i.e., fidelity to the intervention) on a broad scale (i.e., at many sites), then we should see those same positive outcomes on a broad scale
  • however, when this is actually done, evaluations often show that the positive outcomes compared to control sites don’t happen or that positive outcomes happen on average, but there is much variability among the sites such that some sites get the positive outcomes and others don’t (or even that some sites get negative outcomes)
  • from the perspective of each individual site, having positive outcomes on average (but not at their own particular site) is not good enough to say that this intervention “works”
  • when you implement complex programs at multi-sites/multi-levels, you “need to accommodate for the contexts of the sites, organizations, or individuals and the complete hierarchies that exist among these entities […] the complexity […]” includes multiple targets of change and settings” (p. 549-50)
  • recommendations:
    • evaluate interventions at each site in which it is implemented
    • examine the quality of the implementation
    • consider the fit of the intervention to the local context
      • “the important question is whether they are doing what they need to do in their own setting in order to be successful” (p. 547)
      • “the relevant evaluation question to be answered at scale is not “does the [evidence-based intervention] result in outcomes?” but rather “how do we achieve outcomes in each setting?” (p. 547)
    • evaluators should “assist program implementers to adapt and tailor programs to meet local needs and provide ongoing feedback to support program implementation” (p. 548)
  • empowerment evaluation: premise is: “if key stakeholders (including program staff and consumers) have the capacity to use the logic and tools of evaluation for planning more systematically, implementing with quality, self-evaluating, and using the information for continuous quality improvement, then they will be more likely to achieve their desired outcomes”

Balasubramanian et al (2015) discussed what they call “Learning Evaluation”, which they see as a blend of quality improvement and implementation research. To me it sounded similar to Developmental Evaluation (DE). For example, they state that:

  • “Two key aspects of this approach set it apart from other evaluation approaches; its emphasis on facilitating learning from small, rapid cycles of change within organizations and on capturing contextual and explanatory factors related to implementation and their effect on outcomes across organizations”  (p. 2 of 11)
  • “assessment needs to be flexible, grounded, iterative, contextualized, and participatory in order to foster rapid and transportable knowledge. This approach integrates the implementation and evaluation of interventions by establishing feedback loops that allow the intervention to adapt to ongoing contextual changes.” (p. 2 of 11)

That sound a lot like DE to me. And it sounds a lot like how I’m looking to approach the evaluation I’m currently planning.

Principles underlying the “Learning Evaluation” approach (from page 3 of 11):

 Principle Why
 1. Gather data to describe the types of changes made by healthcare organizations, how changes are implemented, and the evolution of the change process. To establish initial conditions for implementing innovations at each site and to describe implementation changes over time.
 2. Collect process and outcome data that are relevant to healthcare organizations and to the research team To engage healthcare organizations in research and in continuous learning and quality improvement.
 3. Assess multi-level contextual factors that affect implementation, process, outcome, and transportability. Contextual factors influence quality improvement: need to evaluate conditions under which innovations may or may not result in anticipated outcomes.
 4. Assist healthcare organizations in applying data to monitor the change process and make further improvements.  To facilitate continuous quality improvement and to stimulate learning within and across organizations.
5. Operationalize common measurement and assessment strategies with the aim of generating transportable results. To conduct internally valid cross-organization mixed methods analyis

A point that was made in this paper that resonated with me was that: “Within the context of a multi-site demonstration project conducted in real-world settings, it was not feasible to randomize sites or to specify target patient samples or measures a priori.” (p. 7 of 11) Instead, they incorporated elements to enhance the study’s rigour:

  • rigour in study design
    • considered each site as a “single group pre-post quasi-experimental study”, which is subject to history 1i.e., how do you know results aren’t do to other events that are occurring concurrently with the intervention? and maturation 2i.e., how do you know the results aren’t just due to naturally occurring changes over time rather than being due to the intervention? threats to internal validity
    • to counteract these threats, they collected qualitative data on implementation events (to allow them to examine if results are related to implementation of the intervention)
    • they also used member checking to validate their findings
  • rigour in analysis
    • rather than analyzing each source of data independently, they integrated findings
    • “triangulating data sources is critical to rigor in mixed methods analysis”
    • qualitative data analysis was conducted first within a given site (e.g., “to identify factors that hindered or facilitated implementation while also paying attention to the role contextual influences played” (p. 7 of 11), then across sites.

A few other points they make:

  • “ongoing learning and adaptation of measurement allows both rigor and relevance” (p. 8 of 11)
  • by “working collaboratively with innovators to develop data collection strategies and routine processes for jointly sharing and reflecting on data to foster continuous learning, improvement, and advocacy for policy changes” the organization can “develop capacity for data collection and monitoring for future efforts” (p. 8 of 11)
  • this approach “may feel to some to be at odds with current standards of rigor, which value fidelity to a priori hypotheses and methods”, but it is “not a ‘canned’ approach to evaluating healthcare innovations, but it involves the flexible application of five general principles” (p. 9 of 11). “This requires [evaluators] to be flexible and nimble in adapting their approach when proposed innovations are modified to fit the local context.” (p. 9 of 11)

Brainard & Hunter conducted a scoping review with the question “Do complexity-informed health interventions work?” What they found was that although “the lens of complexity theory is widely advocated to improve health care delivery,” there’s not much in the literature to support the idea that using a complexity lens to design an intervention makes the intervention more effective.

They used the term “‘complexity science’ as an umbrella term for a number of closely related concepts: complex systems, complexity theory, complex adaptive systems, systemic thinking, systems approach and closely related phrases” (p. 2 of 11). They noted the following characteristics of systems:

  • “Large number of elements, known and unknown.
  • Rich, possibly nested or looping, and certainly overlapping networks, often with poorly understood relationships between elements or networks.
  • Non-linearity, cause and effect are hard to follow; unintended consequences are normal.
  • Emergence and/or self-organization: unplanned patterns or structures that arise from processes within or between elements. Not deliberate, yet tend to be self-perpetuating.
  • A tendency to easily tip towards chaos and cascading sequences of events.
  • Leverage points, where system outcomes can be most influenced, but never controlled.” (p. 2 of 11)

They also had some recommendations for reporting on/evaluating complexity-informed interventions:

  • results should be monitored over the long term (e.g., more than 12 months) as results can take a long time to occur
  • barriers to implementation should be explored/discussed
  • unintended/unanticipated (including negative) changes should be actively looked for
  • support from the institution/senior staff combined with widespread collaborative effort is needed to successfully implement
  • complexity science or related phrases should be in the title of the article


Balasubramanian, B., Cohen, D.J., Davis, M.M., Gunn, R., Dickinson, L.M., Miller, W.L., Crabtree, B.F., & Stange, K.C. Learning Evaluation: blending quality improvement and implementation research methods to study healthcare innovations. Implementation Science. 10: 31. (full text)

Brinard, J., & Hunter, P.R. Do complexity-informed health interventions work? A scoping review. Implementation Science. 11:127. (full text)

Wandersman, A., Alia, K., Cook, B.S., Hsu, L.L., & Ramaswamy, R. (2016). Evidence-Based Interventions Are Necessary but Not Sufficient for Achieving Outcomes in Each Setting in a Complex World: Empowerment Evaluation, Getting To Outcomes, and Demonstrating Accountability.  American Journal of Evaluation. 37(4): 544-561. [abstract]


1 i.e., how do you know results aren’t do to other events that are occurring concurrently with the intervention?
2 i.e., how do you know the results aren’t just due to naturally occurring changes over time rather than being due to the intervention?
This entry was posted in evaluation, healthcare, notes and tagged , , , , , , , , . Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *