Dr. Beth Snow | Page 6

Applying complexity theory: A review to inform evaluation design

this paper uses “complexity” to refer to “understanding the social systems within which interventions are implemented as complex” (p. 119)
he defines a complex system as “comprised of multiple interacting actors, objects and processes defined as a system based on interest or function” and “are nested”. “The interaction of components in a complex system gives rise to ‘emergent’ properties, which cannot be understood by examining the individual system components” and “interactions are non-linear” (p. 119)
“challenges posed by complex social systems for evaluation relate to uncertainty in the nature and timing of impacts arising from interventions, due to non-linear interactions within complex systems and the ‘emergent’ nature of system outcomes. There are also likely to be differing values and valuation of outcomes from actors across different parts of a complex system, making judgements of ‘what worked’ contested.” (p. 120)
“due to the open boundaries of complex systems, there are always multiple interventions operating and interacting, creating difficulties identifying the effects of one intervention over another” (p. 120)
in the literature, “there is little consensus regarding what the key characteristics of a complexity informed policy or program evaluation approach should be” (emphasis mine, p. 120)
this review paper identified the following themes:
- developing an understanding of the system
  - “the need to develop a picture of the system operating to aid analysis of both interaction and of changes in system parts” (p. 121)
  - “boundaries are constructs with decisions of inclusion and exclusion reflecting positions of actors involved in boundary definitions” (i.e., “boundaries likely reflect the interest of evaluators and others defining evaluation scope”) (p. 121)
  - “complex system boundaries are socially constructed, so we should be asking about what systems are being targeted for change and what ‘change’ means to various people involved” (p. 125)
- attractors, emergence, and other complexity concerns
  - emergent properties: “generated through the operation of the system as a whole and cannot be identified through examining individual system parts” (p. 123)
    - this challenges the role of evaluating predetermined goals
  - attractor states: “depict a pattern of system behaviour and represents stability, with a change in attractor state representing a qualitative shift in the system, with likely impacts on emergent phenomena” (p. 123)
  - it’s important to keep “a holistic view of the system over long time periods” (p. 123)
- defining appropriate level of analysis
  - the literature includes “a clear call for evaluation to focus upon multiple levels, whilst also noting the challenge this creates” (p. 123)
- timing of evaluations
  - “non-linear interactions and potential for sudden system transformation suggest we cannot predict when the effects of an intervention will present. Therefore, long evaluative time frames may be required” (p. 123)
  - “evaluation should, if possible, occur concurrently alongside programme development and implementation” (p. 123)
  - but long timeframes “pose a challenge to the question of what should be evaluated” and “suggest that evaluative activity needs to be on going and that the line between evaluation and monitoring may be blurred. Attribution of outcomes to specific interventions becomes more complicated over time with the number of local adaptations, national level policy changes, and social and economic contextual changes likely to increase.” (p. 123)
  - there is a “role of evaluation for understanding local adaptations and feeding back into implementation processes” (p. 123) – and this “may be more immediate and relevant to current implementation decisions [than a focus on outcomes/attribution] and therefore provide a more tangible focus for evaluation” (p. 123)
- participatory methods
  - “used to gather perspectives of actors across the system to develop systems descriptions; understand how interventions are adapted at the local level; and make explicit different value claims of actors across the system” (p. 124)
- case study and comparison designs
  - pro: “ability to develop a detailed understanding of a system (or a limited number of systems), in line with complexity theory concepts” (p. 124)
- multiple and mixed methods
  - “a logical response to the challenge of providing contextualised information on what works” (p. 124)
- layering theory to guide evaluation
  - “multiple theories can be nested for explanation at multiple levels of a system” (p. 124)
“participation built into the evaluation from the start, and a close relationship with stakeholders throughout the evaluation lifecycle is part of an ‘agile’ evaluation” (p. 125)

Perturbing ongoing conversations about systems and complexity in health services and systems

“What matters is making sense of what is relevant, i.e., how a particular intervention works in the dynamics of particular settings and contexts” (p. 549)
“the most useful questions addressing complex problems must imply an open system: ‘What will the intervention be able to produce?’ and ‘What kind of behaviour will emerge? What are our frames of reference? What are our ideas and values in relation to success?’” (Stengers cited on p. 549)
“Frameworks for understanding policy development do not merely describe the process. They invariably indicate what a “well-functioning” process is like. And so they place a value on certain structures and behaviour. As our theories change, so do our views of what is good” (Glouberman cited on p. 549)
“common to complex systems are two fundamental themes:
- the universal interconnectedness and interdependence of all phenomena
- the intrinsically dynamic nature of reality” (p. 549)
although there seems to be lots of talk about complexity, its “uptake” in health systems/services has been slow
- “reductionism remains the dominant paradigm”
- we often break down the work of clinicians into “discrete activities based on a business model driven by the agenda of cost containment rather than improved patient health” (p. 550)
- “we must counterintuitively work to develop appropriate abstract frameworks and categories, and reflect on our ways of knowing, if we are to gain a deeper understanding of the processes that operate in complex systems, and how to intervene more successfully” (p. 550)
“the awareness of complexity does not imply answering questions or solving problems: rather, it means opening problems up to dynamic reality, as well as increasing the relative level of awareness. Thus, the notion of complexity […] strongly supports the possibility that […] questions and answers may change, as well as the nature of questions and answers upon which scientific investigation is built” (p. 551)

Theory-based Evaluation and Types of Complexity

“evaluation deficit” – “the unsatisfactory situation in which most evaluations, conducted at local and other sub-national levels, provide the kind of information (on output) that does not immediately inform an analysis of effects and impacts at higher levels, i.e., whether global objectives have been met. Or, conversely, impact assessment is not corroborated by an understanding of the working of programmes.” (p. 59)
Stame criticizes “mainstream evaluation” for “choosing to play a low-key role. Neither wanting to enter into the ‘value’ problem […], nor wanting to discuss the theoretical implications of programmes, evaluators have concentrated their efforts on developing a methodology for verifying the internal validity (causality) and external validity (generalizability) of programmes” (p. 59). She goes on to list the consequences of this “low-key” approach:
- “fail to formally or explicitly specify theories”
- assuming the programs are “rational” – e.g., “assuming the needs are known, decision makers are informed […], decisions are taken with the aim of maximizing gains from existing resources”
- since programs are seen as “rational”, “politics was seen as a disturbance or interference and the political context itself never became an object of inquiry”
- thinking of the outcome of evaluation as being just “‘instrumental’ use: saying that something worked or did not work” (p. 60)
theory-oriented evaluations:
- “changes […] the attitude towards methods” … “All methods can have merit when one puts the theories that can explain a program at the centre of the evaluation design. No method is seen as the ‘gold standard’. Theories should be made explicit, and the evaluation steps should be built around them: by elaborating on assumptions; revealing causal chains; and engaging all concerned parties” (p. 60)
- some different approaches to theory-oriented evaluations:
  - Theory-driven evaluation (Chen & Rossi) – many programs have “‘no theory’, goals are unclear, and measures are false”, so “evaluations are ‘at best social accounting studies that enumerate clients, describe programs, and sometimes count outcomes’”. “The black box is an empty box.” Thus, their approach is “more to provide a programme’s missing theory than to discuss the way programmes exist in the world of politics” (p. 61).
  - Theory-based evaluation (Weiss) – the “black box is full of many theories […that] take the form of assumptions, tacit understandings, etc: often more than one for the same programme.” (i.e., different people involved – the many program implementers, recipients, funders, etc. – may all be operating based on different ideas of how the program works and may not even be aware of their own theories). Two parts to theories of change: (1) “‘implementation theory’, which forecasts in a descriptive way the steps to be taken in the implementation of the programme” and (2) “‘programmatic theory’, based on the mechanisms that make things happen” (pp. 61-62)
  - Realist Evaluation (Pawson & Tilley): they “stress what the components of a good programme theory should be: context (C) and mechanism (M), which account for outcome (O). Evaluation should be based on the CMO configuration. Programmes are seen as opportunities that an agent, situated inside structures and organizations, can choose to take, and the outcomes will depend on how the mechanism that is supposed to be at work will be enacted in a given context.” “We cannot know why something changes, only that something has changed […] in a given case. And that is why it is so difficult to say whether the change can be attributed to the programme. The realist approach is based on a ‘generative’ theory of causality: it is not programmes that make things change, it is people, embedded in their context, who, when exposed to programmes, do something to activate given mechanisms, and change. So the mystery of the black box is unveiled: people inhabit it.” (p. 62)
- similarities among these theory-oriented approaches:
  - evaluation is based on “an account of what may happen”
  - they “consider programmes in their context”
  - use “all methods that might be suitable”
  - “are clearly committed to internal validity (they indeed look for causality), but nonetheless allow for comparisons across different situations” (p. 63)
- differences among these theory-oriented approaches: role of theory, role of context
“reality is complex because:
- it is stratified, and actors are embedded in their own contexts; and
- each aspect that may be examined and dealt with by a programme is multifaceted” (p. 63) [this doesn’t seem to fit any of the other definitions of “complexity” that I’ve read]
“if […] the evaluator considers that what is important is to know how impact has been attained and why, s/he is bound to consider that means […] are relevant. Evaluation is then concerned with different ways of reaching objectives, and tries to judge which policy instruments, in isolation or in combination, and in what sequence, are better suited to the actors’ situation in given contexts” (p. 66)

Complex, but not quite complex enough: The turn to the complexity sciences in evaluation scholarship

This article provides a critique to the way that many evaluators have been writing about, and attempting to apply, “complexity sciences” (see my previous posts here and here for my notes from some of the types of articles he’s critiquing)
Mowles’ main critiques are that:
- “there is a tendency either to over-claim or under-claim [the] importance” (p. 160) of complexity sciences
- evaluation “scholars are not always careful about which of the manifestations of the complexity sciences they are appealing to” (p. 160)
- evaluation scholars have not always “demonstrated how they understand [the complexity sciences] in social terms” (p. 160)
  - evaluators who favour a “contingency approach to complexity” (i.e., we can pick and choose when to use it based on our decision about if a program (or part of a program) is “complex”) “suggest complexity is a ‘lens’ or framework to be applied if helpful, and take emergence to mean the opposite of being tightly planned” (p. 167). This leads to evaluators seeing only those programs (or parts of programs) that they have deemed to be complex as “need[ing] “a special and “trying to feed back data and information in real time” (p. 167)
  - But “in portraying emergence as a special phenomenon [these evaluators] have implicitly dismissed the idea that the human interaction is always complex, and that emergence, which we might understand in social terms as the interplay of intentions, is always happening, whether a social program is tightly planned or not” (p. 167)
- thus, “complexity sciences” are used as just another tool within evaluation as a “logical, rational activity” (p. 160) – he cites Fleck who “described the ways in which groups of scholars, committed to understanding the world in a particular way, resist the rise of new ideas by either ignoring them or rearticulating them in terms of the prevailing orthodoxy” (p. 161) – with the implication being that this is what many evaluators are doing – rather than grappling with the complexity sciences to see what the implications are for evaluation, they are trying to fit the complexity sciences into their existing ways of evaluating. He goes on to ask “what difference appealing to the complexity sciences makes to the prescriptions that scholars recommend for evaluative practice” (p. 161). – that is, does “applying” complexity theory lead these evaluators to do anything differently than they would have done without it?

trends noted in evaluation scholarship re: complexity:
- many suggest that complexity is something that an evaluator should choose at what time/in what circumstances to use
  - many use the “Stacey Matrix”, which is “a contingency theory of organizations understood as complex adaptive systems [that] suggests that the nature of the decision facing managers depends on the situation facing them” (p. 163) [The one Patton uses in his Developmental Evaluation book – with low-high certainty on one axis and low-high agreement on the other and you use it to determine if something is simple, complicated, complex, or chaotic] – even though “Stacey himself abandoned the idea that organizations can be helpfully understood as complex adaptive systems, and has moved on from a contingency perspective” (p. 163)
  - using this approach “allows evaluators in the mainstream to claim that the complexity sciences may be quite helpful but only in circumstances of their own choosing” – this represents the “‘spectator theory of knowledge’, which sustains a separation between the observer and the thing observed.” (p. 164)
  - Mowles suggests that everything is complex: even following rules like a recipe [the oft-given example of “simple”] “is a highly social process where the rules inform practice and practice informs the rule” (p. 163)
- talking about “complexity sciences” as if it were just one thing, “homogenizing” them OR just picking “some of the characteristics of particular manifestations of the complexity sciences” (p. 162) [thought: it’s kind of funny that complexity theory includes the notions that the whole is not just the sum of the parts/you can’t understand the whole just by looking at the parts… but then we say we are applying complexity theory by just looking at some of the parts]
  - he notes that Patton draws on a lot of aspects of complexity sciences for his Developmental Evaluation approach “without offering a view as to whether one particular branch of the complexity sciences is more helpful than another” (p. 164)
  - he also notes, somewhat snarkily (though not unjustifiably) in my opinion, that “In the development of the disciplines of evaluation, particularly those claiming to be theory-based, it is probably important to know what the theories being taken up actually claim to be revealing about nature, and to be able to make distinctions between one theory and another” (p. 164)
- making the assumption that “the social is best understood in systemic terms”; social/health interventions understood as a “system with a boundary, even if that boundary is ‘open’. Interaction is then understood as taking place between entities, agents, even institutions operating at different ‘levels’ of the system, or between systems, which leads to the idea that social change can be both wholesale and planned”… this “allows scholars to avoid explaining their theory of social action, or to interpret complexity theories from the perspective of social theory and thus to read into them more than they sustain” (p. 162)
“insights from complexity theory help us understand why social activity is unpredictable”, but remember that “evaluation practice […] is also a social activity”, so “it can no longer be grounded in the certainties of the rational, designing evaluator” (p. 163)

a brief summary of how complexity theory evolved over time:
- Step 0: equilibrium model in classical physics & economics that assumes
  (a) system with a boundary, made of interacting entities
  (b) entities are homogeneous
  (c) interactions occur at an average rate
  (d) system moves towards equilibrium
- Step 1:
  - removes assumption (d) (i.e., not assumed to be moving towards equilibrium)
  - replaces linear equations with non-linear
  - output of one equation feeds into next iteration of the equation
  - basis for modeling chaos
- Step 2:
  - removes assumptions (c) (i.e., interactions not assumed to occur at an average rate) and (d) (i.e., not assumed to be moving towards equilibrium)
  - used to explain dissipative structures, things jumping to different states, ability of things to self-organize
- Step 3:
  - removes assumptions (b) (i.e., entities are not homogeneous), (c) (i.e., interactions not assumed to occur at an average rate) and (d) (i.e., not assumed to be moving towards equilibrium)
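
To make Step 1 a bit more concrete for myself, here is a minimal Python sketch of a non-linear equation whose output is fed back in as the next input. The logistic map is my own choice of illustration (it is not an example taken from Mowles), and the function name, the values of `r`, and the starting points are all assumptions picked for the demo. At `r = 2.5` the iteration settles towards a single value (equilibrium-like behaviour); at `r = 4.0` two almost identical starting points drift far apart, which is the sensitivity usually associated with chaos.

```python
def logistic_trajectory(x0, r=4.0, steps=60):
    """Iterate x_next = r * x * (1 - x), feeding each output back in as the next input.

    Illustrative only: the logistic map is a standard toy model, not Mowles' example.
    """
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


# With r = 2.5 the iteration settles towards a fixed point (1 - 1/r = 0.6).
settled = logistic_trajectory(0.2, r=2.5, steps=100)
print(f"r=2.5, final value: {settled[-1]:.4f}")

# With r = 4.0 two nearly identical starting points end up far apart.
a = logistic_trajectory(0.2, r=4.0)
b = logistic_trajectory(0.2 + 1e-9, r=4.0)
gaps = [abs(p - q) for p, q in zip(a, b)]
print(f"r=4.0, gap at start: {gaps[0]:.1e}, largest gap later: {max(gaps):.2f}")
```

The point of the sketch is only that the same simple rule can produce either stable or unpredictable behaviour; it says nothing about social programs by itself.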
complex adaptive systems (CAS) – “agent-based models run on computer” are “temporal models that change qualitatively over time and attempt to explain how order emerges from apparent disorder, without any overall blue-print or plan” (p. 165)
- attempts “to describe how global patterns arise from local agent behaviour” (p. 166)
- can operate at Step 2 or 3
in real life, people are not homogeneous and interactions are not average and not linear, so it is at Step 3 where we see “truly evolutionary and novel behaviour emerge” (p. 166)
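
As a rough illustration of what an agent-based model looks like, here is a toy sketch in Python. Everything in it is an assumption of mine rather than something from the paper: agents sit on a ring, each holds a state of 0 or 1, and each one adopts the majority state within its own neighbourhood. The `heterogeneous` flag gives agents different neighbourhood sizes, as a loose stand-in for dropping the assumption that entities are homogeneous. There is no overall plan in the code; any pattern in the output comes only from the local rule.

```python
import random


def step(states, radii):
    """One synchronous update: each agent adopts the majority state among
    itself and `radius` neighbours on each side of it on the ring."""
    n = len(states)
    new_states = []
    for i, radius in enumerate(radii):
        window = [states[(i + offset) % n] for offset in range(-radius, radius + 1)]
        # The window always has an odd number of entries, so there are no ties.
        new_states.append(1 if sum(window) * 2 > len(window) else 0)
    return new_states


def count_boundaries(states):
    """Count adjacent pairs on the ring that disagree (a crude measure of disorder)."""
    n = len(states)
    return sum(states[i] != states[(i + 1) % n] for i in range(n))


def simulate(n_agents=60, n_steps=20, heterogeneous=True, seed=0):
    """Return the boundary count at each step, starting from random states."""
    rng = random.Random(seed)
    states = [rng.randint(0, 1) for _ in range(n_agents)]
    # Hypothetical heterogeneity: agents differ in how far they "look".
    radii = [rng.choice([1, 2, 3]) if heterogeneous else 1 for _ in range(n_agents)]
    history = [count_boundaries(states)]
    for _ in range(n_steps):
        states = step(states, radii)
        history.append(count_boundaries(states))
    return history


print("homogeneous agents:  ", simulate(heterogeneous=False))
print("heterogeneous agents:", simulate(heterogeneous=True))
```

Watching the boundary counts change over the steps gives a feel for global patterns forming from local interactions, but it is a statement about this toy model, not about any real program.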
“models are helpful in supporting us to think about real world problems”, [but remember that…] “mathematical models uncover fundamental truths about mathematical objects and not much about the real world” (p. 166)

Mowles identifies three evaluation scholars (Callaghan, Sanderson, Westhorp) who suggest that evaluators should “draw on insights from the complexity sciences more generally to inform evaluation practice, rather than understanding the insights to refer only to special cases” (p. 167)
he identifies that the use of experimental methods to evaluate represents the “highest degree of abstraction” (p. 167) from the program being evaluated and notes that “Theories of Change” are a “hybrid of systems thinking and emancipatory social theory” (p. 168) as they “draw on propositional logic and represent social change in the form of entity-based logic models showing the linear development of social interventions towards their conclusions” (p. 167), but also “often point to the importance of participation and involvement of the target population of programmes to inspire motivation” (pp. 167-8)
“realist evaluators” talk of “‘generative’ theories of causality, i.e., ones that open up the ‘black box’ of what people actually do to make social programmes ‘work’ or not” (p. 168) – they argue that “interventions do or do not achieve what they set out to because of a combination of context, mechanism and outcomes (CMO). [RE] is concerned with finding what works for whom and in what circumstances and then extrapolating a detailed and evolving explanation to other contexts” (p. 168)
- Callaghan “adds […] on the idea of a mechanism, that what people are doing locally in their specific contexts to make social projects work is to negotiate order” (p. 168)
- Westhorp “recommends trying to identify the local ‘rules’ according to which people are operating as a way of offering richer evaluative explanations of what is going on” (p. 168)
- but Mowles suggests that this does not go far enough and that rather than opening the black box, realist evaluators “use a mystery to explain a mystery” (p. 168) and don’t seem able “to let go of the idea of a system with a boundary, outside which the evaluator stands, comprising abstract, interacting parts” (p. 169)
- he also suggests that the “persistence of systematic abstractions and predictive rationality may be that they protect the discipline of evaluation by separating the evaluator from the object to be evaluated” (p. 169) – not that evaluators are totally “unaware of the way that they influence social interventions” (p. 169), but that they “only go so far in developing how much these non-linear sciences apply to them and what they are doing in the practice of evaluation” (p. 170)

Mowles’ suggested alternative is “a radical interpretation of the complexity sciences, which understands human interaction as always complex and emergent” (p. 160)
he references Stacey (remember, the one who has moved on from the contingency approach) and colleagues and notes that “they argue that in moving from computer modelling [e.g., CAS] to theories of the social, but by preserving some of the insights by analogy, it might be helpful to think of social interaction as tending neither towards equilibrium nor as linear, nor as forming any kind of a whole. Social life always takes place locally between diverse individuals who have their own history and multiple understandings of what is happening as they engage and take up broader social themes” (p. 170)
he goes on to say that rather than thinking in terms of a system with a boundary, we think of “global patterns of human relating aris[ing] from many, many local interactions, paradoxically informing and informed by […] the habitus. The habitus is habitual and repetitive, but because it is dynamically and paradoxically emerging it also plays out in surprising, novel and sometimes unwanted ways because of the interweaving of intentions.”
Mowles discusses the following implications for evaluation of his “radical” interpretation of complexity:
- evaluators cannot really just “decide” which social interventions (or parts thereof) are complex and which ones are not
- “calls into question the idea that emergence is a special category of social activity” (p. 171) – “social life is always emerging in one pattern or another, whether an intervention is tightly or loosely planned, and that people are always acting reasonably […] rather than rationally” (p. 171)
- “evaluation is a situated, contextual practice undertaken by particular people with specific life-histories interacting with specific others, who are equally socially formed. The evaluative relationship is an expression of power relations, both between the commissioner of the social intervention/evaluation and the evaluator, and between these and the people comprising the intervention, which will inform how the evaluation emerges” (p. 171)
- simplifying programs for the purposes of evaluation “cover[s] over the very improvisational and adaptive/responsive activity that makes social projects work, and even improve them, and which should be of interest both to commissioners and evaluators” (p. 171)
- “an evaluator convinced about complexity might […] take an interest in how their own practice forms, and is formed by the relationships they are caught up in with the people they are evaluating” (p. 171) – they’d be interested in:
  - “how people in the intervention negotiate order” (p. 171)
  - “how the evaluation itself is negotiated” (p. 171)
  - “how power relations play out in, and affect, the social intervention, including the framing of both the social development project as a logical project and the evaluation as a rational activity” (p. 171)
  - “pay[ing] close attention to the quality of conversational life of social interventions, including how participants took up and understood any quantitative indicators that they might be using in the unfolding project” (p. 171)
  - “there will always be unintended and unwanted outcomes of social activity, which may be just as important as what is intended” (p. 171)
  - “how the programme changed over time, and how people accounted for these changes: ‘progress’ in terms of the social intervention, could also be understood in the movement of people’s thinking and their sense of identity” (p. 171)
  - “evaluators should assume a greater humility in their work and their claims about predictability, causality and replicability” (p. 171)

References

Martin, C.M., & Sturmberg, J. P. (2009). Perturbing ongoing conversations about systems and complexity in health services and systems. Journal of Evaluation in Clinical Practice. 15: 549-552.

Mowles, C. (2014). Complex, but not quite complex enough: The turn to the complexity sciences in evaluation scholarship. Evaluation. 20(2): 160-175.

Stame, N. (2004). Theory-based Evaluation and Types of Complexity. Evaluation. 10(1): 58-76.

Walton, M. (2014). Applying complexity theory: A review to inform evaluation design. Evaluation and Program Planning. 45: 119-126.


Complexity and Evaluation

Posted on March 22, 2016 by Beth

Notes from some readings on complexity and evaluation.

A Review of Three Recent Books on Complexity and Evaluation

Gerrits and Verweij (2015) reviewed three books that explored complexity and evaluation:

Forss et al’s Evaluating the Complex: Attribution, Contribution, and Beyond (2011)
Patton’s Developmental Evaluation: Applying Complexity Concepts to Enhance Innovation and Use (2011)
Wolf-Branigin’s Using Complexity Theory for Research and Program Evaluation (2013)

They note that all three of these books raise a similar concern (“that the complexity of social reality is often ignored, leading to misguided evaluation and policy recommendations, and that the current methodological toolbox is not particularly well-suited to deal with complexity” (p. 485)), but that they deal with this concern in different ways.

How they define complexity:

- Forss et al
  - “there is a difference between complexity as an experience and complexity as a precise quality of social processes and structures” (p. 485)
  - give multiple definitions
  - mention “a system state somewhere between order and chaos” and “a focus on the non-linear and situated nature of complex systems” (p. 485)
- Patton
  - “describes rather than defines complexity” (p. 485)
  - core principles of:
    - non-linearity
    - emergence
    - adaptive behavior
    - uncertainty
    - dynamics
    - co-evolution
  - “bolts on Holling’s adaptive cycle and panarchy” (p. 485)
- Wolf-Branigin
  - “settles on Mitchell’s (2009) definition which focuses on the self-organizing aspect of complex systems, out of which collective behavior emerges” (p. 485)
  - “emergent behavior […] is a process that is embedded in complex systems” (p. 485)
  - complex systems –> complex adaptive systems “when the constituent elements show mutual adaptation” (p. 485)

They note that Wolf-Branigin offers a “complexity-friendly set of evaluation methods” and that Forss et al, being an edited volume of chapters by different authors, deals with complexity in a bunch of different ways (and possibly conflates complexity and complicatedness, which suggests the authors perhaps did not have a clear understanding of complexity).

In contrast to a focus on methods, they noted that Patton views complexity as a “heuristic and sense-making device” (p. 487) and thus Developmental Evaluation is “an approach that […] favors:

- dynamics over stability
- uncertainty over certainty
- equifinality and multi-finality over general laws, etc.” (p. 487)

Developmental Evaluation “is a dynamic kind of evaluation that does not only seek to identify causal relationships and to serve accountability, but that also offers an approach that interacts with the programs it evaluates, preferably feeding results back into the program on the fly, so as to develop it” (Gerrits & Verweij, 2015, p. 486)

A few other points of interest:

“Whereas complicated interventions can be evaluated by asking “what works for whom in what contexts” […] in complex programs, ‘it is not possible to report on these in terms of “what works”… because what “it” is constantly changes’” (Gerrits & Verweij, 2015, p. 486)
When the “object of evaluation is complex (i.e., changes over time, etc.), it challenges the evaluation methods that do not account for that complexity” (Gerrits & Verweij, 2015, p. 488)
“Complexity features a language that is relatively foreign to evaluators and that is difficult to operationalize” (Gerrits & Verweij, 2015, p. 488)

A Paper on “Evaluating Complex and Unfolding Interventions in Real Time”

“simple interventions rely upon a single (a coherent set of) known mechanisms with a single (a coherent set of) output whose benefits are understood to lead to measurable and widely anticipated outcomes” – e.g., a drug to treat a disease
“complicated interventions involve a number of interrelated parts, all of which are required to function in a predictable way if the whole intervention is to succeed. The processes are broadly predictable and outputs arrive at outcomes in well-understood ways” – e.g., a rocketship is complicated – lots of interrelated parts, but it functions as expected (e.g., “it does not transform itself over time into a toaster”)
“complex interventions are characterized by:
- feedback loops
- adaptation and learning by both those delivering and those receiving the intervention
- portfolio of activities and desired outcomes which may be re-prioritized or changed
- sensitive to starting conditions
- outcomes tend to change, possibly significantly, over time
- have multiple components which may act independently and interdependently” (Ling, p. 80)
when delivering (or receiving) complex interventions, people:
- “learn and adapt
- reflexively seek to make sense of the systems in which they act and where possible to change how they work
- adapt behaviour based on a changing understanding of the consequences of their actions”
- of course, they (and the evaluators) only have an “incomplete understanding of these systems and their actions based on this limited understanding may be unpredictable” (p. 81)
RCTs can be used for simple and even complicated interventions, but are not appropriate for evaluating complex because they are “inherently unable to deal with complexity” (p. 80)
also, it is important to remember that “interventions interact with complex systems in ways that cannot be predicted. The evaluation challenge lies in understanding this interaction” (emphasis mine, p. 80)

“While we need to challenge the expectation that evaluations of the complex will lead to more precise predictions and greater control, we should not abandon the belief that appropriately structured evaluations can contribute positively to reflexivity while simultaneously fulfilling the evaluators’ mission to strengthen both learning and accountability. To do so we will need to trade our search for universal generalizability in favour of more modest, more contingent, claims. In evaluating complex interventions we should settle for constantly improving understanding and practice by focusing on reducing key uncertainties.” (p. 81)

problem with “more conventional approaches” to program evaluation when used in situations of complexity:
- expect to understand the whole by looking at a combination of its parts
- evaluations “therefore […try to…] build up detailed pieces of evidence into an accurate account of the costs (or efforts) and the consequences, […] add up all the inputs, describe the processes, list the outputs and (possibly) weight outcomes and put this together to form judgements about and draw evaluative conclusions” (p. 81)
- this can work for simple or complicated interventions “where we can make highly plausible assumptions that we know enough about both the intervention and the context” (p. 81)
for complexity, however, this is not the case:
- need to “start with an understanding of the systems within which the parts operate”
- “it is not simply the presence of [factors] (and the more the better), [but] rather it is how these parts are combined and balanced […] and how they are shaped to address local circumstances or resonate with national agendas. In other words, how they form a system of improvement and how this system interacts with other systems in and around healthcare services. From an evaluator’s point of view, ‘What matters is making sense of what is relevant, i.e., how a particular intervention works in the dynamics of particular settings and contexts.’” (emphasis mine, p. 81-2)
“conceptualizing complex interventions is made more difficult still by the fact that we rarely find an intervention that can adequately be described as a single system. More often there are systems nested within systems.” (p. 82)
- e.g., systems “operating individual, organization, and whole-system levels (or micro, meso, and macro)” (p. 82)
- “when we talk about an intervention being context-dependent, or context-rich, we are describing how the processes and outcomes in each case are shaped by the particular ways in which these systems and subsystems uniquely interact” (p. 82)
“most economic evaluations are still primarily quantitative evaluations of “black box” interventions – that is, with little or no explicit interest in how and why they generate different effects or place different demands on the use of resources” (p. 83)
we need to recognize that the context in which an intervention is conducted is important, but “this approach to contextualization could lead to the conclusion that every context is different and unique and so we cannot use the lessons from one evaluation to inform decisions elsewhere […] To address this challenge, we can use complexity thinking to go beyond simply arguing that each context is different by showing how particular systems function and how systems interact. If this were successful it would provide a way of contextualizing and then allowing ‘mid-range generalization’. This could deliver sufficiently thick description of the workings of systems and subsystems to support reflexive learning within the intervention and more informed decision making elsewhere. It establishes mid-ground between the uniqueness of everything and universal generalizability.” (emphasis mine, p. 83-4)
“evaluations should more often be conducted in real time and support reflexive learning and informed adaptation. Rather than seeing an intervention as a fixed sequence of activities, organized in linear form, capable of being duplicated and repeated, we see an intervention as including a process of reflection and adaptation as the characteristics of the complex system become more apparent to practitioners. The evaluation aims in real time to understand these and support more informed adaptation by practitioners. It also provides an account of if and how effectively practitioners have adapted their activities in the light of intended goals. They can be held to account for their intelligent adaptation rather than slavishly adhering to a set of instructions. Furthermore, the evaluation should say something about how the approach might be applied elsewhere.” (emphasis mine, p. 84-5)
Ling cites Stirling’s Uncertainty Matrix as a useful way to think about the “different kinds and causes of uncertainty” (p. 85)

Uncertainty Matrix (adapted from Stirling, 2010)

probabilities – i.e., the chance of something happening
possibilities – i.e., the range of things that can happen
our knowledge of probabilities and possibilities can each be either non-problematic (i.e., we know the chance of something happening and we know the range of things that can happen, respectively) or problematic
risk – when we know the range of possibilities and each of their probabilities – we can engage in risk assessments, expert consensus, optimizing models
uncertainty – limited number of possibilities but we don’t know the probabilities of them occurring – we can use scenarios, sensitivity testing, etc.
ignorance – neither the range of possibilities nor their probabilities are known – we need to monitor, be flexible and adaptive
ambiguity – range of possibilities is problematic, but probabilities not problematic – we can use participatory deliberation, multicriteria mapping, etc.
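
The four quadrants above amount to a two-by-two lookup, which can be sketched in a few lines of Python. This is only an illustration of my notes, not anything from Ling or Stirling: the names `Knowledge`, `UNCERTAINTY_MATRIX` and `classify` are made up for this sketch, and the response lists are just the examples already listed above.

```python
from enum import Enum


class Knowledge(Enum):
    """How well we know something (hypothetical labels for this sketch)."""
    UNPROBLEMATIC = "unproblematic"
    PROBLEMATIC = "problematic"


# Key: (knowledge of possibilities, knowledge of probabilities).
# Value: (quadrant name, example responses taken from the notes above).
UNCERTAINTY_MATRIX = {
    (Knowledge.UNPROBLEMATIC, Knowledge.UNPROBLEMATIC): (
        "risk", ["risk assessments", "expert consensus", "optimizing models"]),
    (Knowledge.UNPROBLEMATIC, Knowledge.PROBLEMATIC): (
        "uncertainty", ["scenarios", "sensitivity testing"]),
    (Knowledge.PROBLEMATIC, Knowledge.UNPROBLEMATIC): (
        "ambiguity", ["participatory deliberation", "multicriteria mapping"]),
    (Knowledge.PROBLEMATIC, Knowledge.PROBLEMATIC): (
        "ignorance", ["monitoring", "flexibility and adaptation"]),
}


def classify(possibilities, probabilities):
    """Return (quadrant, example responses) for the two knowledge states."""
    return UNCERTAINTY_MATRIX[(possibilities, probabilities)]


quadrant, responses = classify(Knowledge.UNPROBLEMATIC, Knowledge.PROBLEMATIC)
print(quadrant, responses)
```

In practice the hard part is deciding which knowledge state applies to a given uncertainty, which is a judgement call for the evaluator and the people delivering the intervention; the lookup only records what follows from that judgement.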

for simple interventions, evaluations aim for certainty
for complicated interventions, evaluations aim to reduce (known) uncertainty
for complex interventions, evaluations aim to support a self-improving system
- first aim to expose uncertainties, then to reduce them
- “need to understand both activities and contexts, important to identify how learning and feedback happens, understand both system dynamics but also what makes change ‘sticky’, real-time evaluation necessary, requires a counterfactual space or matrix” (p. 86)
Ling recommends “an evaluation approach based […on] understanding the unfolding ‘Contribution Stories’ that those involved in delivering and adapting interventions work with to describe their activities and anticipated events” (p. 86-7)
- Contribution Stories “aim to surface and outline how those involved in the intervention understand the causal pathways connecting the intervention to intended outcomes” and “provide an opportunity to explore their thinking about how the different aspects of the intervention interact with each other and with other systems” (p. 87)
- from the Contribution Stories, “more abstract Theories of Change can be developed which trace the causal pathway linking resources use to outcomes achieved. These Theories of Change will be contingent and context-dependent and should be expressed as ‘mid-range theories’; not so specific that they amount to nothing more than a listing of micro-level descriptions of the causal pathway of the specific intervention but also not so abstract that it cannot be tested or informed by the evidence from the evaluation.” (emphasis mine, p. 87)
- next, evaluators: (1) “identify key uncertainties associated with the intervention – those anticipated causal linkages for which there is limited evidence or inherent ambiguities or ignorance.” and (2) “Data collection & analysis would then aim to reduce these uncertainties, hopefully producing evidence that would be both relevant and timely.” (p. 87)
6 stages at which evaluators should “reflect on the consequences of complexity” (p. 87):
1. Understand the intervention’s Theory of Change and its related uncertainties
  - include “importance of learning and adaptive” (p. 87)
  - “identify key dependencies upon systems and subsystems which lie outside the formal structures of the intervention” (p. 87)
2. Collect and analyse data focused on key uncertainties
  - “identify where key uncertainties exist”
  - identify “what sort of uncertainty it is” (ignorance, risk, ambiguity, uncertainty)
  - “data collection alone may not address all of the key uncertainties” (p. 87)
3. Identify how reflexive learning takes place through the project and plan data collection and analysis to support this, strengthening the formative role of evaluation
  - there is a “creation of evidence by the project itself as it learns and adapts”
  - the “evaluation can support this learning as part of a formative role at the same time as building a data base for its own summative evaluation” (p. 87) (with a shift in the balance towards a more formative role) [this sounds like what I’ll be doing with my project]
4. Building a portfolio of activities and costs
  - “identifying boundaries around the cost base is made difficult when the success of a project may depend more on harnessing synergies from outside the intervention itself.” (p. 88)
  - “a major cost in conditions of complexity is equipping projects to be adaptable and responsive to a changing environment. Essentially, part of what is being ‘bought’ is flexibility and, by definition, this means that some resources might not need to be used. It could be regarded as the cost of uncertainty” (p. 88)
5. Understanding what would have happened in the absence of the intervention
  - it is “often much harder to identify the counterfactual” for a complex intervention than for simple/complicated ones, but it is still “crucial to pose the core question in an evaluation which is ‘did it make a difference?’” which of course requires us to ask “compared to what?”
  - rather than the counterfactual being a single thing, think of it more as “a counterfactual space of more or less likely alternative states. This might be produced by scenarios, modelling, simulation, or even expert judgement depending upon the nature of the uncertainty” (p. 88)
6. “The evaluation judgment should not aim to identify attribution (what proportion of the outcome was produced by the intervention?) but rather to clarify contribution (how reasonable is it to believe that the intervention contributes to the intended goals effectively and might there be better ways of doing this?)” (p. 88)
the above is a general outline – still needs to be fleshed out
important to remember that “interventions change as they unfold” and “this adaptation is both necessary and unpredictable” (p. 89)

A Few Points from the Stirling Paper

I looked up the Stirling paper that Ling had cited to read more about the uncertainty matrix. This paper made the point that “when knowledge is uncertain, experts should avoid pressures to simplify their advice. Render decision-makers accountable for decisions.” (p. 1029).
Also: “An overly narrow focus on risk is an inadequate response to incomplete knowledge.” (p. 1029)

A Paper on “Using Programme Theory to Evaluate Complicated and Complex Aspects of Interventions”

It’s not about “creating messier logic models with everything connected to everything. Indeed, the art of dealing with the complicated and complex real world lies in knowing when to simplify and when, and how, to complicate” (p. 30)
various names for “program theory”:
- programme logic
- theory-based evaluation
- theory of change
- theory-driven evaluation
- theory-of-action
- intervention logic
- impact pathway analysis
- programme theory-driven evaluation science
  they all refer to “a variety of ways of developing a causal model linking programme inputs and activities to a chain of intended or observed outcomes, and then using this model to guide the evaluation” (p. 30)
Glouberman and Zimmerman’s (2002) analogy re: complexity:
- simple = following a recipe (very predictable)
- complicated = sending a rocket ship to the moon (need a lot of expertise, but there is high certainty about the outcome; doing it once increases your likelihood of doing it again with the same result)
- complex = raising a child (every child is unique and needs to be understood as such; what works well with one child will not necessarily work well with another; uncertainty about outcome)
Rogers suggests using this distinction to think about different aspects of an intervention (as some aspects of an intervention could be simple, while others are complicated or complex)
simple linear logic models:
- (inputs –> activities –> outputs –> outcomes –> impact):
- lack information about other things that can affect program outcomes, such as “implementation context, concurrent programmes and the characteristics of clients” (p.34)
- “risk overstating the causal contribution of the intervention” (p. 34)
- best to reserve simple logic models for “aspects of interventions that are in fact tightly controlled, well-understood and homogeneous or for situations where only an overall orientation about the causal intent of the intervention is required, and they are clearly understood to be heuristic simplifications and not accurate models” (p. 35)
complicated logic models:
- multi-site, multi-governance – can be challenging to get multiple groups to agree on evaluation questions/plans, but if there is a clear understanding of the “causal pathway” (e.g., a parasite causes a known problem, program is working to reduce the spread of that parasite), you can use a single logic model, report data separately for each site and in aggregate for the whole
- simultaneous causal strands – all of which are required for the program to work (“not optional alternatives but each essential”, p. 37); must show them in the logic model (and indicate they are all required) and collect data on them
- alternative causal strands – where the “programme can work through one or the other of the causal pathways” (p. 37); often, different “causal strands are effective in particular contexts”; difficult to denote visually on a logic model
  - can conduct “evaluation that involve ‘comparative analysis over time of carefully selected instances of similar policy initiatives implemented in different contextual circumstances’” (Sanderson, 2000 cited in Rogers, 2008, p. 37)
  - it’s important to document the alternative causal strands in an “evaluation to guide appropriate replication into other locations and times” (p. 38)
complex logic models:
- two aspects of complexity that Rogers talks about as having been addressed in published evaluations:
  - recursive causality & tipping points – rather than program logic being a simple “linear progression from initial outcomes to subsequent outcomes” (p. 38), the links are “likely to be recursive rather than unidirectional” and have “feedback mechanisms [and] interactive configurations” – it’s “mutual, multidirectional, and multilateral” (Patton, 1997 cited in Rogers, 2008, p. 38)
  - “many interventions depend on activating a ‘virtuous circle’ where an initial success creates the conditions for further success,” so, “evaluation needs to get early evidence of these small changes, and track changes throughout implementation” (p. 38)
  - ‘tipping points’ – “where a small additional effort can have a disproportionately large effect, can be created through virtuous circles, or a result of achieving certain critical levels” (p. 38)
  - can be hard to show virtuous circles/tipping points on logic model diagrams, so may require notes on diagrams [I wonder if we can do anything with technology to better illustrate this?]
  - emergence of outcomes
    - what outcomes there will be, and how they will be achieved, “emerge during implementation of an intervention”
    - this may be appropriate:
      - “when dealing with a ‘wicked problem’
      - where partnerships and network governance are involved, so activities and specific objectives emerge through negotiation and through developing and using opportunities
      - where the focus is on building community capacity, leadership, etc., which can then be used for various specific purposes” (p. 39)
    - could develop a “series of logic models […] alongside development of the intervention, reflecting changes in understanding. Data collection, then, must be similarly flexible.” (p. 39)
      - may have a clear idea of the overall goals, but “specific activities and causal pathways are expected to evolve during implementation, to take advantage of emerging opportunities and to learn from difficulties” (p. 40) – so could develop an initial model that is “both used to guide planning and implementation, but [is] also revised as plans change” (p. 40) [this is what we are doing on my current project]
interventions that have both complicated and complex aspects
- e.g., multi-level/multi-site (complicated) and emergent outcomes (complex)
- could have a logic model that “provide[s] a common framework that can accommodate local adaptation and change” (p. 40)
- “a different approach is not to present a causal model at all, but to articulate the common principles or rules that will be used to guide emergent and responsive strategy and action” (p. 42-3)
how to use program theory/logic models for complicated & complex program models
- with simple logic models, we use program theory/logic models to create performance measures that we use to monitor program implementation and make improvements
- with complicated & complex models, we cannot do this so formulaically
- one of the important uses of program theory/logic models in these situations is in having “discussions based around the logic models” (p. 44)
- evaluation methods tend to be more “qualitative, communicative, iterative, and participative” (p. 44)
- the use of “emergent evaluation” – engaging stakeholders in “highly participative” processes that “recognize difference instead of seeking consensus that might reflect power differences rather than agreement” (p. 44) – and then these “multi-stakeholder dialogues [are] used simultaneously in the roles of data collection, hypothesis testing and intervention, rather than evaluators going away with the model and returning at the end with results” (p. 44) – and stakeholders can then “start to use the emerging program theories […] to guide planning, management and evaluation of their specific activities.” (p. 44)
- having “participatory monitoring and evaluation to build better understanding and better implementation of the intervention” (p. 45)
- citing Douthwaite et al, 2003: “Self-evaluation, and the learning it engenders, is necessary for successful project management in complex environments” (p. 45)
final thoughts:
- “The anxiety provoked by uncertainty and ambiguity can lead managers and evaluators to seek the reassurance of a simple logic model, even when this is not appropriate[, but…] a better way to contain this anxiety might be to identify instead the particular elements of complication or complexity that need to be addressed, and to address them in ways that are useful” (p. 45)

I have a lot more articles to read on this topic, but this blog posting is getting very long, so I’m going to publish this now and start a new posting for more notes from other papers.

References

Gerrits, L. & Verweij, S. (2015). Taking stock of complexity in evaluation: A discussion of three recent publications. Evaluation. 21(4): 481-91.

Ling, T. (2012). Evaluating complex and unfolding interventions in real time. Evaluation. 18(1): 79-91.

Rogers, P. (2008). Using programme theory to evaluate complicated and complex aspects of interventions. Evaluation. 14(1): 29-48.

Stirling, A. (2010). Keep it complex. Nature. 468: 1029-1031.

Posted in evaluation, notes | Tagged complexity, developmental evaluation, evaluation, notes

Complex Adaptive Systems

Posted on March 15, 2016 by Beth

We often hear that “The health care field is complex, perhaps the most complex of any area of the economy” (Morrison, 200, cited in Begun et al, 2003). And yet despite the number of times I’ve heard this, I often see it treated as if it were not complex and/or am not convinced that we are talking about the same thing when we use the word “complexity”. As a refresher on what “complexity” means, I did a bit of reading on complexity science. Here are my notes:

Often the metaphor of a machine is used when people think about organizations – where workers are like cogs in the machine, everything operates based on simple linear cause and effect, and management’s job is to control (e.g., make sure the cogs are in place and do their job as a cog in the machine); or where the organization receives inputs, transforms them, and produces outputs (in the case of healthcare, the output might be “improved health”)

healthcare organizations (and, truthfully, probably any organization) are much more complex than this machine metaphor suggests
but it’s important to remember that the models people use will “shape the way [they] believe the system works, and hence, constrain the possible ways people think” (Begun et al, 2003, p. 253)
complex adaptive system (CAS) = “a collection of individual agents who have the freedom to act in ways that are not always totally predictable, and whose actions are interconnected such that one agent’s actions change the context for other agents” (Plsek, 2003, cited in Sibthorpe et al, 2004, p. 2)
- complex: “diversity – a wide variety of elements”
- adaptive: “capacity to alter or change […] to learn from experience”
- system: “a set of connected or interdependent things” (Begun et al, 2003, p. 255)
“CAS defined in terms of:
- component parts
- behaviour of those parts
- relationships between the parts
- behaviours (or properties) of the whole” (Sibthorpe et al, 2004, p. 2)
agents are “information processors” – “they can process information and adjust their behaviour accordingly” (Sibthorpe et al, 2004, p. 2). All agents have some “information about the system but none understands it in its entirety” (Sibthorpe et al, 2004, p. 2)
- in a health care system, agents can include:
  - people (e.g., clinicians, patients, administrators)
  - processes (e.g., nursing processes, medical processes)
  - functional units (e.g., nursing, communications, accounting)
  - small organizations (e.g., a medical practice)
  - large organizations (e.g., hospitals, insurance companies)
agents are connected (and share information) through a web of relationships
- can be described as “massively entangled” as the “parts of the system and the variables describing those parts are large in number and interrelated in complex ways” (Sibthorpe et al, 2004, p. 2)
- agents “both alter other agents and are altered by other agents, in their interactions” (Begun et al, 2003, p. 256)
- “the diversity, extent, intricacy, and strength of the relationships influence the system’s ability to adapt” (Sibthorpe et al, 2004, p. 2)
agents respond to their environment using “simple rules” – which “need not be shared, explicit, or even logical”, but they “contribute to patterns and bring coherence to behaviours in complex systems” (Sibthorpe et al, 2004, p. 2)
CAS-defining properties:
- dynamic: “the continual presence of multiple interactions and their accompanying surprises, challenges and responses both within the system and between the system and its environment” (Miller et al; cited in Sibthorpe et al, 2004, p. 3)
  - change is “discontinuous” – “periods of stability and periods of change”, with change occurring at different times/paces
- self-organizing and emergent: “new structures and forms of behaviour emerge that cannot be obtained by summing the behaviours of the constituent parts, because new system properties emerge from the nonlinear interactions between agents” (Sibthorpe et al, 2004, p. 3) – you cannot control or predict what will happen
  - “the behaviour of the resulting whole is more than the sum of individual behaviours” (Begun et al, 2003, p. 256)
  - “one agent’s actions change the environment for other agents […and…] surprising and innovative ideas can emerge from unpredictable corners of a complex system.” (Sibthorpe et al, 2004, p. 3)
- CAS can be sensitive to initial conditions – “an apparently trivial difference in the beginning state of the system may result in enormously different outcomes” (the “butterfly effect”)
- CASs are “embedded within and bounded by other CAS with which they co-evolve” (Sibthorpe et al, 2004, p. 3) – they change and they cause the world around them to change too
- CAS operate at multiple levels/scales
- they have “fuzzy” boundaries
- the above properties are “dependent on feedback loops – the movement of information between agents and between systems” (Sibthorpe et al, 2004, p. 3) – these loops can generate change or stability as they “fuel the interdependence of the system by keeping the parts synchronised, and simultaneously support evolution of the system by providing impetus and resources for adaptation” (Sibthorpe et al, 2004, p. 3)
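
To make the “sensitive to initial conditions” property a bit more concrete for myself, here is a minimal Python sketch using the logistic map, a standard textbook toy model of nonlinear feedback. It does not come from Begun et al or Sibthorpe et al and it is in no way a model of a health care system; the function name, the parameter `r = 3.9`, and the starting values are all assumptions chosen just for illustration.

```python
def logistic_trajectory(x0, r=3.9, steps=100):
    """Iterate x -> r * x * (1 - x), a simple nonlinear feedback rule.

    The output of each step is fed back in as the input to the next one.
    """
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


# Two starting states that differ by an "apparently trivial" amount.
a = logistic_trajectory(0.2)
b = logistic_trajectory(0.2 + 1e-7)

# Track how far apart the two runs are at each step.
gaps = [abs(p - q) for p, q in zip(a, b)]
print(f"gap at step 0: {gaps[0]:.1e}")
print(f"largest gap over {len(gaps) - 1} steps: {max(gaps):.3f}")
```

The two runs start almost identically and then drift apart until they bear no resemblance to each other, even though the rule itself is fixed and simple. A real CAS adds agents that learn and change their own rules, so this sketch illustrates only one property, not the whole concept.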

Complexity Science	Established Science
Holism	Reductionism
Indeterminism	Determinism
Relationships among entities	Discrete entities
Nonlinear relationships – critical mass thresholds	Linear relationships – marginal increases
Quantum physics – influence through iterative nonlinear feedback – expect novel and probabilistic world	Newtonian physics – influence as direct result of force from one object to another – expect predictable world
Understanding; sensitivity analysis	Prediction
Focus on variation	Focus on averages
Local control	Global control
Behavior emerges from bottom up	Behavior specified from top down
Metaphor of morphogenesis	Metaphor of assembly

Source: Begun et al, 2003, p. 260 – adapted from Dent, 1999, Table 1.

“Traditional systems thinking has created a vicious cycle of (1) design a system, and (2) when the system does not act as predicted, redesign the system. The assumption is that leaders can control the evolution of complex systems by intentions and clear thinking. Complexity science leads one to ask different questions. For example, when an intended design does not play out as predicted, how do things continue to function?” (Begun et al, 2003, p. 286-87)

References:

Begun, J. W., Zimmerman, B., & Dooley, K. (2003). Health care organizations as complex adaptive systems. In Advances in Health Care Organization Theory. Eds. S.M. Mick & M. Wyttenbach. San Francisco: Jossey-Bass, pp. 253-288.

Sibthorpe, B, Glasgow, N., & Longstaff, D. (2004). Complex Adaptive Systems: A Different Way of Thinking about Health Care Systems A Brief Synopsis Of Selected Literature For Initial Work Program – Stream 1. Canberra. Australian Primary Healthcare Research Institute, Australian National University. http://betterevaluation.org/sites/default/files/Sibthorpe%20et%20al%202004%20CAS%20-%20%20%20A%20different%20way%20of%20thinking%20about%20health%20care%20systems.pdf


Posted in evaluation, healthcare, notes | Tagged complex adaptive systems, complexity, complexity science, evaluation, notes, systems

Agile

Posted on March 8, 2016 by Beth

On Feb 24 & 25, I received training in Scaled Agile Framework for enterprise (SAFe®) – a framework that combines elements of Agile software development, Lean process improvement, and Extreme Programming (XP). The project I’m working on is looking to incorporate elements of this and part of that involves getting us all trained. Here are my notes of highlights (which I wrote up as a way to prepare for the test which I then took to get “certified” in SAFe®, as well as to process some thoughts on how I can apply what I’ve learned to my work).

Core Values of the framework:
- code quality – built in upfront
- program execution – produce something that provides value
- alignment – helps you make decisions in context, promotes decentralized decision-making
- transparency – everyone at all levels needs to have the same information; transparency is needed to build trust
The framework is modular – you can use just the pieces of it you need, but the elements are synergistic, so the more you use, the more “benefit” you get
They also noted that “projects” create “deliverables” and “outputs”, whereas “programs” provide “outcomes” and “benefits”. This resonated for me as I’ve worked with the “project” to focus on ultimate goals (i.e., clinical transformation) rather than shorter-term outputs (such as “creating an IT system”; we want an IT system because it will help us reach our clinical goals, not just because we want an IT system).
They don’t like to think in terms of “projects”, which are defined by a start and end, as Agile is more about ongoing development (rather than just creating a complete/perfect product that you roll out and then disband the project team). Instead, they think of things as “programs”. This has some implications for the “project” I’m working on, which has been defined as a “project” that will essentially produce a product, roll it out to the organizations, and then transition it over to operations at the orgs. This way of thinking blurs the project into the organizations (rather than keeping it separate from them), which actually mirrors a shift the project has been undergoing anyway (from an IT-focused, vendor-led project to a clinically-focused, IT-enabled, health organization-led, “project” team-supported program)
This whole thing is a paradigm shift from what we are used to – it requires a change in mindset and culture – both of which are difficult things to do.
Team decides “how” it gets done (neither product owner nor leadership is allowed to interfere with “how” it gets done).
Product owner decides “what” gets done (giving the customer what it decides is important)
Agile teams are:
- self-organizing
- empowered (because they get to make decisions and be accountable and regularly produce something of value – this is motivating to people)
- cross-functional
- 4-9 people
- includes:
  - developers/testers – creates/refines user stories & acceptance criteria (that the P.O. will make the decision on whether to accept); builds/tests/delivers stories
  - product owner – represents the interests of the customer; prioritizes the backlog
  - scrum master – mentors the team, removes barriers that are getting in the team’s way, attends scrum of scrums meeting
Scaled Agile involves scaling to the Program and to the Portfolio
Scaled to the Program:
- self-organizing, self-managing team of Agile teams
- a common mission
- a single backlog of items to be done (which they can all draw on)
Program Increment (PI)
- a specific amount of time in which they plan/implement/measure
- provides a cadence which all the teams align to
- not the same as the product release cadence (product releases are “on demand” – release the stuff that’s been built out to the world when it makes sense).
Scaled to the Portfolio:
- a Portfolio vision to provide an aim for the whole system
- Lean approaches to strategy, finance, program management, and governance are all needed to support the Agile program and Agile teams
- strategy is centralized, but execution is decentralized
Backlog:
- a list of things to do
  - if something is on there, it may get done
  - if something is not on there, it definitely will not get started
  - don’t need details (which require a lot of time to define and might change by the time the item gets from the backlog to actually being worked on – so details are defined just-in-time)
  - product owner should keep it prioritized (look at it every day) – e.g., if a team finds themselves with extra capacity, they may take an item from the top of the backlog to do
  - includes:
    - user stories – a thin slice of functionality – must be something of value
    - refactors – an improvement to the code base (e.g., improving performance, maintainability, scalability) that does not change any observable system behaviour; making your code “beautiful”
    - spikes – research activities (e.g., to understand a functional need, to reduce a risk, increase estimate reliability, define a technical approach; to build a prototype in order to learn stuff)
      - technical spike: researching a technical approach or unknown
      - functional spike: researching how a user might use/interact with the system
- Portfolio Backlog contains Epics
- Program Backlog contains Features
- Team Backlog contains Stories

Lean is a process improvement method that focuses on eliminating waste and doing things that provide value
The House of Lean:

- roof: value
- pillars: respect for people and culture, flow, innovation, relentless improvement
- foundation: leadership
focused on providing value
people do the work – respect and trust them
the customer = whoever “consumes” your work – respect them too
to change culture, you need to change the organization
Remember the Peter Drucker quotation: “Culture eats strategy for breakfast”
want to optimize flow, avoid start-stop-start delays, make informed decisions by using fast feedback
- reducing batch size:
  - helps reduce variability, which improves flow
  - results in a quicker cycle time, so you get feedback faster
  - limiting Work in Progress (WIP) also helps improve flow
    - put limits on amount of WIP that is allowed
    - having a short cycle helps reduce WIP (focus on getting small things done and thus out the “WIP” stage)
    - when WIP is too high, purge the lower value items
    - make the WIP visible – when you can see it, it is easier to see when it is building up
need time and space to innovate; try getting out of the office (go to the Gemba – Gemba = “the real place; the place where the work is actually done”)
focus on relentless improvement – applying Lean tools to identify and address root causes of issues; reflect at key milestones
leadership supports the team by leading the change, developing people
“People are already doing their best; the problems are with the system. Only management can change the system” – W. Edwards Deming

Kanban = visual board
have it up in the team’s work area – look at it every day – serves as an early warning indicator
- the earlier you identify an issue, the more options you have to fix it

Decisions: to centralize or decentralize?
- centralize: infrequent, long-lasting, significant economies of scale
- decentralize: frequent, time critical, don’t have significant economies of scale
- local decision making has the benefit of better local information
- framework for making the decision about where decisions should be made (where 0-3 = centralize, 4-6 = decentralize):

Decision	Frequent? Y=2 N=0	Time-critical Y=2 N=0	Economies of scale Y=0 N=2	Total	Decision should be:
Should we… ?	0	2	0	2	centralized
Should we… ?	2	2	0	4	decentralized
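
The scoring framework above can be sketched in a few lines of Python. This is only an illustration of the table's arithmetic; the function names `decision_score` and `where_to_decide` are made up for this example and are not part of any Scaled Agile tooling.

```python
def decision_score(frequent: bool, time_critical: bool, economies_of_scale: bool) -> int:
    """Score a decision using the weights from the table above."""
    score = 0
    score += 2 if frequent else 0            # Frequent? Y=2, N=0
    score += 2 if time_critical else 0       # Time-critical? Y=2, N=0
    score += 0 if economies_of_scale else 2  # Economies of scale? Y=0, N=2
    return score


def where_to_decide(frequent: bool, time_critical: bool, economies_of_scale: bool) -> str:
    """Totals of 0-3 suggest centralizing; totals of 4-6 suggest decentralizing."""
    total = decision_score(frequent, time_critical, economies_of_scale)
    return "decentralized" if total >= 4 else "centralized"


# The two example rows from the table:
print(where_to_decide(frequent=False, time_critical=True, economies_of_scale=True))
print(where_to_decide(frequent=True, time_critical=True, economies_of_scale=True))
```

The first call corresponds to the row that totals 2 and the second to the row that totals 4.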

Traditional Waterfall vs. Agile:

Waterfall: Requirements → [documents] → Design → [documents] → Implementation → [unverified system] → Verification → [system]

Agile: sprint → [working code], repeated for every sprint (the diagram shows eight sprints in a row, each one ending in working code)

sprint: short iterations with a regular cadence (usually 2 weeks long) in which you produce a thing of value
- plan, develop/build/test/integrate, and demo for the product owner
- retrospective at the end to reflect on what you learned, what you can do better next time
Sprint planning: 4 hours
- product owner presents sprint goals (which align team to a common purpose) & stories
- determine team velocity
- team clarifies and estimates stories (elaborates acceptance criteria)
- P.O. and team negotiate and finalize (load = stories you take on for the sprint – must be less than or equal to the velocity)
- everyone commits to the sprint
put all the stories and their acceptance criteria on the kanban

Scrum
- daily, 15-minute stand up meetings
- share info on progress, coordinate activities, raise issues that are blocking you (scrum master’s job is to get rid of those blocks for you)
- ask 3 questions:
  - what stories did you work on yesterday?
  - what stories can you complete today?
  - what’s blocking you?
- don’t get into problem solving – can do that in a “Meet After”
Mid-Sprint Review
- determine if you are on course and if you need to adjust
- if you aren’t on track to meet your sprint goal, you can:
  - negotiate story scope to “good enough”
  - reprioritize stories
  - reassign resources
  - defer or delete stories
Backlog Refinement Session
- done to prepare for next sprint
- can invite subject matter experts (SMEs) or other teams’ members if needed
- process
  - P.O. presents potential stories for next sprint
  - discuss this list (e.g., add or remove stories, split stories, ask questions, discuss acceptance criteria)
  - determine if anything needs to be done on these before the next sprint
Demo
- demonstrate what you created for each story
Sprint Retrospective Meeting
- reflect on results
- learn from reflection
- adapt process to produce better results
- pick one thing you can do differently next time

Since you are delivering something that provides value every two weeks, if the program gets cancelled, you’ve still provided something useful that is out there in the world. When a traditional project gets cancelled, you’ve spent millions with nothing to show for it but a bunch of documents – nothing of value has been produced

Agile Manifesto (quoted verbatim – source):

We are uncovering better ways of developing
software by doing it and helping others do it.
Through this work we have come to value:

Individuals and interactions over processes and tools
Working software over comprehensive documentation
Customer collaboration over contract negotiation
Responding to change over following a plan

That is, while there is value in the items on
the right, we value the items on the left more.

This struck me as having similarities to Developmental Evaluation – with the focus on responding to changes and collaborating with your client. And instead of “working software” over documentation, I’d say “evaluation findings put to use” over documentation of evaluation findings.

timelines and budget are fixed – what can change is the scope
it’s better to have a high quality product of a smaller scope than a poor quality product that covers a bigger scope (especially since you know that you are going to continue to add to the scope over time)

Extreme Programming (XP)
- test-driven development (and automated testing)
- pair programming
- simple design
- coding standards
- continuous integration
- collective ownership
- user stories
- refactoring

System Team
- builds infrastructure
- manages environments
- provides/supports full system integration
- end-to-end system and performance testing
- stages and supports the System Sprint Demo

Estimating backlog items
- don’t use hours – use “story points”, which is a single number that represents:
  - volume – how much?
  - complexity – how hard?
  - knowledge – what do we know?
  - uncertainty – what do we not know?
- story points are relative to other stories (e.g., a 2 pt story takes half as long as a 4 pt story), but not in a specific unit of measure
- then we use this size to estimate a duration
- estimation is a team activity – applies all perspectives, develops shared understanding & commitment (everyone participates in the estimation session, but the product owner does not estimate)
- uses a modification of the Fibonacci sequence: 1, 2, 3, 5, 8, 13, 20, 40, 100, ?
- process:
  - product owner reads the backlog item
  - everyone discusses
  - estimators select a story point value (have cards with all the numbers of the modified Fibonacci sequence)
  - everyone shows their cards
  - discuss highest and lowest values (and why they are at extremes)
  - re-estimate (continue until values converge)
- velocity = amount of capacity the team has during a sprint, expressed in story points
- process for determining team velocity:
  - 8 points per full-time developer or tester (pro-rate for part-timers)
  - -1 point for every holiday or vacation day
- quick way to start estimating – find a story that will take ~1/2 day to code + ~1/2 day to test/validate and call that 1 point – estimate everything else relative to that
- this method is quick, easy, and just as accurate as complicated mathematical estimate models – it also gets the group talking about how to solve the problem
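
As a rough sketch, the velocity rule of thumb above (8 points per full-time developer or tester, pro-rated for part-timers, minus 1 point per holiday or vacation day) and the sprint-planning rule that load must not exceed velocity can be written out as below. The function names and the sample team are hypothetical, chosen only for illustration.

```python
def team_velocity(fte_members: float, days_off: int, points_per_fte: int = 8) -> float:
    """Starting velocity for a sprint, in story points.

    fte_members: developers/testers counted as full-time equivalents
                 (e.g. a half-time tester counts as 0.5).
    days_off:    total holiday and vacation days across the team in the sprint.
    """
    return fte_members * points_per_fte - days_off


def fits_in_sprint(story_points: list[int], velocity: float) -> bool:
    """Load (sum of the stories taken on) must be <= velocity."""
    return sum(story_points) <= velocity


# Hypothetical team: 4 full-timers, 1 half-timer, 3 vacation days in the sprint.
velocity = team_velocity(fte_members=4.5, days_off=3)
print(velocity)                                        # 33.0
print(fits_in_sprint([8, 8, 5, 5, 3, 3], velocity))    # True  (load = 32)
print(fits_in_sprint([8, 8, 8, 5, 5], velocity))       # False (load = 34)
```

If the load does not fit, the team and product owner negotiate which stories to defer.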

User Stories:

As a <role>
I can <activity>
So that <business value>
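
To make the template concrete, here is a minimal sketch that stores the three parts of a story and renders them in the format above. The `UserStory` class and the sample story are hypothetical illustrations, not something from the course materials.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserStory:
    role: str            # who wants it (a person, device, or system)
    activity: str        # what they can do
    business_value: str  # why it matters

    def render(self) -> str:
        """Render the story using the As a / I can / So that template."""
        return (
            f"As a {self.role}\n"
            f"I can {self.activity}\n"
            f"So that {self.business_value}"
        )


# Hypothetical example story
story = UserStory(
    role="household member",
    activity="see a bar chart of my household's data",
    business_value="I can understand my own usage",
)
print(story.render())
```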

a good user story should be small and add value
- INVEST – Independent, Negotiable, Valuable, Estimable, Small, Testable
if the user story is too big:
- if it’s a complex problem, break it down into technical and/or functional spikes (to reduce the complexity)
- if it’s a compound problem, split it into smaller stories
- can split stories by:
  - workflow steps – make each step a story
  - business rule variations – make each variation a story (e.g., “sort by demographics” can be split into “sort by postal code”, “sort by home demographics”, etc.)
  - major effort – start with the one requiring most effort and add more functionality later
  - simple/complex – what’s the simplest version that can work?
  - variations in data – e.g., send messages in English, in French, in Spanish, etc.
  - data methods – e.g., bar chart of your household’s data, then bar chart that compares your data with other households’ data
  - deferring system qualities – start simple and later make it faster, more precise, or more scalable
  - operations – split each operation into its own story – e.g., Create Read Update Delete (CRUD)
  - use case scenarios – if use cases represent complex interaction, split into individual scenarios
roles:
- can be people, devices, or systems
- 1st degree – people that use the product (meet their needs first)
- 2nd degree – people that work with results from those who use the product
- 3rd degree – people that sell, install, support, or make money from the product
sequence stories based on:
- priorities, events (e.g., milestones, release dates), dependencies with other teams, local priorities, capacity allocations for defects/maintenance/refactors
allocate some time for refactors and maintenance, along with user stories
non-functional requirements (NFRs) – system qualities (the “-ities” – reliability, usability, scalability, maintainability, etc.)

Acceptance Criteria
- how you will know you have achieved the intent of the story
- starting point for story acceptance tests
- include both system behaviour and NFRs
- defined during sprint planning (but can also come up during backlog refinement (i.e., before sprint planning) or during design/build/test (i.e., after sprint planning))
Acceptance tests are written to test that the system is working as intended; they will be used repeatedly because you need to make sure that the system keeps working as intended when you change things

Definition of Done (DoD) – owned by the whole team
an example of a DoD:
- acceptance criteria met
- unit test passed, included on kanban
- coding standards followed
- code peer reviewed
- code checked in
- story acceptance test passed
- no must-fix defects
- NFRs met
- story accepted by product owner

Agile Release Train (ART)
- a virtual org of 5-12 teams (50-125 people) (a team of agile teams)
- self-organizing, self-managing
- plans/commits/executes together
- common mission
- common cadence
- synchronized – system is sprinting together
- normalized story point estimating
- single program backlog
- produces valuable Program Increments every 8-12 weeks
- all “cargo” (code, docs, supplemental) goes on the train
- the system always runs
- but since it’s so frequent, if you don’t get on this train, you can just get on the next one

Scrum of Scrums
- meeting of all the scrum masters + the Release Train Engineer
- twice per week
- like a scrum, it’s focused on team progress and program impediments (and can be supplemented with a “Meet After” for problem solving)

Release Planning
- two day session held every Program Increment (8-12 weeks)
- everyone attends
- develop a common set of program objectives for the next PI

Image credit:

Kanban board – posted by Kanban Tool on Flickr with a Creative Commons license


Context in Evaluation

Posted on February 17, 2016 by Beth

I’ve been thinking about how to address the many layers of “context” in the evaluation that I’m currently working on, and I came across a few articles that I found quite useful. Here are my notes from reading those articles.

We often talk about the importance of context in evaluation, “yet the various contextual factors that influence evaluation are rarely considered in much depth in the evaluation literature” (Fitzpatrick, 2012, p. 8).
“There is not a unified understanding of context or a comprehensive theory that guides our work” (Rog, 2012, p. 26) – and context and culture are often used interchangeably

To:	Context Is…
quantitative evaluators	… a “source of influence to be controlled”
realist & theory-oriented evaluators	…a “source of explanation”
qualitative evaluators	…an “inseparable element embedded in program experiences and outcomes” (Rog, 2012, p. 26)

Definitions of context

There isn’t a single, universally accepted definition of context. Some definitions include:

Context: “the circumstances that form the setting for an event, statement, or idea, and in terms of which it can be fully understood or assessed” (Oxford English Language Dictionary (OELD), cited in Fitzpatrick, 2012)
Context (specific to evaluation): “the setting within which the evaluand […] and thus the evaluation are situated. Context is the site, location, environment, or milieu for a given evaluand.” (Greene, cited in Fitzpatrick, 2012) – most evaluands are situated in multiple contexts that have several layers and dimensions (Rog, 2012, p. 27)
- 5 dimensions of context:
  1. “demographic characteristics of the setting and people in it
  2. material and economic features
  3. institutional and organizational climate
  4. interpersonal dimensions or typical means of interaction and norms for relationships in the setting
  5. political dynamics” (Greene, cited in Fitzpatrick, 2012)
- Context (again, specific to evaluation): “the combination of factors (including culture) accompanying the implementation and evaluation of a project that might influence its results, including geographical location, timing, political and social climate, economic conditions, and other things going on at the same time as the project. It includes the totality of the environment in which the project takes place.” (Thomas, cited in Fitzpatrick, 2012. Emphasis mine)
“Out of context” = “without the surrounding words or circumstances and so not fully understandable” (OELD, cited in Fitzpatrick, 2012. Emphasis mine.)
Patton discusses context as a “sensitizing concept” (i.e., something that, rather than needing to be “operationalized”, is used to “provide some initial direction to a study as one inquires into how the concept is given meaning in a particular place or set of circumstances”) (Patton, 2007, p. 102)
- “Systems thinkers posited that system boundaries are inherently arbitrary, so defining what is within the immediate scope of an evaluation versus what is within its surrounding context is inevitably arbitrary, but the distinction is still useful. Indeed, being intentional about deciding what is in the immediate realm of action of an evaluation and what is in the enveloping context can be an illuminating exercise—and stakeholders might well differ in their perspective” (Patton, 2007, p. 102)

Why Consider Context in Evaluation?

to increase evaluation use
to give voice to local issues
to explain program effects – to “identify those contextual elements that prompt a program to succeed or fail” (Fitzpatrick, 2012, p. 13)
context “affects the implementation and outcomes of the interventions that we study” (Rog, 2012, p. 25)
context can help us to choose an appropriate evaluation approach (a “context-first approach”, as opposed to the “methods-first” orientation [i.e., the old “when you have a hammer, every problem looks like a nail”])
“Much like the question we strive to answer in our evaluations, ‘What works best for whom under what conditions?’ context-sensitive evaluation practice asks ‘what evaluation approach provides the highest quality and most actionable evidence in which contexts?’” (Mark, 2001, cited in Rog, 2012, p. 26)
Context can help “provide more direction in replication and generalizability of findings” (Rog, 2012, p. 37)

What Happens When We Don’t Consider Context/Culture?

Evaluators have a “responsibilit[y] to:
- attend to cultures different from their own or different from the dominant cultures
- seek knowledge and understanding of different cultures
- involve stakeholders from participating cultures in the planning, conduct, interpretation, and reporting of the evaluation” (Fitzpatrick, 2012, p. 14)
One’s own “personal contexts and values influence how they see, or fail to see, other cultures” (Fitzpatrick, 2012, p. 14)
If we don’t consider context/culture, we risk:
- “identifying the wrong questions to frame the evaluation
- ignoring key stakeholders who are potentially strong users of evaluation
- misinterpreting stakeholder priorities or even program goals
- collecting data with the use of words or nonverbal cues that have different meanings than to the audience
- failing to describe the program accurately
- failing to understand [the program’s] outcomes because the evaluator is unable to notice nuances or subtleties of the culture
- reporting results in means only accessible by the dominant culture or those in positions of power” (Fitzpatrick, 2012, p. 14)

Ways to Attend to Context/Culture

“careful examination of one’s own values, assumptions, and cultural contexts” (Fitzpatrick, 2012, p. 14)
“inclusion of community members and program participants in evaluation planning and other phases” (Fitzpatrick, 2012, p. 14)
- engaging those who use the program provides “an unparalleled perspective” and can help “guide designs that are more feasible, measurement that is more focused, and interpretations that offer new insights” (Rog, 2012, p. 32)
- can also “foster transparency of methods, reveal flaws and suggest study qualifications, and in turn help to promote study findings as credible and having integrity” (Rog, 2012, p. 32)
“careful observation and respectful interactions and reflection on what has been learned” (Fitzpatrick, 2012, p. 14)
“training the evaluation team to be culturally responsive” (Fitzpatrick, 2012, p. 14)
note that “cultural competence is needed in every evaluation, not just those where cultural norms “hit us in the face” as being different from our own. Every group of participants has, or develops, a culture and that culture influences the program and the evaluation” (Fitzpatrick, 2012, p. 16)
using a “range of methods to accommodate and incorporate context”
- use “strategies to rule in or rule out alternative explanations” (Rog, 2012, p. 33)
- can conduct “systematic plausibility analysis of threats to validity” – i.e. collecting data on the intervention’s theory of change, but also on the “plausibility of rival explanations” (Rog, 2012, p. 33)
conduct an evaluability assessment – in addition to helping to ensure a program is ready for evaluation/deciding what type of evaluation would be appropriate for the program/ensuring the program has an “internal logic that is implemented with integrity” (Rog, 2012, p. 34), an evaluability assessment can help to understand:
- “how a program fits within the broader environment
- [what] features of the environment may moderate the effects of the program
- how the evaluation may need to be structured to be maximally sensitive to this area of context” (Rog, 2012, p. 34)
often using multiple methods (qual & quant) can help us “go beyond determining whether a program works or not to explore why the outcomes occur or fail to occur. This can include elaborating our theories to include potential mediators in a program that link activities with outcomes and testing for them in our analyses.” (Rog, 2012, p. 35)
“much like we have opened the black box of programs and provided useful data on the role that program mechanisms can have in triggering outcomes, I hope we can also navigate and explore the black hole of context and determine what aspects and areas of context have a role in determining the success or failures of interventions” (Rog, 2012, p. 35)
some analytic tools that can be useful:
- social network analysis
- systems thinking approach (e.g., instead of just a logic model of the program – include “other influences that are assumed to be operating” (Rog, 2012, p. 36))
- looking at “distributions of outcomes and […] patterns of change”, rather than just “measures of central tendency […] that may not be sensitive to the differences that often result from complex, dynamic interventions” (Rog, 2012, p. 36)
  - e.g. identifying subgroups – as sometimes programs work better for some subgroups than others (e.g., if elements of context are influencing outcomes)
- multisite studies can help us to “measure the influences of the broader environment on programs” (because we get to look at the outcomes that result from the same program in different contexts or look at how programs are adapted to different contexts and this can help us understand why they work or don’t work)
- Conduct a Context Assessment (see next section)
- Note: need to balance context, stakeholder needs, and rigour when designing and executing an evaluation.

[Figure: balancing context, rigour, and stakeholder needs]

Context Assessment: Areas of Context That Affect Evaluation Practice

Rog (2012) proposes a framework that includes 5 areas where context can affect evaluation practice, each of which has 7 possible dimensions. (There are also subdimensions that may be applicable, including “demographic issues of gender, race, and language, as well as issues of power differences, class, other denominators of equity, and sociopolitical status” (Conner, 2012, p. 90).)

My diagram to illustrate Rog’s (2012) framework. Note that the lines coming from each circle are meant to represent the 7 dimensions that are listed on the right.

Context of the Problem/Phenomenon Being Studied
- what’s already known about the problem that the program addresses?
- what kinds of studies have already been done?
- what tools are available?
- how it affects evaluation: if not a lot is already known about the issue, it’s less likely that you can “have a controlled understanding of the intervention and its effects” (Rog, 2012, p. 29), may need to be more descriptive in your study to understand the problem better
Context of the Intervention
- structure, complexity, dynamics of the intervention
- e.g., stage in the project lifecycle has implications for how you evaluate (e.g., wouldn’t do an outcome evaluation on a program that is currently being developed – you might plan for one long-term, but wouldn’t be expecting to see outcomes immediately)
- “how dynamic and evolving a program is, how complex with respect to its theory of change, and the extent to which it blurs with the setting itself” (Rog, 2012, p. 29), has implications for how you choose to design the evaluation
- interventions that blur with their broader environment = makes it “difficult to make attributions of change to the intervention because of the number of confounding externalities” (Rog, 2012, p. 29), “hard to trace the exact causal mechanism(s)” (Rog, 2012, p. 29)
- how it affects evaluation: for highly dynamic/complex interventions, may “need to have multiple indicators, multiple methods and […] need to examine multiple pathways to see if and how change occurs” (Rog, 2012, p. 29); when it is expected that it will take a long time to see outcomes, may “require interim measures sensitive to showing that the intervention is making changes in the short-run that indicate it is on the right track”(Rog, 2012, p. 29)
Broader Environment/Setting of the Intervention
- this is what evaluators tend to most commonly think of when they think about “context”
- often multilayered (e.g., a school setting, which is in a school district, a broader community, a state)
- programs often blur with their contexts (e.g., a community change initiative aims to change the community in which it sits)
- how it affects evaluation:
  - if an intervention is being rolled out in different communities, can look at how it is adapted in those communities (e.g., is the original theory of change intact? what factors in the context influenced implementation and outcomes?)
  - need to “understand the ways in which the broader environment affects the ability of an intervention to achieve outcomes” (Rog, 2012, p. 30) – this is “critical to understanding the generalizability of the evaluation findings to other contexts or situations” (Rog, 2012, p. 30)
Parameters of the Evaluation
- the method(s) you choose are influenced by the available “budget, time, and data” (Rog, 2012, p. 30)
- how it affects evaluation: evaluations often come with constrained budgets, timelines, and available data, so this will constrain you in your choice of methods (e.g., if there’s no baseline data available, you can’t do a regular pre-post test, so you might need to be creative in coming up with data you can use to see if things have improved; with limited resources, you have to decide which of many possible things you could measure will actually get measured; if the timeline for the evaluation is shorter than when you can reasonably expect outcomes to occur, you may need to design an evaluation that looks at whether you are on track to achieve those outcomes down the road).
Broader Decision-making Context
- need to “understand who the decision makers are, the types of decisions they need to make, the standards of rigor they expect, and the level of confidence that is needed to make the decisions, as well as other structural and cultural factors that influence their behavior” (Rog, 2012, p. 32)
- how it affects evaluation: understanding the decision making context allows the evaluator to design an evaluation that is more likely to get used.

Conner et al (2012) took Rog’s framework and created a process they called “Context Assessment”, which they advocate be used to “plac[e] context among the primary considerations” (p. 89) in an evaluation.

Context Assessment (CA):

“prompt[s] evaluators to consider context more explicitly and carefully” (Conner, 2012, p. 93)
“prompt[s] evaluators to consider which elements of context might be most important for the evaluator to consider at different stages of a particular evaluation” (Conner, 2012, p. 93)
doesn’t require you to “catalogue all elements of context, but instead focus on those [you] identify as most relevant” (Conner, 2012, p. 93)
helps to “shape the focus of their evaluation, their means of data collection, analysis, and interpretations, and their methods of dissemination based on their understanding of the critical elements of the context.” (Conner, 2012, p. 94)
instead of being a “confounding condition”, a contextual issue “becomes a useful key to understanding what makes a program work” (Conner, 2012, p. 103)
since context is changing, CA involves:
- an initial, intensive assessment
- briefer check-ins during the evaluation process
3 steps, based on the “three main evaluation steps: planning, implementation, and use/decision making” (Conner, 2012, p. 93)

Evaluation Planning
- CA conducted during this phase to understand relevant aspects of context to inform the evaluation plan
- template for conducting CA for evaluation planning (adapted from Table 6.1 in Conner et al, 2012):

Area	Guiding Questions	Answers to Guiding Questions	Implications for Evaluation
General phenomenon/problem	What is the problem the program is addressing?
	How did it emerge? How long has it existed?
	What groups prompted concern about it?
	What is already known about it?
	What are the dominant methods used for understanding the phenomenon/problem?
	What tools exist for measuring change?
Intervention	Where is the program in its life cycle?
	How is the program structured?
	What are the different components and how do they fit in the broader environment?
	Who does the program serve?
	What are their characteristics, beliefs, culture, needs, and desired outcomes?
Broader environment around the intervention	What are the different layers of the environment that affect and can be affected by the intervention?
	What aspects of these different climates are affecting the design and operation of the program?
	What are important historical, social, and cultural elements of the community in which the program is conducted?
	Are there political or social views that affect perspectives on the program, its clients, or decision makers?
Parameters of the evaluation	What are the primary and secondary evaluation questions and their implications for possible methodology and design choices?
Parameters of the evaluation	What resources are available to support the evaluation (e.g., budget, time frame, local evaluation capacity, evaluation ethos)?
Decision-making arena	Who are the main decision makers/users of the evaluation information?
	What are their views, values, and history about the program, and about evaluation?
	What is the larger political culture in which they work?
	What are the expectations of their organization?
	What are the expectations of citizens they serve regarding government programs, and about evaluation?
	What are the political expectations for evaluation?

Evaluation Implementation
- build in some periodic re-assessments of the context during the implementation of the evaluation to check if anything has changed that could affect the evaluation
- can be quick, but should be done explicitly
- puts the evaluator in “a better position to adjust measurements to pick up changes and to capitalize on new design opportunities to detect program-related changes better” (Conner et al, 2012)
- template for conducting CA for evaluation implementation (adapted from Table 6.2 in Conner et al, 2012):

Area	Guiding Questions	Answers to Guiding Questions	Implications for Evaluation
General phenomenon/problem	Have new aspects related to the phenomenon/ problem been identified or arisen?
	Have we learned more about the phenomenon or the problem that may influence our approach?
	Has new knowledge been gathered through other research and evaluation that may have a bearing on this evaluation or on stakeholders’ receptivity to findings?
Intervention	Have new intervention components been added/modified/eliminated that affect the intervention?
Intervention	Has the level of intensity of the intervention changed because of funding increases or decreases?
Broader environment around the intervention	Have new relevant events, people, or issues arisen in the general environment in which the intervention is anchored?
Broader environment around the intervention	Do these new factors have implications for the intervention and/or its evaluation?
Parameters of the evaluation	Do the main evaluation components continue to be responsive to the relevant contextual factors?
Parameters of the evaluation	Have the budget, time, and so on changed in any way?
Decision-making arena	Have new organizations or individuals, with different perspectives, entered the decision-making arena, and do these new factors need to be addressed?
Decision-making arena	Have the needs of decision makers changed in any way that might impact the evaluation or receptivity to the findings?

Decision Making
- CA now limited to 2 of the 5 areas (broad environment and decision-making arena; though may also consider phenomenon/problem re: making recommendations)

Area	Guiding Questions	Answers to Guiding Questions	Implications for Evaluation
Broader environment around the intervention	Are the original stakeholders still relevant?
	What new stakeholders need to be added?
	Related to the content of recommendations that might be made, is the infrastructure in place and are resources (staff, materials, support) available to provide the actions and services that will be recommended?
	Might these resources be drawn away from other, unrelated programs, possibly jeopardizing them?
Decision-making arena	How has the arena changed since the outset of evaluation planning?
	Should other important stakeholders be included?
	How are decision makers responding to the evaluation?
	How are they using it?
	What elements receive the most attention from various stakeholders and decision makers?
	How do their values, position, or history affect their use of the information?
	Are there other dissemination or communication strategies that might increase their use?
General phenomenon/problem	Have we learned more about the phenomenon or the problem that may influence our recommendations?

some limitations and challenges to CA:
- CA “cannot be rigidly defined and requires subjective judgements” (but “sharing the results of a context assessment with the primary stakeholders for an intervention can help check out counter inherent evaluator biases”) (Conner, 2012, p. 104)
- “does not guarantee that all relevant factors will be identified” (but certainly more will be identified than if you do not undertake an explicit context assessment) (Conner, 2012, p. 104)
- requires extra time/energy (but the benefits of CA should make it worth that time/energy) (Conner, 2012, p. 104)

Final Thoughts:

“Context will influence the evaluators’ views and the evaluation will influence the context” (Conner, 2012, p. 93)
“Most evaluators have moved from early experimental, “hands off” tradition where they were concerned that their involvement might change the program or threaten the perceived neutrality of the evaluation to one in which evaluators are immersed in the program, [giving] evaluators the potential to consider and learn about context” (Fitzpatrick 2012, p. 13)

References:

Conner, R.F., Fitzpatrick, J.L., & Rog, D.J. (2012). A first step forward: Context assessment. New Directions for Evaluation. 135: 89-105.

Fitzpatrick, J.L. (2012). An introduction to context and its role in evaluation practice. New Directions for Evaluation. 135: 7–24.

Rog, D.J. (2012). When background becomes foreground: Toward context-sensitive evaluation practice. New Directions for Evaluation. 135: 25-40.

Footnotes
↑1	There are also subdimensions that may be applicable, including “demographic issues of gender, race, and language, as well as issues of power differences, class, other denominators of equity, and sociopolitical status” (Conner, 2012, p. 90).

Posted in evaluation, evaluation tools, notes | Tagged context, context assessment, evaluation, evaluation tools, notes

Some evaluation reading

Posted on January 28, 2016 by Beth

A Practical Guide for Engaging Stakeholders in Developing Evaluation Questions

“evaluation is all about asking and answering questions that matter” (p. 3)
“evaluation can be an important strategic tool for measuring:
- the extent to which a program or initiative’s goals are being met
- the ways in which a program or initiative’s goals are being met
- how the program or initiative might be contributing to the organization’s mission”¹ (p. 3)
“develop a set of evaluation questions that reflect the perspectives, experiences, and insights of as many relevant individuals, groups, organizations, and communities as possible” –> relevant and useful evaluation (p. 3)
“evaluations should be conducted in ways that increase the likelihood that the findings will be used for learning, decision-making, and taking action” (p. 6)
“Good evaluation questions:
- establish boundaries and scope of an evaluation […]
- are broad, overarching questions that the evaluation will seek to answer […]
- reflect diverse perspectives and experiences
- are aligned with clearly articulated goals and objectives
- can be answered through data collection and analysis” (p. 8)
benefits of engaging stakeholders in designing evaluation questions:
- “increases quality, scope, and depth of questions” (p. 10)
- “ensures transparency” (p. 10)
- “ensures that the evaluation questions have been thoroughly vetted and thoughtfully crafted and that they are the right question to be asking” (p. 10)
- “raises awareness of the evaluation itself and may contribute to building an audience for the eventual findings” (p. 10)
- “communicates a commitment to being inclusive (vs. exclusive), outward looking (vs. inward looking), and expansive (vs. insular). Stakeholders not only help navigate the political waters more effectively, but also serve to position the evaluation so that findings are perceived to be useful, relevant, and credible and are more likely to be used” (p. 10)
- “builds evaluation capacity” (p. 11)
- “fostering relationships and collaboration” among the stakeholders (p. 11)
5 step process:
1. prepare (understand the program being evaluated)
2. identify potential stakeholders
  - you need people with: expertise, different perspectives/experiences, responsibility for the program, position of influence, interest in the issues, proponents of evaluation
  - people can fulfill more than one of the above roles
  - internal and external
3. prioritize the list of stakeholders (vital/important/nice to have; may help you see if you missed anyone)
4. consider potential stakeholders’ motivation to participate (commitment to goals of program, personal stake, professional development, compensation)
5. select an engagement strategy
  - criteria for selecting: time, budget, geographic location of stakeholders, range of perspectives, extent of existing relationships in the group, availability of stakeholders, number of stakeholders, their familiarity with evaluation, degree of complexity of the evaluand
  - can use more than one strategy (e.g., one strategy with some stakeholders and a different strategy with others; a two-part process (one strategy, followed by another))
  - page 23 of the book shows a table that helps match your answers to the above criteria with the best strategies
  - strategies include: one-on-one meetings, group meetings, logic modelling, mind mapping, Appreciative Inquiry, role playing, brainstorming, nominal group technique, discussion of article/presentation, moderated discussions, surveys, Delphi technique
things to consider when developing evaluation questions:
- “what does success of your program look like?”
- “what would we need to know to explore the extent to which the program is effective or successful?”
- “what questions seem to come up repeatedly, in conversations with others, or in your own work, concerning the effectiveness, impact, and/or success of this program or initiative?” (p. 29)

Source: Preskill, H., & Jones, N. (2009). A Practical Guide for Engaging Stakeholders in Developing Evaluation Questions. Princeton, NJ: Robert Wood Johnson Foundation Evaluation Series.

Evaluation Models & Evaluation Use

the authors conducted a systematic review on “collective-level knowledge exchange”, where “collective-level” refers to “interventions occurring at the organizational level or in policy-making arenas, as distinct from interventions targeting modification of individual behaviours.” (p. 62) – this made me think of how we defined “patient engagement” in my work on the AWESOME model (i.e., engaging patients on health services and systems level planning, as opposed to engaging patients in decision making about their own care).
collective-level systems are “characterized by high levels of interdependency and interconnectedness among participants” and “all participants receive information from various sources, make sense of it, modify it and produce new information aimed at others” (p. 62) – that is, people don’t just make decisions based on the scientific evidence alone
the review “found no credible, empirical data showing any positive link between level of use and information’s internal validity or the conformity of its production process with scientific procedures” (p. 65)
knowledge use depends on:
- sense-making
- coalition building
- persuasion
- rhetoric
“action proposals” are “assertions that employ rhetoric to embed information into arguments to support a causal link between a given course of action and anticipated consequences” (p. 63) – e.g., simply saying “we should do X, which the evidence suggests will lead to Y” generally isn’t sufficient to convince people
“collective-level knowledge use” = “the process by which users incorporate specific information into action proposals to influence others’ thought, practice, and collective action rules” (p. 63) – this definition “dissociates knowledge use from actual practices or outcomes” (p. 63) – that is, this definition refers to using knowledge to form recommendations (or guidelines, etc.), but does not include the next step of those recommendations/guidelines/etc. actually being put into practice (a person has less influence over the latter, as other influences also come into play)
- information –> recommendations based on that info –> recommendations put into practice
two core dimensions of the context of use:
1. issue polarization & ideology
  - when information is contrary to what someone already believes, they tend to ignore it or at least subject it to stronger skepticism (than if it fits within their current beliefs)
  - in a given situation, there will be multiple people who may have different beliefs/perceptions about a given piece of information
  - low issue polarization = when potential knowledge users agree
    - that there is a problem
    - the problem is important (relative to other potential issues)
    - on criteria that should be used to judge potential solutions
2. cost-sharing equilibrium in knowledge exchange systems
  - “knowledge has both a cost and a value” (p. 64)
  - someone has to pay the cost of the knowledge exchange
  - people will pay the cost of knowledge exchange to the extent they see it as providing value
  - in a knowledge exchange there are:
    - users: those “who hold institutionally sanctioned positions that allow them to intervene in the practices, rules and functioning of organizational, political or social systems” (p. 64)
    - producers: those “who contribute to legitimate knowledge production institutes without having capacity to put the knowledge developed to use.” (p. 64) – e.g., academic researchers or evaluators who generate new knowledge that could be useful to inform health services, but don’t have a role in a health services organization
    - intermediaries: other stakeholders/lobbies who “will want to have their say and will contribute to the information flow” (p. 64)
  - there is a cost-sharing equilibrium between users and producers/intermediaries
  - users:
    - have a finite amount of attention
    - have to balance the different pieces of information they receive
  - “use of knowledge is influenced by its:
    1. relevance (timeliness, salience and actionability)
    2. credibility
    3. accessibility” (p. 65)
  - pre-existing opinions influence the perception of both (1) relevance and (2) credibility

the authors created a framework for classifying different approaches to evaluation based on the two dimensions of level of issue polarization and cost-sharing equilibrium

cost-sharing equilibrium rests mostly on…	users
cost-sharing equilibrium rests mostly on…	producers
		low	high
		level of issue polarization

[note: I added in the colours because my blog isn’t allowing me to show horizontal gridlines in a table.]
the idea of the framework is not to definitively place evaluation models/approaches on the grid but “to offer some insights into the relationship between use, models, and contexts” (p. 66)
they showed some examples of where they would place different models on the grid (some things crossed over to multiple quadrants):

cost-sharing equilibrium rests mostly on…	users	UFE EE	UFE
cost-sharing equilibrium rests mostly on…	producers	RE	RE Democratic E
		low	high
		level of issue polarization

UFE = utilization-focused evaluation, RE = realistic evaluation, EE = empowerment evaluation, Democratic E = democratic evaluation
and then synthesized to this:

cost-sharing equilibrium rests mostly on…	users	The utilization paradise	UP / LZ (overlap)
cost-sharing equilibrium rests mostly on…	producers	The knowledge-driven swamp	The lobbying zone
		low	high
		level of issue polarization

where the utilization paradise (UP) and lobbying zone (LZ) overlap into the upper right quadrant
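
To make the synthesized grid easier to work with, here is a minimal sketch (mine, not from the paper) that looks up the zone for a given context. The function name `zone_for_context` and the string labels for its two arguments are my own assumptions for illustration, and the authors explicitly say the grid is for insight, not for definitively placing things.

```python
# Hypothetical lookup for the two-dimension grid in the notes above.
# Function name and argument labels are assumptions, not from the paper.
ZONES = {
    ("users", "low"): "utilization paradise",
    ("users", "high"): "utilization paradise / lobbying zone overlap",
    ("producers", "low"): "knowledge-driven swamp",
    ("producers", "high"): "lobbying zone",
}


def zone_for_context(cost_rests_on: str, polarization: str) -> str:
    """Return the grid zone for who bears the cost and the polarization level."""
    key = (cost_rests_on.strip().lower(), polarization.strip().lower())
    if key not in ZONES:
        raise ValueError(
            "cost_rests_on must be 'users' or 'producers'; "
            "polarization must be 'low' or 'high'"
        )
    return ZONES[key]


print(zone_for_context("producers", "low"))
```

A lookup table keeps the four cells visible in one place, which matches how the grid itself is presented.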

Source: Contandriopoulos, D., & Brousselle, A. (2012). Evaluation models and evaluation use. Evaluation. 18(1): 61-77. (Abstract)

Footnotes
↑1	slightly paraphrased to make it a bulleted list

Posted in evaluation, notes | Tagged evaluation, evaluation approaches, evaluation models, evaluation use, participatory method, stakeholders

More on evaluating healthcare IT

Posted on November 26, 2015 by Beth

Evaluating informatics applications – some alternative approaches: theory, social interactionism, and call for methodological pluralism

evaluating informatics tools, such as clinical decision support tools, under controlled conditions doesn’t provide information about how context (including human and cultural factors) affects whether those tools will actually be adopted in the real world
as well, focusing exclusively on pre-determined outcome measures means you will miss out on learning about the processes by which the system is actually used, as well as unanticipated and emergent effects
evaluations need to take into account “social, organizational, professional, and other considerations” (p. 40)… such as “power, culture, group relationships, work routines, stakeholders, professional values, social networks, institutional organizations, and judgement” (p. 40), but note that these things “elude quantitative and RCT-type evaluation approaches” (p. 40)
reports of evaluations conducted in a real-world setting often fail to include information about the settings in which they were conducted, making it difficult to interpret the findings.
when an RCT/experiment is conducted and finds a clinical decision support system to be ineffective, it does not generally provide answers to the question “why not?” so it does not “help in building better systems or in preventing decisions that may result in abandoning technologies that could potentially be useful” (p. 43)
cites Davis & Taylor-Vaisey: “The adoption of any innovation or the dissemination of new medical knowledge should be considered in a holistic, contextual manner” (p. 43)
“information technologies are embedded within a complex social and organisational context” (Heathfield & Buchan, 1996 cited on p. 46)
“artifacts cannot be understood independently from how they are used in actual practice” (p. 46)
“in sociotechnical theory, a change in technology, people, task, or structure is seen to result in adjustments by the other three components in order to maintain organizational stability. Berg focuses on work practices and how individuals, tools, documents, and machines are cooperative elements in emergent networks that make work practices function smoothly” (p. 47)
“neither the environment nor the system itself is stable […] there is something of a co-evolution of the environment and the system” (p. 47)
“a social influence or social interactionist approach […] takes account of the kinds of social, political, cultural, historical, institutional, cognitive, and other contextual constituents of the change process” (p. 47)
- with medical information applications: “how the technology is used and what changes occur are thought to result from complex social interactions. Because users may modify information systems during design, implementation, and use, they are viewed as active participants in what occurs” (p. 47)
- “characteristics of the technology, of the developers and potential users, and of the organizations into which they are introduced, are seen to interact with each other and may themselves change through these interactions. The participants, the setting, and the technology are treated as dynamic emergent processes rather than as variables that can be held constant, and causality is seen to be multi-directional rather than uni-directional. Social interactionist evaluation involves studying social, political, organizational and related processes as they unfold over time.” (p. 47)
researchers draw on:
- theories of change
- social science theories
- may use “an interpretivist approach and study what meanings individuals ascribe to the technologies and processes under study” (p. 48)
Kaplan suggested guidelines for such studies:
- “focus on a variety of concerns
- use multiple methods
- be modifiable in study design
- employ longitudinal designs
- conduct formative as well as summative evaluations” (p. 48-49)

Evaluating information technology in health care: barriers and challenges

“evaluation is not just for accountability: but for development and knowledge building in order to improve our understanding of the role of information technology in health care and our ability to deliver high quality systems that offer a wide range of clinical and economic benefits.”
basing decisions on healthcare IT on the failure of RCTs to show improved outcomes –> may cause promising technology to be abandoned prematurely
basing decisions on healthcare IT on unsubstantiated reports (e.g., no actual evaluation) –> may cause resources to be wasted on ineffective IT and/or inappropriate application of IT
“many evaluation studies ask inappropriate questions, apply unsuitable methods, and incorrectly interpret results. The evaluation questions most often asked include those concerning economic benefits and clinical outcomes, despite lack of strong evidence of such and the recognition of the difficulty of applying results in other contexts”
RCTs
- are “vulnerable with respect to external validity: trial results may not be relevant to the full range of subjects (that is, specific implementations of a healthcare application) or typical uses of a system in day to day practice”
- “negative results from [RCTs] cannot help us understand the effects of clinical systems or build better ones in the future.”
- even if an RCT demonstrated benefits, it “does not necessarily mean that end users will accept a system into their working practices”

“As pointed out by McManus, ‘Can we imagine how randomised controlled trials would ensure the quality and safety of modern air travel …? Whenever aeroplane manufacturers wanted to change a design feature … they would make a new batch of planes, half with the feature and half without, taking care not to let the pilot know which features were present.’”

Evaluation and Implementation: A Call for Action

evaluation of information systems is moving towards a “more holistic view of information systems and their evaluation” (p. 12)
can apply “the socio-technical approach towards evaluation: try to understand why information systems are a success or a failure, taking into account the social context in which the systems are used” (p. 12)
the “sociotechnical approach should not only try to determine whether an implementation was successful or not. It should also contribute to developing theory and good practice for successful implementations” (p. 14) – requires “close observation and evaluation of implementation processes” (p. 14)
in the Declaration of Innsbruck, “an information system is defined as the technical artifact and the environment (social, organizational) in which it is used” (p. 12)
existing studies trying to evaluate whether IT systems provide some sort of “benefit” or “value” tend to be inadequate because:
- too little time allocated for evaluation
- too few resources allocated for evaluation
- unclear objectives of the evaluation
- weak evaluation methods (including starting too late to get baseline data)
- poor reporting (e.g., drawing conclusions not based on evaluation data (but on personal opinion instead); not clearly distinguishing between expected benefits and actual benefits (i.e., not measuring whether benefits occurred, but just assuming they did))
there is a need for:
- “an evidence base of good evaluation practice”
- “improvement in the reporting of evaluation studies” (p. 14)

References:

Heathfield, H., Pitty, D., & Hanka, R. (1998). Evaluating information technology in health care: barriers and challenges. British Medical Journal. 316(7149):1959-61 (full text)

Kaplan, B. (2001). Evaluating informatics applications–some alternative approaches: theory, social interactionism, and call for methodological pluralism. Int J Med Inform 64(1):15-37 (abstract)

Talmon, J.L. (2006). Evaluation and implementation: A call for action. IMIA Yearbook of Medical Informatics. 11-15.

Posted in evaluation, healthcare, information technology, notes | Tagged evaluation, health information technology, informatics, information technology, IT, sociotechnical theory

CESBCY 2015 Conference – Collaboration, Contribution, and Collective Impact

Posted on November 20, 2015 by Beth

Today was the Canadian Evaluation Society BC and Yukon (CESBCY) chapter’s conference. Now, I may be biased given that I was the conference Program Chair, but I think we had an outstanding program of presentations this year! But before you think I’m being too arrogant, I will state for the record that the outstanding program was 100% due to the fantastic presenters – my job as program chair was easy given the incredible proposals we received from evaluators and non-profit organizations from around the region.

This year’s conference theme was the non-profit sector; “Collaboration, Contribution, and Collective Impact” and the only complaint I had about the conference was that there were so many good sessions that I couldn’t go to all the ones I wanted to see!

Here are my notes from the sessions that I did attend:

Social Return on Investment (SROI) for Aunt Leah’s Place

-worked with Sametrica – company from Ontario; proprietary SROI framework
-students from UBC economics
-Aunt Leah’s Place works with youth in foster care – those transitioning out of foster care – and works with low income mothers who are struggling to keep their children
-40% of homeless youth have been in government care (foster care is a “pipeline to homelessness”)
-people are in school longer (post-secondary education) and salaries are relatively flat, but housing prices have skyrocketed over a generation
-70% of parents with 19-28 year olds at home provide free rent and groceries (essentially a subsidy for the youth)
-this is something that youth in foster care don’t get – they “age out” of foster care at 19 years
-Aunt Leah supports these youth after 19 years of age (trying to provide what many parents provide for their children in this age range), but does not provide housing
-700 youth “age out” of foster care per year in BC
-Aunt Leah – 10% of those receiving support were homeless vs. 32% of control group (not getting Aunt Leah support)
-SROI – estimated a $7 return for every $1 invested (a small study; didn’t include some of their newer programs) – they wanted to do a more robust study and look at their different programs
-they logic modelled programs and created indicators, which they came up with a $ amount for
-ROI – typically just about financial stuff
-SROI – attempts to be more holistic (including social, environmental, and more holistic economic perspective)
-financial proxies – they looked at similar activities and what people were willing to pay for them (e.g., what’s the “value” of building social connections through an activity Aunt Leah’s offers? They found out how much people pay to join similar social activities for this age group)
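
To make the arithmetic behind an SROI ratio concrete, here is a minimal sketch of how financial proxies roll up into a "return per $1 invested" figure. Everything in it is an assumption for illustration: the outcome names, participant counts, proxy values, and investment are invented and are not Aunt Leah’s data or Sametrica’s proprietary framework. A fuller SROI analysis would typically also adjust for things like deadweight, attribution, and discounting, which this sketch leaves out.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    """One valued outcome in a simplified SROI calculation (hypothetical)."""
    name: str
    participants: int    # number of people who experienced the outcome
    proxy_value: float   # dollar value per person, taken from a financial proxy


def sroi_ratio(outcomes: list[Outcome], investment: float) -> float:
    """Total proxy-valued outcomes divided by the amount invested."""
    if investment <= 0:
        raise ValueError("investment must be positive")
    total_value = sum(o.participants * o.proxy_value for o in outcomes)
    return total_value / investment


# Invented example numbers, chosen only to show the calculation.
example_outcomes = [
    Outcome("social connection", participants=50, proxy_value=400.0),
    Outcome("stable housing", participants=10, proxy_value=5000.0),
]
ratio = sroi_ratio(example_outcomes, investment=10_000.0)
print(f"${ratio:.2f} of social value per $1 invested")
```

The proxy value is where the "what do people pay for similar activities" research comes in: each indicator from the logic model gets a dollar amount, and the ratio is only as defensible as those proxies.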

Promoting social innovation in vulnerable populations – a Developmental Evaluation

-shared challenge that funders (Community Action Initiative, City of Vancouver, Vancouver Foundation) were facing: low quality of innovation proposals coming forward
-thought maybe they needed to do something differently to develop an environment that encourages experimenting and testing to support innovation
-traditionally, funders don’t really engage with projects until their proposal is funded
-but innovations don’t lend themselves to fully formed proposals
-so thought about reaching out to applicants with good project ideas, but weak or non-existent innovation plans, to help them develop the innovation side of things
-project went a bit sideways as many funders wanted in on the project across the province
-decided they needed to evaluate if this would really lead to more socially innovative proposals and if this could be done well in a partnership model (multiple funders)
-engaged an external evaluator, decided on a Developmental Evaluation approach
-logic modelled to show how resources –> planned outcomes (increased knowledge of social innovation, application of this knowledge to create more innovative project proposals, spread of knowledge throughout organizations, change in relationship between funders and participants (and among the funders))
-open to adapting LM (because it was developmental evaluation approach), but didn’t need to in the end
-mixed methods for data collection
-interviews with funders (how did you come to this project? (at start))
-surveys with participants – they got survey fatigue, so changed to in depth interviews with project teams at the end
-found that the process –> more socially innovative proposals (sounds like based on the opinion of the funders?), partnerships -> able to leverage each organization’s resources
-allowed organizations to take on more risk than they normally would (permission to be innovative, which comes with risk of project maybe not coming to fruition)
-found increased understanding and application of social innovation; learnings spread throughout the funding organizations; increased understanding of organizational readiness for system change; increased acceptance of uncertainty in innovation process

ECLIPS: An Innovative and Cost-Effective Evaluation Capacity Building Initiative for Not-for-Profits

-evaluation capacity building model used at UBC, but can also be used at other organizations
-they are an internal evaluation unit at UBC Faculty of Med, but can also be used by external evaluators
-CLIPs – Communities of Learning, Inquiry and Practice – developed by B. Parsons in a higher education context (but it applies to a wide range of settings)
-allows self-selected teams to build evaluation capacity by conducting a project they are interested in
-3-6 people in the team, they own the project and conduct and report out on it
-small amount of funding (e.g., they did $1000) – e.g., transcription, honoraria, etc.
-resources provided to guide participants to do their work
-evaluator to provide assistance/guidance to the team
-UBC wanted to build a strong culture of evaluation in their Med School (called ECLIPs because there is another medical project called CLIPS)
-very few teams used any of the $1000 they were eligible to get
-20 applications, 18 approved, 11 fully completed (in time for the evaluation of ECLIPS + one completed later)
-86% of projects were faculty-led
-hired Kylie Hutchinson to evaluate ECLIPS
-methods: interviews with 10 team leaders + doc review + 1 focus group with Evaluation Specialists (tried online survey with team members, but didn’t get good response)
-findings:
-evaluation capacity building occurred at the level of Faculty of Med, Evaluation Studies Unit, programs, and participant level
-participant level: increased appreciation of, knowledge & skills in evaluation, and increased evaluation activity
-program-level: increased use of evidence for program decision making
-ESU gained a better understanding of themselves, and others developed a better understanding of ESU
-FoM level – increased number of evaluation champions
-Evaluation Specialists seen as a major help to the teams
-some teams didn’t use the resources at all (overwhelming, too idealistic and not practical for their limited time)
-5 projects didn’t get off the ground due to lack of time; some teams felt 1 year not long enough to do a project (but some really liked the 1 year deadline)
-making some changes to ECLIPs based on findings:
-more flexible time lines and intake during the year
-consultation with Evaluation Specialists before they submit their proposals
-more training for Evaluation Specialists on how to be a good coach
-revising resources
-some of the key ingredients:
-program teams chose evaluation projects that are meaningful to them
-tailored coaching for the team
-www.insites.org/clip – some free resources
-upcoming article in New Directions journal

Mobile Learning in Evaluation for Health Leaders: Evaluation of an Innovative Capacity Building Tool

-a course to help senior health leaders to become informed users of evaluation
-can also be used with other sectors, including non-profits
-public health tends to educate people about evaluation, but it teaches them how to do it, whereas this course teaches people how to *use* evaluation findings in their work
-mobile learning because it’s anywhere/anytime; engaging; personalized; interactive; highly focused; informal
-you can also collect mobile analytics (to see how people are using it)
-gamification – more engaging for learners
-evaluation to inform improvements in course design and content and to look at if this could be an effective way to provide this kind of learning
-beta testers – 15 health leaders, 1 physician, 9 internal evaluators, 2 mobile tech experts
-data collection – end of unit and end of course surveys, phone interviews with health leaders; phone focus group with evaluators, unstructured interviews with tech experts (observed them going through the course to get their perspective)
-findings:
-users like short and succinct units
-some difficulties navigating through the course (some inconsistencies in the buttons)
-expected more interactivity (people liked what was there and wanted more)
-expected more personalization
-many people disliked stock photos
-mixed opinions on use of audio (some liked, others felt it did not add value)
-units 1-3: too simplistic for senior health leaders
-units 4-7: engaging, relevant and practical (topics were: systems thinking, enhancing use, managing evaluation, supporting evaluation)
-valuable way to increase knowledge and interest in evaluation
-majority would recommend the course to their colleagues (if improvements were made in content & design (as per the above findings))
-recommendations:
-revise target to mid-level managers and directors (not just senior leaders)
-revise units 1-3 to be more engaging and relevant (so you don’t lose them before they get to the more engaging, relevant stuff in the later modules)
-increase interactivity, more gamification
-improve course navigation
-evaluationforleaders.org – revised version will be available spring 2016, for free!

The Doctor is In: Evaluation Therapy for All (Forum Theatre)

This session was absolutely brilliant! It involved the presenters running through a skit with some common evaluation errors/issues and then running through the same skit again, but this time allowing anyone in the audience to yell “Stop” when they saw an issue that they felt could be done better. And then the audience member joins the scene and “corrects” the problem. It’s hard to capture it in writing, but it was absolutely hysterical!

Move Over Accountability, We’re Putting Learning in the Driver’s Seat
-Fostering Change, Vancouver Foundation
-pilot project – transition worker to give support to those transitioning out of foster care (support provided to youth up to age 20)
-just started in May 2015
-referrals come in from youth, probation officers, social workers, etc.
-have a Youth Advisory Circle – young people age 17-24, engaging the youth in “adult” conversations (where they haven’t traditionally been included)
-principle-based (rather than expecting people to implement certain models)
-value the lived experience – they have expertise, knowledge, and wisdom
-shared learning agenda – iteratively created with grantees
-have a shared learning and evaluation working group – frontline staff and managers from grantee agencies
-learning/knowledge exchange days
-grantees have a strong relationship with the funder and with other grantee agencies
-grantee was worried that the expectation that they contribute to working group might be really time consuming, but it’s actually been very beneficial to them
-rather than relying on “misery porn” (e.g., images of sad looking youth to try to get funding), focus on images of empowering youth – how things can be positive, how issues are due to the system, rather than blaming youth for the circumstances they are in

10 Plus Ways to Stretch Your Evaluation Budget
-“I want to do myself out of a job” – I want to give evaluation away!

1. Lower your rate
-when you are starting out, to build a portfolio
-working with a repeat client (or getting them to become a repeat customer)
-you want to try something new (you get learning out of it)
-pro bono work (not to do off the side of your desk or at poor quality, but because you want to support the work – it’s a volunteer contribution)
-can get a charitable receipt (for your personal taxes) if working with a nonprofit
-include full cost and show the discount (so they see the true value of the evaluation)
-if they have a funder that requires, e.g., 10% of budget must be evaluation, you can charge your full rate

2. Act as an evaluation coach
-coach program staff to go through the evaluation process
-use a really good evaluation “how to” guide

3. Leverage evaluation course
-program staff goes through an evaluation course (e.g., CES Essential Skills Series), and you act as their coach

4. Become a case or student project
-SFU, UBC, UVic all have evaluation courses that use real programs as case studies for the students to develop an evaluation plan
-CES Case Competition
-Timing needs to be right

5. Use existing evaluation frameworks or systems
-RE-AIM
-Vancity Demonstrating Value
-IHI Triple Aim
-etc
-helps you be more efficient
-has to work for your program, of course
-your value-add service as an evaluator is focused on what’s not included in the existing framework

6. Choose data collection tools from online tool repositories
-engage staff in selecting more
-e.g.:

7. Support program staff or participants to collect data
-provide training
-do routine check-ins
-especially helpful in multi-lingual environments where staff speak the language of participants (and you don’t have budget for interpreters)

8. Use analysis packages from online survey packages
-e.g., Fluid Survey
-qualitative data: invest in qualitative data analysis program (really speeds up the process)
-http://cognitive-edge.com/sensemaker/ – engage program participants in doing the analysis

9. Hold data interpretation sessions
-present analyzed data and have stakeholders do the interpretation and generate recommendations
-can use a simple framework like: What? So What? Now What?
-saves you a lot of time (helps you develop the story of what the findings told you – and helps you write the report)

10. Simplify the report
-talk to client in planning stages about what kind of report is wanted/needed
-sometimes they only want a PowerPoint (the title of the slide is the conclusion of what the graph shows)
-even when narrative report is wanted: follow the 1-3-25 page rule from CFHI
-can put all the gory details in appendices (that are typically only ever read by other evaluators!)
-www.piktochart.com
-www.canva.com (not free) – slide docs

Tips from the audience (this was the “Plus” in the title!)

-clarify language – make sure that you are using words the same way
-listen to how the client talks and use the language they use (e.g., you can say “what changes do you want to see?” rather than “what are your intended outcomes?”, or “how will you know if you are seeing those changes?” rather than talking about “indicators”)
-make sure you are a good fit for the project
-frequent check ins (rather than revealing stuff only at the end)
-Vantage Point – a nonprofit that has a subsidiary (Go Volunteer) through which you can offer up discounted services
-start with a small project to give them a taste for what evaluation can offer; if they still have unanswered questions, you can do the next evaluation on those

Funder Panel

-Bryn (Vancity Community Foundation), Trilby (Vancouver Foundation), Cathy (Telus)
-standardized metrics are a challenge to develop
-as soon as you start tracking indicators, people focus their attention on that (and you don’t want to unintentionally drive them to do stuff that will cause a negative effect)
-process measures vs. outcome measures – e.g., do you care that you reached 1000 students this year or that you prevented 3 suicides?
-big corporations (like Telus, IBM, Accenture) often have employees who want to give back to community, but want to use their skills to do that, so why not match them up with non-profits to provide pro bono services? (knowledge philanthropists)
-you have to recognize the realities of the non-profits you are working with (e.g., VF offered to send staff from their grantee agencies to a 5 day conference and thought it would be an amazing opportunity, but the non-profits said “We can’t send our staff to a conference for 5 days! Who do you think will run the program??”)
-people want to get to outcomes and impact, but often we only get to activities and outputs (i.e., running the program)
-balance between the value the data will bring to you and the cost of getting the data
-there’s been a shift towards funding projects instead of operations, so people are having to show that their program is “new” and “innovative” (and may just position something they are already doing as “innovative”)
-funders get way more requests for funding than they could ever fund (and it’s sad to have to say “no” to projects that could be really great)

Posted in evaluation, event notes, notes | Tagged Canadian Evaluation Society, CES, CESBCY, CESBCY15, CESBCY2015, conference, conference notes, evaluation

More About Measuring Errors and Adverse Events

Posted on November 16, 2015 by Beth

The Measurement of Active Errors

in quality improvement, we often need to measure things for comparative purposes:
- to compare organizations/clinicians with each other
- to draw cause and effect conclusions about how something (e.g., a policy, a process) affected safety/quality
to be able to do this, we need data that is measured accurately and precisely

Measurement of Outcomes vs. Processes

outcomes include:
- mortality
- physical morbidity
- psychological well-being
- satisfaction with services
the latter two can be sensitive to quality of care, but the former two are often not due to poor quality care (healthcare services are often provided to those who are unwell and/or those at risk of something bad happening to them and so morbidity and mortality can be the result of things other than just quality of care). So if you want to use outcomes as a measure of quality, you need to be sure that it is actually a good indicator of quality (and not biased by something else)!
when comparing differences in outcomes, we need to adjust for prognosis (otherwise poorer outcomes might be due to the patients in that group being more sick, rather than getting less effective/poorer care)
however, there are 2 “risk adjustment fallacies”:
1. overadjustment: when “the quality related factor [is] associated with the risk factor so adjustment obscures real differences” – e.g., if age is a risk factor for death and older people are given minimal care compared to young people
2. underadjustment: when we don’t have sufficient data to actually adjust for all the “relevant prognostic variables”; arguably the more frequent of the two
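To make the need for prognosis adjustment concrete, here is a minimal sketch in Python. The two hospitals, the "low"/"high" risk strata, and all of the counts are invented for illustration (they are not from the paper), and stratifying on a single risk variable is just one simple way to adjust, so real data would still be open to the underadjustment problem described above.

```python
from collections import defaultdict

def crude_rate(records):
    """Deaths divided by patients, ignoring prognosis."""
    return sum(r["died"] for r in records) / len(records)

def stratified_rates(records):
    """Death rate within each risk stratum."""
    totals = defaultdict(lambda: [0, 0])  # stratum -> [deaths, patients]
    for r in records:
        totals[r["risk"]][0] += r["died"]
        totals[r["risk"]][1] += 1
    return {risk: deaths / n for risk, (deaths, n) in totals.items()}

def make_patients(risk, n, deaths):
    """Build n hypothetical patient records in one stratum, `deaths` of whom died."""
    return [{"risk": risk, "died": 1 if i < deaths else 0} for i in range(n)]

# Hypothetical data: hospital A treats mostly high-risk patients.
hospital_a = make_patients("low", 20, 1) + make_patients("high", 80, 16)
hospital_b = make_patients("low", 80, 8) + make_patients("high", 20, 6)

print("crude:", crude_rate(hospital_a), crude_rate(hospital_b))
print("A by stratum:", stratified_rates(hospital_a))
print("B by stratum:", stratified_rates(hospital_b))
```

In this made-up example hospital A has the higher crude death rate, yet it has the lower death rate within both the low-risk and the high-risk stratum; the crude comparison mostly reflects who each hospital treats.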
“identifiable processes are one of many factors that affect outcome: the signal (outcome due to process) cannot be distinguished from the noise (outcome due to other factors)”
some will argue that we should measure quality based on outcome, but “maximising […] outcomes cannot be achieved by misattributing cause and effect”
“Not only does a system of punishment and reward based on outcome run a high risk of penalising and favouring the wrong providers, it also has little potential to improve health.”
- if you identify a clinician or organization that is an outlier (e.g., higher rates of poor outcome than other clinicians/organizations), you can only identify a few, but if you identify an inadequate process, you can “shift the whole performance curve” (including even making the best ones better than they currently are)
- [I think this may be an explanation of a key difference between evaluation and quality improvement (QI), which are oftentimes difficult to differentiate. QI focuses on improving existing processes, tries to “shift the performance curve”. Process evaluation (which I think is the type of evaluation that is most similar to QI) often focuses on if activities are being implemented as intended, and why or why not. There’s definitely overlap here, of course.]
Another benefit of measuring processes is that the sample sizes needed to demonstrate effectiveness are smaller than for outcome measures.
- errors that cause severe harm are (thankfully) rare, so though they are generally easy to measure, they don’t affect error rates much
- more common errors don’t cause much (if any) harm, but thus are not easily noticed/not commonly reported
3 reasons to measure process (i.e., clinical quality/active error rates):
- clinical outcomes are not a good reflection of quality of care
- allow you to make bigger gains by shifting the whole performance curve (vs. outcomes, which tend to focus on outliers)
- errors are much more common than adverse events

Active Errors

active errors = “errors in patient care itself rather than in the system that may predispose to such errors”
since we are interested in comparing (i.e., making inferences), we are interested in measuring rates
reporting systems for errors do not give rates – they just give amounts (i.e., the numerator – but you don’t know the denominator)
error rates can be measured by looking at:
- documentary (including electronic) data
  - retrospective: looking back at charts
  - prospective: completing a pro forma at the time
- observations:
  - real time
  - retrospectively (video)
two methods for assessing quality of care:
- explicit (a.k.a. criterion-based assessment): assesses care compared to predetermined criteria
  - pro: doesn’t rely on expert judgment; “protects against bias by expressing error rates in terms of the maximum number of errors possible in a data set”
  - con: misses out on diversity of errors that aren’t in the algorithm
- implicit (a.k.a., holistic judgement): based on expert judgment, not constrained by predetermined criteria
  - pro: can pick up more diversity of errors that aren’t in the algorithm
  - con: poorly standardized; expensive (requires a lot of time and skill)
“We hypothesize that explicit measurement of predefined error will be much more reliable than implicit assessment, but that it will miss more errors”

Bias in Measurement

measuring errors (as opposed to just adverse events) – still subject to case mix “because different patients have different opportunities for error” – two approaches to this problem:
- express errors as % of “opportunities for error” (rather than error per patient, since some patients represent more opportunities for error than others)
  - however, this requires us to come up with a definition of what is an “opportunity for error” beforehand
- express errors as a % of patients (or patient days) with statistical adjustment for case mix (though, as discussed above, this isn’t perfect)
information bias = “the diligence with which information is recorded may influence the “visibility” of errors” – e.g., someone who is more diligent about recording information in the chart may appear to have more errors when you do a chart review than someone who is not as diligent
- “a particular type of information bias arises when an intervention designed to reduce error interacts with the measurement method. e.g., computer systems designed to improve care may affect the recording of information in case notes and hence the proportion of errors that are detected” by chart audit.
observer bias: can be difficult/expensive to make notes to enable blind measurement; a practical method to mitigate is to blind observers to the hypothesis being tested
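As a rough illustration of why the choice of denominator matters for case mix, the sketch below compares errors per patient with errors per opportunity for error. The two wards and their counts are hypothetical, and it assumes the number of opportunities (e.g., doses ordered) has already been defined and counted for each patient, which, as noted above, is the hard part.

```python
def errors_per_patient(patients):
    """Total errors divided by number of patients."""
    return sum(p["errors"] for p in patients) / len(patients)

def errors_per_opportunity(patients):
    """Total errors divided by total opportunities for error."""
    total_errors = sum(p["errors"] for p in patients)
    total_opportunities = sum(p["opportunities"] for p in patients)
    return total_errors / total_opportunities

# Hypothetical wards: ward X patients have many more opportunities for error
# (e.g., they are on many more medications) than ward Y patients.
ward_x = [{"errors": 2, "opportunities": 40}, {"errors": 3, "opportunities": 60}]
ward_y = [{"errors": 1, "opportunities": 10}, {"errors": 1, "opportunities": 10}]

for name, ward in (("X", ward_x), ("Y", ward_y)):
    print(name, errors_per_patient(ward), errors_per_opportunity(ward))
```

With these invented numbers, ward X looks worse per patient (2.5 vs. 1.0 errors) but better per opportunity (5% vs. 10%), which is the kind of reversal that case mix can produce.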

Sensitivity & Specificity

many errors are not reported (when using error reporting systems) or not captured in the patient chart (when doing chart audits), so these are not sensitive methods (i.e., they miss a lot of errors)
prospective methods are more sensitive (i.e., they pick up more errors), but are subject to bias (e.g., if you are asking clinicians to fill out a pro forma on errors before and after an intervention, then the clinicians are “both the subject of the change and observers of the effect of that change”)
“unobtrusive direct observations made with appropriate consent by third party observers blind to the hypothesis being tested” is probably ideal, but it is very expensive!
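For reference, sensitivity and specificity can be computed as in the sketch below. It assumes some reference standard is available (here, hypothetically, direct observation) against which a cheaper method such as chart review is compared; the ten cases are made up for illustration.

```python
def sensitivity_specificity(detected, truth):
    """Compare a detection method against a reference standard.

    detected, truth: parallel lists of booleans, one entry per case,
    where True means "an error was found / actually occurred".
    Returns (sensitivity, specificity).
    """
    if len(detected) != len(truth):
        raise ValueError("detected and truth must be the same length")
    pairs = list(zip(detected, truth))
    tp = sum(1 for d, t in pairs if d and t)          # errors correctly found
    fn = sum(1 for d, t in pairs if not d and t)      # errors missed
    tn = sum(1 for d, t in pairs if not d and not t)  # correctly clear
    fp = sum(1 for d, t in pairs if d and not t)      # false alarms
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case with and one without an error")
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical data: 4 cases with a real error, 6 without.
truth = [True] * 4 + [False] * 6
chart_review = [True, False, False, False] + [False] * 5 + [True]

sens, spec = sensitivity_specificity(chart_review, truth)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this example chart review finds only 1 of the 4 real errors, so its sensitivity is low even though its specificity is fairly high.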

More Research Is Needed

new methods to measure errors that are sensitive and specific need to be developed
perhaps there could be a way to combine scores from different methods to create a composite score? More research is needed!
“the study of measurement of error is in its infancy”

Lilford, R.J., Mohammed, M.A., Braunholtz, D., Hofer, T.P. (2003). The measurement of active errors: methodological issues. Qual Saf Health Care. 12:ii8-ii12 [Full-text]

Measuring Errors and Adverse Events in Health Care

this paper reviews the pros and cons of 8 different methods of measuring errors & adverse events and suggests a model for choosing which one(s) to use in a given situation
error = includes “mistakes, close calls, near misses, active errors, and latent errors”; do not necessarily harm patient
adverse events = includes “terms that usually imply patient harm, such as medical injury and iatrogenic injury”; harms patient
latent errors = “include system defects such as poor design, incorrect installation, faulty maintenance, poor purchasing decision, and inadequate staffing.”; “difficult to measure because they occur over broad ranges of time and space and they may exist for days, months, or even years before they lead to a more apparent error or adverse event directly related to patient care”
active errors = “occur at the level of the frontline provider […] and are easier to measure because they are limited in time and space”

Method	Pros	Cons	Notes
Morbidity & Mortality conferences & autopsies	can suggest latent errors	cannot provide error rates (too few/nonstandard examples); reporting bias; hindsight bias
Malpractice claims analysis	can suggest latent errors; many perspectives (e.g., patients, providers, lawyers)	cannot provide error rates (nonstandard examples); reporting bias; hindsight bias
Error reporting systems	can suggest latent errors; many perspectives over time; can be part of routine operations	underreporting (e.g., people afraid to report, people too busy to report, people don’t notice an error occurred); hindsight bias; reporting bias
Administrative data analysis	data readily available; inexpensive	data may be incomplete/inaccurate; data separated from clinical context
Chart review	data readily available	data incomplete (not all errors/AEs in chart); judgements about AEs not reliable; hindsight bias; expensive	note: this paper written before Global Trigger Tool was published
Electronic Health Record review	data readily available; integrates multiple data sources; real-time monitoring; inexpensive (after you set it up initially)	data incomplete (not all errors/AEs in chart); expensive to set up; not useful for detecting latent errors
Observation of patient care	accurate & precise; more comprehensive than other methods for measuring active errors	expensive & time consuming; requires lots of training; concerns about confidentiality; Hawthorne effect; hindsight bias; not good for detecting latent errors
Clinical surveillance	accurate & precise	expensive; not good for detecting latent errors

hindsight bias – we are influenced by knowing the outcome (e.g., if we know the patient died, we are more likely to say that there were errors/issues with quality of care, even if the care given was identical to another case where the patient didn’t die)
reporting bias –
Hawthorne effect – people change their behaviour when they know they are being watched (so what you observe is not what would have happened if there were no observer present)
their model for choosing which method to use is based on the idea that the methods “exist on a continuum that illustrates the relative utility of each method for measuring latent errors as compared with active errors and adverse events”

Thomas, E.J., & Petersen, L.A. (2003). Measuring Errors and Adverse Events in Health Care. J Gen Intern Med: 18:61-67.

Posted in healthcare, notes | Tagged adverse events, errors, healthcare, measuring

American Evaluation Association (AEA) Conference Sessions – Exemplary Evaluation

Posted on November 14, 2015 by Beth

This year, the American Evaluation Association live streamed a bunch of their conference sessions for free! I didn’t get to watch as many as I would have liked,¹ but I did see a few and here are some summary notes that I took from the sessions.

Jane Davidson

distinguished between non-evaluative vs. evaluative questions

Non-evaluative questions	Evaluative questions
How many people received the program?	How good was the program reach? (e.g., did it reach the people it should have? did it reach underrepresented people?)
Was the program implemented as intended?	How well was the program implemented? (Shouldn’t just assume the plan was good and that implementing the plan as intended is the best thing to do. How well was the program contextualized? Did the program adapt appropriately in response to what occurred as the program continued?)
What effect did the program have on its participants?	How substantial and valuable were the effects on participants?

The “non-evaluative questions” are merely describing factual evidence, but evidence alone cannot answer an evaluative question. (If you are asking the non-evaluative questions, then you are doing empirical research, not evaluation)
Indicators don’t give you answers to evaluative questions

non-evaluative facts + definitions of quality and value –> evaluative conclusions

“Evaluative rubrics paint a picture of what the evidence should look like at different levels of performance”
What will the constellation of evidence look like for exemplary, good, or bad results?
creation of rubrics is generally done in a participatory way – stakeholders bring their various types of expertise (content, politics, experience, etc.), evaluator brings evaluation expertise, evaluator unpacks the ideas from the stakeholders and guides them to make the rubric
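One way to picture “non-evaluative facts + definitions of quality and value –> evaluative conclusions” is as a small lookup, sketched below in Python. The rubric levels, the field names (`reach_pct`, `underrepresented_pct`) and the thresholds are all invented for illustration; in practice they would come out of the participatory process described above, not from the evaluator alone.

```python
# Hypothetical rubric for "How good was the program reach?".
# Levels are ordered from best to worst; the first level whose
# criteria are met by the evidence is the rating.
RUBRIC = [
    ("exemplary", lambda e: e["reach_pct"] >= 80 and e["underrepresented_pct"] >= 30),
    ("good", lambda e: e["reach_pct"] >= 60),
    ("poor", lambda e: True),  # fallback when nothing better applies
]

def rate(evidence, rubric=RUBRIC):
    """Turn descriptive evidence into an evaluative rating using the rubric."""
    for rating, meets_criteria in rubric:
        if meets_criteria(evidence):
            return rating
    raise ValueError("rubric has no level matching this evidence")

# The descriptive facts alone (85% reached) are not a conclusion;
# the rubric supplies the definition of "how good is good".
print(rate({"reach_pct": 85, "underrepresented_pct": 35}))
print(rate({"reach_pct": 85, "underrepresented_pct": 10}))
print(rate({"reach_pct": 40, "underrepresented_pct": 50}))
```

The same 85% reach figure yields different ratings depending on who was reached, which is the point of separating the facts from the definitions of quality.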

Exemplary Uses of Theory in Evaluation Practice

theories of programs
theories of evaluation
reductionism (transdisciplinary) – breaking system down to components
- e.g., goal attainment: intervention –> outcome (and evaluation focuses on determining if intervention causes outcome; “experimental evaluation approach”)
  - pros: rigour of evaluation; scientific reputation
  - cons: neglect context, assumes efficacy = effectiveness (in real world)
systems thinking (transdisciplinary) “viewing the situation holistically, as opposed to reductionistically”
- pros: better explains how a program works, accounts for synergies/emergent behaviours
- cons: information overload, difficulties in data analysis
pragmatic synthesis:
- the above two are extremes of theoretical spectrum
- most real world programs are in the middle
- middle-ground programs differ in complexity from both reductionism and systems thinking:
- theories of program: action model/change model (in Chen’s book)
Discussant: Mel Mark: makes sense to be “multi-lingual” when it comes to evaluation theory – understand a variety of them and ask what makes sense in a given case

Closing Session

The closing session was a series of speakers summarizing their key learnings. Here are the ones that jumped out at me:

the importance of evaluating what policies actually do and not just their rhetoric (noted that programs and policies can allow racism to continue and even promote racism) and the importance of being aware of our own biases
evaluation can have an impact because:
- it improves the program
- it has an impact itself
when a program has a weak theory, often evaluation becomes an intervention, but when a program has a strong theory, evaluation serves more as a facilitation
when you do an evaluation, do you position your work within the context of the evaluation community?
- evaluation standards
- evaluator competencies
- cultural competencies
MQ Patton – when he introduces himself in the context of his work, he doesn’t say “I’m an evaluator” or talk just about his skills, but that “I am a member of an international evaluation community”
International Year of Evaluation has allowed a platform for promoting evaluation as a “global force for good”
“Act as if what you do matters. It does” – William James

Footnotes
↑1	Work meetings got in the way of some of them, some of the sessions I wanted to attend ended up not being live streamed because presenters decided they didn’t want to be live streamed after all, and some of the sessions started at 5 am this morning and though I did wake up, I didn’t manage to stay awake enough to pay attention.

Posted in evaluation, event notes, notes | Tagged AEA, American Evaluation Association, conference notes, Eval15, evaluation, notes

value
respect for people and culture	flow	innovation	relentless improvement
leadership

Applying complexity theory: A review to inform evaluation design

Perturbing ongoing conversations about systems and complexity in health services and systems

Theory-based Evaluation and Types of Complexity

Complex, but not quite complex enough: The turn to the complexity sciences in evaluation scholarship

References

Martin, C.M., & Sturmberg, J. P. (2009). Perturbing ongoing conversations about systems and complexity in health services and systems. Journal of Evaluation in Clinical Practice. 15: 549-552.

Mowles, C. (2014). Complex, but not quite complex enough: The turn to the complexity sciences in evaluation scholarship. Evaluation. 20(2): 160-75.

Stame, N. (2004). Theory-based Evaluation and Types of Complexity. Evaluation. 10(1): 58-76

Walton, M. (2014) Applying complexity theory: A review to inform evaluation design. Evaluation and Program Planning. 45: 119-126.

References

Gerrits, L. & Verweij, S. (2015). Taking stock of complexity in evaluation: A discussion of three recent publications. Evaluation. 21(4): 481-91.

Ling, T. (2012). Evaluating complex and unfolding interventions in real time. Evaluation. 18(1): 79-91.

Rogers, P. (2008). Using Programme Theory to Evaluate Complicated and Complex Aspects of Interventions. Evaluation 14(1): 29-48.

Stirling, A. (2010). Keep it complex. Nature. 468. p. 1029-1031.

References:

Begun, J. W., Zimmerman, B., & Dooley, K. (2003). Health care organizations as complex adaptive systems. In Advances in Health Care Organization Theory. Eds. S.M. Mick & M. Wyttenbach. San Francisco: Jossey-Bass, pp. 253-288.

Traditional Waterfall vs. Agile:

Waterfall: Requirements [documents] –> Design [documents] –> Implementation [unverified system] –> Verification [system]

Agile: a series of sprints (eight shown), each one producing [working code]

Area	Guiding Questions	Answers to Guiding Questions	Implications for Evaluation
General phenomenon/problem	What is the problem the program is addressing? How did it emerge? How long has it existed? What groups prompted concern about it? What is already known about it? What are the dominant methods used for understanding the phenomenon/problem? What tools exist for measuring change?
Intervention	Where is the program in its life cycle? How is the program structured? What are the different components and how do they fit in the broader environment? Who does the program serve? What are their characteristics, beliefs, culture, needs, and desired outcomes?
Broader environment around the intervention	What are the different layers of the environment that affect and can be affected by the intervention? What aspects of these different climates are affecting the design and operation of the program? What are important historical, social, and cultural elements of the community in which the program is conducted? Are there political or social views that affect perspectives on the program, its clients, or decision makers?
Parameters of the evaluation	What are the primary and secondary evaluation questions and their implications for possible methodology and design choices? What resources are available to support the evaluation (e.g., budget, time frame, local evaluation capacity, evaluation ethos)?
Decision-making arena	Who are the main decision makers/users of the evaluation information? What are their views, values, and history about the program, and about evaluation? What is the larger political culture in which they work? What are the expectations of their organization? What are the expectations of citizens they serve regarding government programs, and about evaluation? What are the political expectations for evaluation?
