Recap of the Canadian Evaluation Society’s 2017 national conference

The Canadian Evaluation Society’s national conference was held right here in Vancouver last month! I was one of the program co-chairs for the conference, and I have to say that it was pretty awesome to see a year and a half’s worth of work by the organizing committee come to fruition! There were a lot of people involved in putting together the conference, and so many more parts to it than I had realized when I started working on it; it was incredible to see everything work so smoothly!

As I usually do at conferences, I took a tonne of notes, but for this blog posting I’m going to summarize some of my insights by topic (in alphabetical order) rather than by session [1], as I went to some different sessions that covered similar things. Where possible, I’ve included the names of the people who said the brilliant things that I took note of, because I think it is important to give credit where credit is due, but I apologize in advance if my paraphrasing is not as elegant as the way people actually said things.

Context

  • Damien Contandriopoulos noted that context is often defined by what it is not – it is not your intervention – i.e., it’s whatever is outside your intervention, but it’s not the entire universe outside of your intervention. Just what is close enough to be relevant/important to the analysis. He also noted that some disciplines don’t talk about context at all (e.g., they might talk about the culture in which an intervention occurs, but don’t talk about it as separate from the intervention the way we talk about context as being separate from the intervention).
  • Depending on your conceptualization of “context”, you may want to:
    • neutralize the context (e.g., those who think that context “gets in the way” and thus they try to measure it and neutralize it so it won’t “interfere” with your results). Contandriopoulos clearly didn’t favour this approach, but noted that it could work if your evaluand was very concrete/clear.
    • adapt to context
    • describe the context
  • In all of the above options, it’s about generalizability/external validity (e.g., if you are trying to neutralize the context, you want to know if the evaluand works and don’t want the context to interfere with your conclusion about whether the evaluand works; if you are adapting to the context, you want to figure out how the evaluand might work in a given context; if you are describing the context, you want to understand the context in order to interpret your evaluation findings)
  • From the audience, AEA president Kathryn Newcomer mentioned a paper by Nancy Cartwright about transferability of findings [2], specifically about how Cartwright talks about “support factors” rather than context. Further, she talked about how in the US there is lots of interest in “scaling up” interventions, but rarely do studies document the support factors that allow an intervention to work (e.g., you need to have a pool of highly qualified teachers in the area for program X to work). She suggested:
    • putting the support factors into the theory of change
    • considering: how do we know if the support factors are necessary or sufficient? What if you need a combination of factors that need to be present at the same time and in certain amounts for the program to work? etc. (see the toy sketch after this list)
  • Contandriopoulos mentioned that sometimes people just list “facilitators” and “barriers” as if that’s enough [but I liked Newcomer’s suggestion that “support factors” (or barriers, though she didn’t mention it) could be integrated into the theory of change]
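To make the “necessary vs. sufficient” question a bit more concrete, here is a minimal toy sketch (my own illustration, not anything presented at the conference). It treats each support factor as a condition that is either met or not and encodes one hypothetical rule for when the program can work; all of the factor names, thresholds, and the rule itself are made up.

```python
# Toy model: support factors that are individually necessary but only
# jointly sufficient. All names, thresholds, and the rule are hypothetical.

def program_can_work(qualified_teachers: int, stable_funding: bool,
                     community_buyin: float) -> bool:
    """One made-up rule: the program can only work if ALL factors are present
    in sufficient amounts at the same time."""
    return (
        qualified_teachers >= 5      # necessary: enough qualified teachers nearby
        and stable_funding           # necessary: multi-year funding in place
        and community_buyin >= 0.6   # necessary: at least 60% community support
    )

# No single factor is sufficient on its own:
print(program_can_work(qualified_teachers=10, stable_funding=False, community_buyin=0.9))  # False
# Only the combination is sufficient:
print(program_can_work(qualified_teachers=10, stable_funding=True, community_buyin=0.9))   # True
```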

Evaluation

  • Kas Aruskevich showed an image of a river in Alaska viewed from above and noted that if you were standing by the side of that river, you’d never know what the sources of that river are (as they are blocked by mountains). She likened evaluation to taking that perspective from a distance, where you look at the whole picture. I liked this analogy.
  • Kathy Robrigado talked about how the accountability function of evaluation is often seen as an antagonist to learning, but she sees it as a jumping off point for learning.
  • In summarizing the Leading Edge panel, E. Jane Davidson had a few things to say that were very insightful in relation to thinking I’ve been doing lately with my team about what evaluation is (and how it compares/relates to other disciplines that aim to assess program/projects/etc.). With respect to monitoring, she noted that people often expect key performance indicators (KPIs) to be an answer, but they aren’t. Often what’s the easiest to measure is not what’s most important. In evaluation, we need to think about what’s most important (not just what’s strong or weak, but what really matters).

Evaluation, History of the Field

  • Every time I go to an evaluation conference, someone gives a bit of a history of the field of evaluation from their perspective (perhaps one day I’ll compile them all into a timeline). This conference was no different, with closing keynote speaker Kylie Hutchinson talking about what she has seen as “innovations” in evaluation that had a lot of buzz around them and then eventually settled into an appropriate place [her description made me think of the “hype cycle“, which someone had coincidentally shown in one of the sessions that I was in]:
    • 1990s – logic models
    • 2000s – the big RCT debate (i.e., are RCTs really the “best” way to evaluate in all cases?)
    • social return on investment (SROI), Appreciative Inquiry
    • developmental evaluation, systems approaches
    • deliverology

Evaluators, Role of

  • Lyn Shulha noted that as an evaluator, you’ll never have the same context/working conditions from one evaluation to the next, and you’ll never have a “final” practice or theory – they will continue to change.
  • Kathy Robrigado talked about starting an evaluation as an “evaluator as critical friend” (e.g., asking provocative questions to understand the program/context, offering critiques of a person’s work, providing data to be examined through another lens). But after a while, they found this approach to be too resource intensive, as they had ~60 programs to deal with and data collection was cumbersome; they moved from critical friend to “strategic acquaintance” (or, as she put it, “we had to friendzone the programs”)
  • Michel Laurendeau stated that “evaluators are the experts in interpreting monitoring data” as what you see when you look at the data isn’t necessarily what is really going on [this reminded me of something that was discussed at last year’s CES conference: what the data says vs. what the data means]
  • Kylie Hutchinson talked about how many evaluators are talking about the evaluator as a social change agent. People gravitate to this profession because they want to be involved in social change – maybe they are a data geek, but they see how the data can lead to social change. She also talked about how many skills she has needed to build to support her evaluation practice: in grad school she focused on methods and statistics, but when she went on to become a consultant she didn’t find that she needed advanced statistics – she needed skills in facilitation, then data visualization, and now organizational development.

Knowledge Translation

  • Kim van der Woerd described getting knowledge into action as “the long journey from the head to the heart”. I really like this phrase, as just knowing something (with the head) doesn’t necessarily mean we take it to heart and put it into action. I wonder how thinking about how we can get things from the head to the heart could help us think about better ways to promote the translation of knowledge into action.

Learning

  • Lyn Shulha talked about learning spirals – as we travel from novice to expert, we can imagine ourselves descending, say, a spiral staircase. At a given point, we can be at the same place as earlier, but deeper (as well, we are changed from when we were last at this point). She noted that we “need to hold onto our experiences and our truths lightly”, lest we end up traveling linearly rather than in a spiral.

Logic Models

  • One of the sessions I was in generated an interesting discussion about different ways that people use logic models, such as:
    • having the lead agency of a program create a logic model of how they think the program works, having all the agencies operating the program create logic models of how they think the program works, and then comparing them – if they have different views of how the program works, this can generate important discussions
    • calling the first version of the logic model “strawman #1” to emphasize that the logic model is meant to be challenged and changed.

Reporting

  • Report structure recommended by Julian King in the Leading Edge panel on Rubrics:
    • answer the evaluation question
    • key evidence & reasoning behind how you came up with the answer
    • extra information
    • They summarized this as: spoiler, evidence, discussion, repeat
  • E. Jane Davidson noted that in social sciences, people are often taught how to break things down, but not how to pack it back together again to answer the big picture question. For example, you’ll often see people report the quantitative results, then the qualitative results, but with no actual mixing of the data (so it’s not really “mixed methods” – it’s more just “both methods”).
  • Also from E. Jane Davidson – the length of a section of a report is typically proportional to how long it took you to do the work (which is why literature reviews are so long), but that’s not what’s most useful to the reader. It’s like we feel we have to put the reader through the same pain we went through to do the work; we want them to know we did so much work! And then they get to the end and we say “the results may or may not be…. and more research is needed.” Not helpful! Spoilers really are key in evaluation reporting – write it like a headline. Pique their interest with the spoiler and then they will want to read the evidence (how did they decide that?).
    • 7 +/- 2 key evaluation questions (KEQ):
      • executive summary: KEQ 1, answer + brief evidence; KEQ 2, answer + brief evidence; KEQ 3, answer + brief evidence
      • and make sure your recommendations are actionable! (one way to lay out this structure is sketched after this list)
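As a small illustration of the “spoiler first” structure described above, here is a minimal sketch (my own, not from the panel) that turns a list of key evaluation questions and their answers into an executive-summary skeleton. The KEQs, answers, and evidence snippets are placeholders.

```python
# Generate an executive-summary skeleton: answer first, brief evidence second.
# All of the content below is placeholder text for illustration only.

keqs = [
    {
        "question": "KEQ 1: How well is the program being implemented?",
        "answer": "Implementation is good overall, with gaps at two sites.",
        "evidence": "Site visit observations; staff interviews; admin data on attendance.",
    },
    {
        "question": "KEQ 2: How well is the program achieving outcome #1?",
        "answer": "Early signs of progress, but it is too soon to judge.",
        "evidence": "Pre/post survey results; participant focus groups.",
    },
]

def executive_summary(keqs):
    lines = ["Executive Summary", ""]
    for item in keqs:
        lines.append(item["question"])
        lines.append(f"  Answer (the spoiler): {item['answer']}")
        lines.append(f"  Key evidence: {item['evidence']}")
        lines.append("")
    lines.append("Recommendations (actionable): ...")
    return "\n".join(lines)

print(executive_summary(keqs))
```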

Rubrics

  • The Leading Edge Panel on Rubrics was easily my favourite session of the conference. I’ve done a bit of reading about rubrics after going to a session on them at the Australasian Evaluation Society conference in Perth, but found that this panel really brought the ideas to life for me.
  • Kate McKegg mentioned that she asked a group of people in healthcare if they thought that their organization’s key performance indicators (KPIs) reflected the value of what their organization does, and not a single person raised their hand [This resonated with me, as my team and I have been doing a lot of work lately on differentiating, among other things, monitoring and evaluation.]
  • Rubrics:
    • can help clarify what matters and include those things in your evaluation
    • are made of:
      • evaluative criteria – to come up with these, can check out the literature, talk to experts, talk to stakeholders (e.g., people on the front lines); can also think about what would be appropriate for the cultural context (e.g., what would make a program excellent in light of the cultural context?)
      • levels of importance (of the criteria) – remember, things that are easy to measure are not necessarily what’s important
      • rating scale – how to determine the level of performance (e.g., excellent, very good, good, adequate, emerging, not yet emerging, poor); depending on your context, you may choose different words (e.g., may use “thriving” instead of “excellent”)
    • can be:
      • analytic – describe the various performance levels for each criterion
      • holistic – a broad level of description of performance at each level (e.g., describe “excellent” overall (encompassing all the criteria) rather than describing “excellent” for each criterion individually)
    • analytic can provide more clarity, but require more data
  • You should be able to see your theory of change in the rubric. Key evaluation questions (KEQs) often follow the theory of change (e.g., KEQs might be “how well are we implementing?” or “how well are we achieving outcome #1?”). Think about the causal links in the theory of change and their strength; if there is a deal breaker, it should show up in the theory of change.
  • You can embed cultural values into the process (e.g., for the Maori, the word “rubric” didn’t resonate, so Nan Wehipeihana used a cultural metaphor that did; rather than words like “poor” and “excellent”, can use words that fit better like a “seed with latent potential” and “blooming” and “coming to fruition”)
  • Values are the basis for criteria – they reflect what is valued (and whose values hold sway matters)
  • Once you have a rubric, you need to collect data to “grade” the program using the rubric; data may come from all sorts of places (e.g., previous research, administrative data, photos from the program, interviews/surveys/focus groups)
  • Can make a table of each criterion and data source and use that to optimize your data collection:
    • Criterion 1: interviews with participants, photos from the program
    • Criterion 2: interviews with staff, interviews with participants, photos from the program
    • Criterion 3: interviews with participants, photos from the program
    • Criterion 4: admin data, interviews with staff
    • Criterion 5: admin data, interviews with staff, interviews with participants
  • Then you can look at all the things you want to collect from each data source (e.g., you can ask about criteria 2, 4, and 5 in interviews with staff; look for criteria 1, 2, and 3 in the photos from the program) = integrated data collection (see the sketch after this list)
  • Make sure that the data collection is designed to answer the evaluation questions.
  • Look to see if you are getting consistent information (i.e., saturation) or if the data is patchy or inconsistent and you need to get more clarity.
  • Bring data to stakeholders as you go along (especially for long evaluations – they don’t want to wait until the end of 3 years to find out how things are going!)
  • 3 steps to making sense of data:
    • analysis – breaking something down into its component parts and examining each part separately (King et al, 2013)
    • synthesis – putting together “a complex whole made up of a number of parts or elements ” (OED online); assembling the different sources of data. Sometimes when you are working on data synthesis, you learn that what’s important isn’t what you initially thought was important (so you need to rejig your rubric). Also think about what the deal breakers are (e.g., if no one shows up to the program…)
    • sensemaking: helps to clarify things; one way to do this is to get all the stakeholders together, give them the synthesized data (a rough cut), and go through a process like this:
      • generalization: In general, I noticed…
      • exception: In general…, except….
      • contradiction: On one hand…, but on the other hand…
      • surprise: I was surprised by…
      • puzzle: I wonder…
    • When you think about the exceptions or contradictions – how big of a deal are they? Are they deal breakers?
    • As stakeholders do this, they start to understand the data and to own the evaluation. Often they make harder judgments than the evaluator might have.
    • Typically, they do the synthesis and bring that to the stakeholders to do sensemaking; but don’t spend a lot of time making the synthesized data look polished/finished – it should look rough, as it is meant to be worked with. Not everyone will spend time reading the data synthesis in advance, so give them time to do that at the start of the session.
    • Put up the rubric and have the stakeholders grade the program.
    • Often people try to do analysis, synthesis, and sensemaking all at the same time, but you should do them separately.
  • Rubrics “aren’t just a method – they change the whole fabric of your evaluation”. They can help you “mix” methods (rather than just doing “both” methods) – they can help you make sense of the “constellation of evidence”.
  • I asked how they deal with situations that are dynamic. Their answer was that rubrics can evolve, especially with an innovative program. You create it based on what you imagine the outcome will be, but other things can emerge from the program. You can start with a high-level rubric (you don’t want to get so detailed or overspecified that you paint yourself into a corner). You need it to be underspecified enough to be able to contextualize it to the setting. It’s like the concept of “implementation fidelity” – implementing something exactly as specified is not the best; you should be implementing enough of the intent in a way that will work in the setting.
  • Another audience member asked how you would determine if a rubric is valid/reliable. The speakers noted that often people ask “is it a valid tool?”, meaning “was it compared to a gold standard/previously validated tool?” But those other tools are often too narrow/miss the mark. The speakers suggested that “construct validity is the mother of all validities” – the most important question is “is it useful for the people for whom it was built?”
  • Another audience member asked about “scaling up” rubrics. The speakers noted examples where they had worked on projects to create rubrics to be used across a broader group than those who created it – e.g., created by the Ministry of Education to be used by many different schools with the help of a facilitator. For these, you need to have a lot more detail/instructions on how to use it (and a good facilitator) since users won’t have the shared understanding that comes from having created it. They have also done “skinny rubrics” to be used by lots of different types of schools (so had to be underspecified), but again, need to provide lots of support to users.
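To make the rubric and data-collection ideas above a bit more concrete, here is a minimal sketch (my own illustration, not a tool from the panel) of how an analytic rubric and the criterion-by-data-source mapping might be represented, and how that mapping can be “flipped” to plan integrated data collection. All of the criterion names, data sources, and descriptors are hypothetical.

```python
# A hypothetical analytic rubric and criterion-by-data-source mapping,
# plus a "flip" of the mapping into a per-source collection plan.

# Rating scale words (could be swapped for culturally appropriate terms, e.g., "thriving").
rating_scale = ["not yet emerging", "emerging", "adequate", "good", "very good", "excellent"]

# Analytic rubric: a performance descriptor for each criterion at each level
# (abbreviated here to two levels per criterion; all text is made up).
rubric = {
    "Criterion 1: reach of the program": {
        "excellent": "Nearly all of the intended population participates regularly",
        "adequate": "A substantial portion of the intended population participates",
    },
    "Criterion 2: quality of facilitation": {
        "excellent": "Sessions are consistently well prepared and culturally responsive",
        "adequate": "Sessions are usually delivered as planned",
    },
}

# Which data sources speak to which criterion (cf. the mapping above).
evidence_matrix = {
    "Criterion 1": ["interviews with participants", "photos from the program"],
    "Criterion 2": ["interviews with staff", "interviews with participants", "photos from the program"],
    "Criterion 3": ["interviews with participants", "photos from the program"],
    "Criterion 4": ["admin data", "interviews with staff"],
    "Criterion 5": ["admin data", "interviews with staff", "interviews with participants"],
}

# Flip the mapping: for each data source, list every criterion it should cover,
# so each interview guide or extraction form is designed once (integrated data collection).
collection_plan = {}
for criterion, sources in evidence_matrix.items():
    for source in sources:
        collection_plan.setdefault(source, []).append(criterion)

for source, criteria in collection_plan.items():
    print(f"{source}: ask about / look for {', '.join(criteria)}")
```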

Systems Thinking

  • Systems archetypes are common patterns that emerge in systems. This was a concept that was brought up by an audience member in my session on complexity, and is something I want to read more about!
  • Heather Codd talked about three key concepts in using systems thinking (using Donella Meadows’ definition of a system as something with parts, links between parts, and a boundary) in evaluation:
    • interrelationships – understanding the interrelationships and what drives them helps us to understand what’s going on with the program (and she suggested using rich pictures to help focus the evaluation and think about what the consequences of the program might be)
    • boundaries – we need to pick a boundary for the purpose of analysis, but note that it is sensitive because it defines what is in and out of the evaluation. She suggested using critical systems heuristics to help describe the program, scope the evaluation, and decide on an evaluation approach.
      [Image: critical systems heuristics slide]
    • multiple perspectives – what are the world views being applied and what are the implications of those world views? She suggested you can do a stakeholder analysis, but also a stake analysis; she also suggested “framing” by using an idea from Bob Williams, where you add the words “something to do with…” in front of ideas (e.g., “something to do with a culture of health”, “something to do with managing heart disease”); this tool can help give you a sense of the intervention’s purpose and the evaluation’s purpose.
    • Evaluators are an element in a system and we cannot separate out our effect on the systems [This made me think of “co-evolution” – the evaluation co-evolves along with the rest of the system]
    • There are echoes in a system of what has happened before [e.g., intergenerational trauma]

Truth & Reconciliation

  • Last year, the CES took a position on reconciliation in Canada. Several of the speakers at the conference talked about this topic. For example, Kim van der Woerd talked about a witness as being one who listens with their whole heart and validates a message by sharing it (and that they have a responsibility to share it). She also noted that the Truth and Reconciliation Commission (TRC) wasn’t Canada’s first attempt at trying to build a good relationship between Aboriginal and non-Aboriginal people – the Royal Commission on Aboriginal Peoples put out a report with recommendations in 1996. But when it was evaluated in 2006, Canada received a failing grade, with 76% of the 400+ recommendations not done and no significant progress. She noted that we shouldn’t wait 10 years before we evaluate how well Canada is doing on the TRC recommendations.
  • Paul Lacerte outlined a set of recommendations:
    • amplify the new narrative (where the old narrative was “the federal government takes care of the natives”)
    • conduct research & develop a reconciliation framework
    • set targets for recruiting and training indigenous evaluators
    • learn about and follow protocol (e.g., how to start a meeting, gift giving)
    • put up a sign in your workspace about the traditional territory on which you are working
    • volunteer for an indigenous non-profit
    • join the Moose Hide Campaign
  • At the start of her closing keynote, Kylie Hutchinson acknowledged that she was speaking on the unceded traditional territory of the Musqueam, Squamish, and Tsleil-Waututh First Nations. And then she said that she’d never done that before a talk before, but that she would be from now on. I thought that it was a really cool thing to witness someone learning something new and putting it into practice like that, especially something so meaningful.

Misc:

  • The best joke I heard in a presentation was when Kathy Robrigado, after a few acronym-filled sentences in her presentation, said, “As you know, government employees are paid by the number of acronyms they use”

To Dos:

Sessions I Attended:

  • Opening Keynote by Kim van der Woerd and Paul Lacerte
  • Short presentation: Causing Chaos: Complexity, theory of change, and developmental evaluation in an innovation institute by Darly Dash, Hilary Dunn, Susan Brown, Tanya Darisi, Celia Laur
  • Short presentation: Implications of complexity thinking on planning an evaluation of a system transformation by M. Elizabeth Snow, Joyce Cheng [This was one of my own presentations!]
  • Short presentation: Cycles of Learning: Considering the Process and Product of the Canadian Journal of Program Evaluation Special Issue by Michelle Searle, Cheryl Poth, Jennifer Greene, Lyn Shulha
  • Short presentation: Using System Mapping as an Evaluation Tool for Sustainability by Kas Aruskevich
  • Incorporating influence beyond academia data into performance measurement and evaluation projects by Christopher Manuel
  • Exploring Innovative Methods for Monitoring Access to Justice Indicators by Yvon Dandurand, Jessica Jahn
  • A Quasi-Experimental, Longitudinal Study of the Effects of Primary School Readiness Interventions by Andres Gouldsborough
  • What Would Happen If…? A Reflection on Methodological Choices for a Gendered Program by Jane Whynot, Amanda McIntyre, Janice Remai
  • Towards Strategic Accountability: From Programs to Systems by Kathy Robrigado
  • Getting comfortable with complexity: a network analysis approach to program logic and evaluation design by John Burrett
  • Communication in System Level Initiatives: A grounded theory study by Dorothy Pinto
  • Seeing the Bigger Picture: How to Integrate Systems Thinking Approaches into Evaluation Practice by Heather Codd
  • Understanding and Measuring Context: What? Why? and How? by Damien Contandriopoulos
  • A Graphic Designer, an Evaluator, and a Computer Scientist Walk into a Bar: Interdisciplinary for Innovation by M. Elizabeth Snow, Nancy Snow, Daniel J. Gillis [This was another one of my presentations and hands down the best presentation title I’ve ever had]
  • Big Bang, or Big Bust? The Role of Theory and Causation in the Big Data Revolution by Sebastian Lemire, Steffen Bohni Nielsen
  • Using Web Analytics for Program Evaluation – New Tools for Evaluating Government Services in the Digital Age at Economic and Social Development Canada by Lisa Comeau, Alejandro Pachon
  • The Future of Evaluation: Micro-Databases by Michel Laurendeau
  • Dylomo: Case studies from an online tool for developing interactive logic models by M. Elizabeth Snow, Nancy Snow [This was the last of my presentations]
  • Development and use of an App for Collecting Data: The Facility Engagement Initiative by Neale Smith, Graham Shaw, Chris Lovato, Craig Mitton, Jean-Louis Denis
  • Leading Edge Panel: Evaluative Rubrics – Delivering well-reasoned answers to real evaluative questions by Kate McKegg, Nan Wehipeihana, Judy Oakden, Julian King, E. Jane Davidson
  • Closing Keynote by Kylie Hutchinson

Next CES Conference:

  • Host: Alberta & Northwest Territories chapter
  • May 26-29 – Calgary
  • May 31-June 1 – Yellowknife
  • Theme: Co-creation

Footnotes:

1. Though I’ve listed all the sessions I attended at the bottom of this posting.
2. She didn’t say the name of the paper or the journal, but based on her comments about the paper, I believe it is likely this paper. Unfortunately, it’s behind a paywall, so I can’t read more than the abstract.

On Flexibility in Evaluation Design

Been doing some reading as I work on developing an evaluation plan for a complex program that will be implemented at many sites. Here are some notes from a few papers that I’ve read – I think if anything links these three together, it is the notion of the need to be flexible when designing an evaluation – but you also need to think about how you’ll maintain the rigour of your work.

Wandersman et al (2016)’s paper on using an evaluation approach called “Getting to Outcomes (GTO)” discussed the notion that just because an intervention has been shown to be effective in one setting does not necessarily mean it will work in other settings. While I wasn’t interested in the GTO approach per se, I found their introduction insightful.

Some notes I took from the paper:

  • the rationale for using evidence-based interventions is that since research studies show that a given intervention leads to positive outcomes, then if we take that intervention and implement it in the same way it was implemented in the research studies (i.e., fidelity to the intervention) on a broad scale (i.e., at many sites), then we should see those same positive outcomes on a broad scale
  • however, when this is actually done, evaluations often show that the positive outcomes (compared to control sites) don’t happen, or that positive outcomes happen on average but there is much variability among the sites, such that some sites get the positive outcomes and others don’t (or some sites even get negative outcomes) (see the toy simulation after this list)
  • from the perspective of each individual site, having positive outcomes on average (but not at their own particular site) is not good enough to say that this intervention “works”
  • when you implement complex programs at multiple sites/levels, you “need to accommodate for the contexts of the sites, organizations, or individuals and the complete hierarchies that exist among these entities […] the complexity […] includes multiple targets of change and settings” (pp. 549-50)
  • recommendations:
    • evaluate interventions at each site in which it is implemented
    • examine the quality of the implementation
    • consider the fit of the intervention to the local context
      • “the important question is whether they are doing what they need to do in their own setting in order to be successful” (p. 547)
      • “the relevant evaluation question to be answered at scale is not “does the [evidence-based intervention] result in outcomes?” but rather “how do we achieve outcomes in each setting?” (p. 547)
    • evaluators should “assist program implementers to adapt and tailor programs to meet local needs and provide ongoing feedback to support program implementation” (p. 548)
  • empowerment evaluation: premise is: “if key stakeholders (including program staff and consumers) have the capacity to use the logic and tools of evaluation for planning more systematically, implementing with quality, self-evaluating, and using the information for continuous quality improvement, then they will be more likely to achieve their desired outcomes”
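As a toy illustration of the “positive on average, but not at every site” point above (my own sketch, not from the paper), the following simulates a per-site effect whose average is positive while several individual sites see little or no benefit. All of the parameter values are arbitrary.

```python
# Simulate site-level effects: positive on average, negative at some sites.
# All parameter values are arbitrary, chosen only for illustration.
import numpy as np

rng = np.random.default_rng(42)
n_sites = 20
mean_effect = 2.0        # average improvement across sites (arbitrary units)
between_site_sd = 3.0    # how much the effect varies from site to site

site_effects = rng.normal(mean_effect, between_site_sd, n_sites)

print(f"Average effect across sites: {site_effects.mean():.2f}")
print(f"Sites with no benefit (effect <= 0): {(site_effects <= 0).sum()} of {n_sites}")
```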

Balasubramanian et al (2015) discussed what they call “Learning Evaluation”, which they see as a blend of quality improvement and implementation research. To me it sounded similar to Developmental Evaluation (DE). For example, they state that:

  • “Two key aspects of this approach set it apart from other evaluation approaches; its emphasis on facilitating learning from small, rapid cycles of change within organizations and on capturing contextual and explanatory factors related to implementation and their effect on outcomes across organizations”  (p. 2 of 11)
  • “assessment needs to be flexible, grounded, iterative, contextualized, and participatory in order to foster rapid and transportable knowledge. This approach integrates the implementation and evaluation of interventions by establishing feedback loops that allow the intervention to adapt to ongoing contextual changes.” (p. 2 of 11)

That sound a lot like DE to me. And it sounds a lot like how I’m looking to approach the evaluation I’m currently planning.

Principles underlying the “Learning Evaluation” approach (from page 3 of 11):

  • Principle 1: Gather data to describe the types of changes made by healthcare organizations, how changes are implemented, and the evolution of the change process. Why: To establish initial conditions for implementing innovations at each site and to describe implementation changes over time.
  • Principle 2: Collect process and outcome data that are relevant to healthcare organizations and to the research team. Why: To engage healthcare organizations in research and in continuous learning and quality improvement.
  • Principle 3: Assess multi-level contextual factors that affect implementation, process, outcome, and transportability. Why: Contextual factors influence quality improvement; need to evaluate conditions under which innovations may or may not result in anticipated outcomes.
  • Principle 4: Assist healthcare organizations in applying data to monitor the change process and make further improvements. Why: To facilitate continuous quality improvement and to stimulate learning within and across organizations.
  • Principle 5: Operationalize common measurement and assessment strategies with the aim of generating transportable results. Why: To conduct internally valid cross-organization mixed methods analysis.

A point that was made in this paper that resonated with me was that: “Within the context of a multi-site demonstration project conducted in real-world settings, it was not feasible to randomize sites or to specify target patient samples or measures a priori.” (p. 7 of 11) Instead, they incorporated elements to enhance the study’s rigour:

  • rigour in study design
    • considered each site as a “single group pre-post quasi-experimental study”, which is subject to history [1] and maturation [2] threats to internal validity
    • to counteract these threats, they collected qualitative data on implementation events (to allow them to examine if results are related to implementation of the intervention)
    • they also used member checking to validate their findings
  • rigour in analysis
    • rather than analyzing each source of data independently, they integrated findings
    • “triangulating data sources is critical to rigor in mixed methods analysis”
    • qualitative data analysis was conducted first within a given site (e.g., “to identify factors that hindered or facilitated implementation while also paying attention to the role contextual influences played” (p. 7 of 11), then across sites.

A few other points they make:

  • “ongoing learning and adaptation of measurement allows both rigor and relevance” (p. 8 of 11)
  • by “working collaboratively with innovators to develop data collection strategies and routine processes for jointly sharing and reflecting on data to foster continuous learning, improvement, and advocacy for policy changes” the organization can “develop capacity for data collection and monitoring for future efforts” (p. 8 of 11)
  • this approach “may feel to some to be at odds with current standards of rigor, which value fidelity to a priori hypotheses and methods”, but it is “not a ‘canned’ approach to evaluating healthcare innovations, but it involves the flexible application of five general principles” (p. 9 of 11). “This requires [evaluators] to be flexible and nimble in adapting their approach when proposed innovations are modified to fit the local context.” (p. 9 of 11)

Brainard & Hunter conducted a scoping review with the question “Do complexity-informed health interventions work?” What they found was that although “the lens of complexity theory is widely advocated to improve health care delivery,” there’s not much in the literature to support the idea that using a complexity lens to design an intervention makes the intervention more effective.

They used the term “‘complexity science’ as an umbrella term for a number of closely related concepts: complex systems, complexity theory, complex adaptive systems, systemic thinking, systems approach and closely related phrases” (p. 2 of 11). They noted the following characteristics of systems:

  • “Large number of elements, known and unknown.
  • Rich, possibly nested or looping, and certainly overlapping networks, often with poorly understood relationships between elements or networks.
  • Non-linearity, cause and effect are hard to follow; unintended consequences are normal.
  • Emergence and/or self-organization: unplanned patterns or structures that arise from processes within or between elements. Not deliberate, yet tend to be self-perpetuating.
  • A tendency to easily tip towards chaos and cascading sequences of events.
  • Leverage points, where system outcomes can be most influenced, but never controlled.” (p. 2 of 11)

They also had some recommendations for reporting on/evaluating complexity-informed interventions:

  • results should be monitored over the long term (e.g., more than 12 months) as results can take a long time to occur
  • barriers to implementation should be explored/discussed
  • unintended/unanticipated (including negative) changes should be actively looked for
  • support from the institution/senior staff combined with widespread collaborative effort is needed to successfully implement
  • complexity science or related phrases should be in the title of the article

References:

Balasubramanian, B., Cohen, D.J., Davis, M.M., Gunn, R., Dickinson, L.M., Miller, W.L., Crabtree, B.F., & Stange, K.C. (2015). Learning Evaluation: blending quality improvement and implementation research methods to study healthcare innovations. Implementation Science. 10: 31. (full text)

Brainard, J., & Hunter, P.R. (2016). Do complexity-informed health interventions work? A scoping review. Implementation Science. 11: 127. (full text)

Wandersman, A., Alia, K., Cook, B.S., Hsu, L.L., & Ramaswamy, R. (2016). Evidence-Based Interventions Are Necessary but Not Sufficient for Achieving Outcomes in Each Setting in a Complex World: Empowerment Evaluation, Getting To Outcomes, and Demonstrating Accountability.  American Journal of Evaluation. 37(4): 544-561. [abstract]

Footnotes:

1. i.e., how do you know results aren’t due to other events that are occurring concurrently with the intervention?
2. i.e., how do you know the results aren’t just due to naturally occurring changes over time rather than being due to the intervention?

Pragmatic Science

Another posting that was languishing in my drafts folder. Not sure why I didn’t publish it when I wrote it, but here it is now!

  • Berwick (2005) wrote an interesting commentary called “Broadening the view of evidence-based medicine” in which he describes how “scholars in the last half of the 20th century forged our modern commitment to evidence in evaluating clinical practices” (p. 315) and, though it was seen as unwelcome at the time, they brought the scientific method to bear on the clinical world, and over time, the randomized controlled trial (RCT) became the “Crown Prince of methods […] which stood second to no other method” (p. 315). And while there has been a huge amount of benefit from this, he says “we have overshot the mark. We have transformed the commitment to “evidence-based medicine” of a particular sort into an intellectual hegemony that can cost us dearly if we do not take stock and modify it” (p. 315). He points out that there are many ways of learning things:
  • “Did you learn Spanish by conducting experiments? Did you master your bicycle or your skis using randomized trials? Are you a better parent because you did a laboratory study of parenting? Of course not. And yet, do you doubt what you have learned?” (p. 315)
  • “Much of human learning relies wisely on effective approaches to problem solving, learning, growth, and development that are different from the types of formal science […and …] some of those approaches offer good defences against misinterpretation, bias, and confounding.” (p. 315).

  • He warns that limiting ourselves to only RCTs “excludes too much of the knowledge and practice that can be harvested from experience, itself, reflected upon” (p. 316)
  • “Pragmatic science” involves:
    • “tracking effects over time (rather than summarizing with stats)
    • using local knowledge in measurement
    • integrating detailed process knowledge into the work of interpretation
    • using small sample sizes and short experimental cycles to learn quickly
    • employing powerful multifactorial designs (rather than univariate ones focused on “summative” questions) ” (p. 316)
  • Explanatory trials vs. pragmatic trials:
    • Definition:
      • explanatory: evaluating efficacy (how well does it work in a tightly controlled setting); clinical trials that test a causal research hypothesis in an ideal setting
      • pragmatic: evaluating effectiveness (how well does it work in “real life”); trials that help users decide between options
    • Validity:
      • explanatory: high internal validity
      • pragmatic: high external validity
    • Test sample & setting:
      • explanatory: focus on homogeneity
      • pragmatic: focus on heterogeneity
  • explanatory and pragmatic are not a dichotomy, as most trials are not purely one or the other – there is a spectrum between them
  • Thorpe et al (2009) created a tool (called PRECIS) to help people designing clinical trials to distinguish where on that pragmatic-explanatory continuum their trial falls; it involves looking at 10 domains (see the list below), with scores on these criteria placed on an 11-spoke wheel (to give you a spider diagram type of picture; see the sketch after this list)
  • PRECIS criteria:
    • participant eligibility:
      • explanatory: strict
      • pragmatic: everyone with the condition of interest can be enrolled
    • experimental intervention – flexibility:
      • explanatory: strict adherence to protocol
      • pragmatic: highly flexible; practitioners have leeway on how to apply the intervention
    • experimental intervention – practitioner expertise:
      • explanatory: narrow group, highly skilled
      • pragmatic: broad group of practitioners in a broad range of settings
    • comparison group – flexibility:
      • explanatory: strict; may use placebo instead of “usual practice”/“best alternative”
      • pragmatic: “usual practice”/“best alternative”; practitioner has leeway on how to apply it
    • comparison group – practitioner expertise:
      • explanatory: standardized
      • pragmatic: broad group of practitioners in a broad range of settings
    • follow-up intensity:
      • explanatory: extensive follow-up & data collection; more than would routinely occur
      • pragmatic: no formal follow-up; use administrative databases to collect outcome data
    • primary trial outcome:
      • explanatory: outcome known to be a direct & immediate result of the intervention; may require specialized training
      • pragmatic: clinically meaningful to participants; special tests/training not required
    • participant compliance with intervention:
      • explanatory: closely monitored
    • practitioner compliance with study protocol:
      • explanatory: closely monitored
    • analysis of primary outcome:
      • explanatory: intention-to-treat analysis usually used, but usually supplemented with a “compliant participants” analysis to answer the question “does this intervention work in the ideal situation?”; analysis focused on narrow mechanistic questions
      • pragmatic: intention-to-treat analysis (includes all patients regardless of compliance); meant to answer the question “does the intervention work in ‘real world’ conditions, ‘with all the noise inherent therein’” (Thorpe et al, 2009)
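Here is a minimal sketch (my own, not the official PRECIS tool or its scoring rules) of what the “spider diagram” might look like in code: one spoke per domain, with a made-up score for where a hypothetical trial sits on the explanatory-pragmatic continuum.

```python
# Plot made-up trial-design scores on a spider/radar diagram, one spoke per
# PRECIS-style domain. Domain names follow the notes above; scores are invented.
import numpy as np
import matplotlib.pyplot as plt

domains = [
    "Participant eligibility",
    "Intervention flexibility",
    "Intervention practitioner expertise",
    "Comparison group flexibility",
    "Comparison group practitioner expertise",
    "Follow-up intensity",
    "Primary trial outcome",
    "Participant compliance",
    "Practitioner compliance",
    "Analysis of primary outcome",
]
# Hypothetical scores: 0 = fully explanatory, 5 = fully pragmatic.
scores = [2, 4, 3, 4, 3, 1, 4, 2, 2, 5]

# Close the polygon by repeating the first point.
angles = np.linspace(0, 2 * np.pi, len(domains), endpoint=False).tolist()
angles += angles[:1]
values = scores + scores[:1]

fig, ax = plt.subplots(subplot_kw={"polar": True})
ax.plot(angles, values, marker="o")
ax.fill(angles, values, alpha=0.25)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(domains, fontsize=7)
ax.set_ylim(0, 5)
ax.set_title("Where does this trial sit on the explanatory-pragmatic continuum?")
plt.tight_layout()
plt.show()
```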

I also came across this article in Forbes magazine: Why We Need Pragmatic Science, and Why the Alternatives are Dead-Ends. It’s a short read, but it succinctly summarizes an argument I find myself often making: science is a powerful tool for understanding and explaining the world. It’s not the only tool (philosophy and the other humanities, for example, are great tools for different purposes), but it’s certainly the best one for certain purposes and it’s a fantastic one to have in our toolbox!

References:

Berwick, D.M. (2005). Broadening the view of evidence-based medicine. Quality & Safety in Health Care. 14:315-316. (full-text)

Thorpe, K.E., Zwarenstein, M., Oxman, A.D., Treweek, S., Furberg, C.D., Altman, D.G., Tunis, S., Bergel, E., Harvey, I., Magid, D.J., & Chalkidou, K. (2009). A pragmatic-explanatory continuum indicator summary (PRECIS): a tool to help trial designers. Canadian Medical Association Journal. 180(10): E47-E57.


Process Use of Evaluation

Just noticed this in my drafts folder – some notes on process use evaluation from some of the papers I’d been reading on the topic. Figured I should actually publish it.

Definition of process use:

  • “the utility to stakeholders of being involved in the planning and implementation of an evaluation” (Forss et al, 2002, p. 30)
  • Patton describes “process use” as “changes resulting from engagement in the evaluation process and learning to think evaluatively. Process use occurs when those involved in the evaluation learn from the evaluation process itself or make program changes based on the evaluation process rather than findings. Process use also includes the effects of evaluation procedures and operation, for example, the premise that “what gets measured gets done”, so establishing measurements and setting targets affects program operations and management focus.” (Patton, 2008, p. 122) or “individual changes in thinking, attitudes, and behavior, and program or organizational changes in procedures and culture that occur among those involved in evaluation as a result of learning that occurs during the evaluation process.” (Patton, 2008, p. 155)
  • 6 types of process use (pp. 158-9):
    • infusing evaluative thinking into organizational culture
    • enhancing shared understanding
    • supporting and reinforcing program intervention – “the primary principle of intervention-oriented evaluation is to build a program delivery model that logically and meaningfully interjects data collection in ways that enhance achievement of program outcomes, while also meeting evaluation information needs” – while traditional research would view measurement that affects the outcome as contamination, if evaluation is part of the intervention, for the purposes of the evaluation of the program “it does not matter […] how much of the measured change is due to [the data collection] vs actual [program] activities, or both, as long as the instrument items are valid indicators of desired outcomes” (Patton, 2008, p. 166). “A program is an intervention in the sense that it is aimed at changing something. The evaluation becomes part of the programmatic intervention to the extent that the way it is conducted supports and reinforces accomplishing desired program goals” (Patton, 2008, p. 166)
    • instrumentation effects and reactivity
    • increasing engagement, self-determination, and ownership
    • program and organizational development
  • In the very interesting article “Process Use as a Usefulism”, Patton (2007) describes how he thinks of process use as a “sensitizing concept”
  • sensitizing concept (Patton, 2007, p. 102-103):
    • “can provide some initial direction to a study as one inquires into how the concept is given meaning in a particular place or set of circumstances”
    • “Such an approach recognizes that although the specific manifestations of social phenomena vary by time, space, and circumstance, the sensitizing concept is a container for capturing, holding, and examining these manifestations to better understand patterns and implications”
    • “raises consciousness about something and alerts us to watch out for it within a specific context. This is what the concept of process use does. It says things are happening to people and changes are taking place in programs and organizations as evaluation takes place, especially when stakeholders are involved in the process. Watch out for those things. Pay attention. Something important may be happening.”

Types of Use of Evaluation

  • symbolic use (a.k.a., strategic use or persuasive use):
    • “evaluation use to convince others of a political position” (Peck & Gorzalski, 2009, p. 141 )
    • “use of knowledge as ammunition in the attainment of power or profit”(Straus et al, 2010)
  • conceptual use:
    • “to change levels of knowledge, understanding, and attitude” (Peck & Gorzalski, 2009, p. 141)
    • process use: “knowledge gained through the course of conducting  program evaluation” (Peck & Gorzalski, 2009, p. 141)
  • instrumental use:
    • “direct use of evaluation’s findings in decision making or problem solving” (Peck & Gorzalski, 2009, p. 141)
    • “to change behaviour or practice” (Straus et al, 2010)
  • Forss et al (2002) cite Vedung (1997) as identifying 7 ways that evaluations can be used: “instrumentally, conceptually, legitimizing, interactively, tactically, ritually, and as a process” (p. 31)
  • Forss et al identify 5 different types of process use:
    • learning to learn
      • “Patton (1998) wrote that the evaluation field has its own particular culture, building on norms and values that evaluators take for granted, but which may be quite alien to people embedded in the culture of another profession. Patton (1998: 226) suggests that these values include ‘clarity, specificity and focusing, being systematic and making assumptions explicit, operationalising programme concepts, ideas and goals, separating statement of fact from interpretations and judgments’.” (Forss et al, 2002, p. 33, emphasis mine)
        • I checked out the original source on this – the direct quotation is: “that evaluation constitutes a culture, of sorts. We, as evaluators, have our own values, our own ways of thinking, our own language, our own hierarchy, and our own reward system. When we engage other people in the evaluation process, we are providing them with a cross-cultural experience. They often experience evaluators as imperialistic, that is, as imposing the evaluation culture on top of their own values and culture—or they may find the cross cultural experience stimulating and friendly. In either case, and all the spaces in between, it is a cross-cultural interaction […] This culture of evaluation, which we as evaluators take for granted in our own way of thinking, is quite alien to many of the people with whom we work at program levels. Examples of the values of evaluation include: clarity, specificity and focusing; being systematic and making assumptions explicit; operationalizing program concepts, ideas and goals; distinguishing inputs and processes from outcomes; valuing empirical evidence; and separating statements of fact from interpretations and judgements. These values constitute ways of thinking that are not natural to people and that are quite alien to many” (Patton, 1998, pp. 225-6, emphasis mine)
      • values of evaluation include “enquiry”, “a structured way of thinking about reality and generating knowledge” (Forss et al, 2002, p. 33)
      • “to engage in evaluation is thus also a way of learning how to learn” (Forss et al, 2002, p. 33)
    • developing networks – evaluation activities can bring together people who don’t usually work together
    • creating shared understanding
      • working together “help[s] people understand each other’s motives, and to some extent also to respect the differences” (Forss et al, 2002, p. 35)
      • note that “the usefulness of evaluation hinges directly upon the quality of the communication in evaluation exercises”  (Forss et al, 2002, p. 35)
    • strengthening the project
      • when the evaluator works to understand the program, it helps stakeholders themselves to get a “clearer understanding of the project and possibly with a new resolve to achieve the project’s aims” (Forss et al, 2002, p. 36)
      • “Patton (1998) calls this ‘evaluation as an intervention’; the evaluation becomes an intentional intervention supporting programme outcomes.” (Forss et al, 2002, p. 36)
      • “The way the team formulates questions, discusses activities and listens to experiences, may influence activities at the project level.” (Forss et al, 2002, p. 36)
    • boosting morale
      • “reminds them of the purposes they work for, and allows them to explore the relationship between their own organization and the […] impact that is expected” (Forss et al, 2002, p. 37)
      • “the fact that attention is shown, the project is investigated, viewpoints are listened to and data are collected could presumably give rise to similar positive effects as […] Hawthorne” (Forss et al, 2002, p. 38) [though I would note that in some organizations, evaluations are only conducted when a program is seen to be failing/in trouble and the evaluator is sent in to figure out why or to decide if the program should be closed – this could de-motivate people. Also, my experience has been that if the people from whom data is collected don’t see what is done with it, they don’t feel listened to and feel like they’ve been asked to do the work of data collection for no reason – and that’s demotivating. So it’s really about the organization’s approach to evaluation and how they communicate]

 

  • because process use means that the evaluation is having an effect on the stakeholders, “an evaluation may become part of the treatment, rather than just being an independent assessment of effects” (Forss et al, 2002, p. 30)
  • “an evaluation is not neutral, it will reinforce and strengthen some aspects of the organization, presumably at an opportunity cost of time and money” (Forss et al, 2002, pp. 38-9)
  • “the report itself will normally provide little new insight. Most discoveries and new knowledge have been consumed and used during the evaluation process. The report merely marks the end of the evaluation process.”(Forss et al, 2002, p. 40)
  • The “merit” of evaluation “lies […] in discovering unknown meanings, which help stakeholders to develop a new self-awareness, and in implementing new connections between people, actions, and thoughts” (Bezzi, 2006, cited in Fletcher & Dyson, 2013)
  • Fletcher & Dyson (2013) describing an evaluation that one of them had done: “The first evaluation challenge facing the first author was in helping the project’s diverse range of partners to develop a shared understanding of what the project would be. As is so often the case in project development, there had been a primary focus on securing funding and not on the real-life details of the project itself. The project logic, its conceptualization of culture change processes and, most importantly, the why and how of this logic and concept, had not been articulated – despite the fact that articulation of such project logic and culture change conceptual framework would, in turn, affect the overall defined aim and anticipated outcomes. As argued by Weiss (1995), when interventions do not make such things clear (either to themselves, or to others), the evaluation task becomes considerably more challenging. Given the already discussed nature of the collaborative research approach, it was fitting for the evaluator to assist in such articulation in order to ensure that the evaluation plan was both coherent with and relevant to such logic and conceptualization.” (p. 425)

References

  • Fletcher, G., Dyson S. (2013). Evaluation as a work in progress: stories of shared learning and development. Evaluation. 19(4): 419-30.
  • Forss, K, Rebien, C. C., Carlsson, J. (2002). Process use of evaluations: Types of use that precede lessons learned and feedback. Evaluation. 8(1):29-45.
  • Patton, M.Q. (1998). Discovering process use. Evaluation. 4(2):225-233.
  • Patton, M.Q. (2008). Utilization-focused evaluation, 4th edition. Thousand Oaks, CA: Sage.
  • Peck, L. R., Gorzalski, L. M. (2009). An evaluation use framework and empirical assessment. Journal of Multidisciplinary Evaluation. 6(12): 139-156.
  •  Straus, S. E., Tetroe, J., Graham, I. D., Zwarenstein, M., Bhattacharyya, O., Leung, E. (2010). Section 3.6.1: Monitoring Knowledge Use and Evaluating Outcomes of Knowledge Use in Knowledge translation and commercialization. Retrieved from http://www.cihr-irsc.gc.ca/e/41945.html

Australasian Evaluation Society (AES) Conference Recap

In September, I had the fantastic opportunity to attend the Australasian Evaluation Society conference in Perth, Western Australia. As I did with the Canadian Evaluation Society conference, I’m going to summarize some of my insights, in addition to cataloguing all the sessions that I went to. So rather than present my notes by session, I’m going to present them by topic area, and then present the new tools I learned about. Where possible, I’ve included the names of the people who said the brilliant things that I took note of, because I think it is important to give credit where credit is due, but I apologize in advance if my paraphrasing is not as elegant as the way people actually said things. I’ve also made notes of my own thoughts as I was going through my notes to make this summary, which I’ve included in [square brackets].

Evaluation

  • Traditionally, evaluation has been defined as being about judging merit or worth; a more contemporary view of evaluation includes it being about the production of knowledge, based on systematic enquiry, to assist decision making. (Owen) [This was interesting to me, as we have been working on elucidating the differences/overlaps among evaluation, research, monitoring, quality improvement, etc. Owen’s take on evaluation further blurs the line between evaluation and research, as research is often defined as producing new knowledge for knowledge’s sake.]
  • Evaluation is “the handmaiden of programs” (Owen) – what really matters is the effective delivery of programs/policies/strategies. Evaluation being involved on the front-end has the most potential to help that happen.
  • I really like this slide from John Gargani, the American Evaluation Association president:

[Image: John Gargani’s slide on evaluation]

Theory vs. Practice

  • Practice is about actually doing something vs. theory, which is about having “coherent general propositions used as principles of explanation for a class of phenomena or a particular concept of something to be done or of the method of doing it; a system of rules or principles” (Owen).
  • Praxis: “the act of engaging, applying, and reflecting upon ideas, between the theoretical and the practical; the synthesis of theory and practice without presuming the primacy of either” (Owen).

Evaluative Thinking (ET)

  • ET is a form of higher order thinking: making judgments based on evidence, asking good questions, suspending judgment in the absence of sufficient evidence, etc.
  • “If I know why I believe X, I’m relatively free to change my belief, but if all I know is “X is true”, then I can’t easily change my mind even in the face of disconfirming evidence” (Duncan Rintoul).

Evaluation-Receptive Culture

  • Newcomer, citing Mayne (2010), talked about the features of an “evaluation-receptive culture”:
    • fight the compliance mentality [looking only to see if people are complying with a state program/procedure pre-supposes that it is the “right” program/procedure – evaluation does not make such presuppositions]
    • reward learning from monitoring and evaluation
    • cultivate the capacity to support both the demand for, and supply of, information
    • match evaluation approaches/questions with methods

Evaluation and Program Planning

  • evaluative thinking and evaluation findings can be used to inform program planning (note that this isn’t always what happens. Often program planning is not as rational of a process as we’d hope!)
  • “proactive evaluation” (according to Owen et al.) = we need to know:
    • what works: what interventions –> desired outcomes
    • practice: how to implement a program
    • who to involve
    • about the setting: how contextual factors affect implementation
  • innovation factors affecting program design:
    • implementation is the key process variable, not adoption [where they used “adoption” to mean “choosing the program”. My experience is that this is not how the word “adoption” is always used 1E.g., while Owen used “adoption” to refer to the “adoption” (or choosing) of a program to implement, I’ve seen others use “adoption” to refer to individuals (e.g., to what extent individuals “adopt” (or enact) the part of their program they are intended to enact).]
    • the more complex the intervention, the more attention needs to be given to implementation
    • we need to analyze key innovation elements, with some elements needing more attention than others
    • the most difficult element to implement is changed user roles/role relationships
  • change is a process, not a single event
  • when implementing a program at multiple sites, there will be variation in how it is implemented
  • there must be effective support mechanisms and leadership buy-in is essential
  • evaluation tends to be more context sensitive than research [I’d qualify this with “depending on the type of research”]
  • why do people not include context sensitivity in complex intervention design?
    • command and control culture (with a lack of trust in the front lines)
    • structural limitations of processing and responding to large amounts of data with nuanced implications
    • epistemologies, especially in the health sector (where people tend to think that you can find an intervention (e.g., drug, surgery) that works and then push out that intervention, despite the evidence that you can’t just push out an intervention and expect it will be used)
  • profound differences between designers and intended users – evaluators can “translate” users’ voices to designers

Evidence-Based Policy

  • the theory of change of evidence-based policy:

Theory_of_Change_of_evidence-based_policy

  • “evidence-based” policy can refer to any of these levels:

Evidence-based_policy_-_levels

  • some challenges for evidence-based policy:
    • what constitutes “evidence”?
    • is evidence transferable? (e.g., if it “works” in a given place and time, does that necessarily mean it will work in another place or at another time?)
  • people often overstate the certainty of the evidence they collect – e.g., even if a study concludes that a program played a causal role in the place/time where it was conducted, will it play a wide enough causal role that we can predict it will play a causal role in another time/place (which is what “evidence-based” policy is doing when it takes conclusions from a study/studies as evidence that the program should be applied elsewhere)?

Rubrics

  • problem: to make evaluative conclusions, you need standards to make those conclusions
  • most evaluation reports do not provide specifics about how the evaluation findings are synthesized or the standards by which the conclusions are drawn (often they do this implicitly, but it’s not made explicit)
  • this lack of transparency about how evaluation conclusions are drawn makes people think that evaluation is merely subjective
  • rubric comes from “red earth” (used to mark sheep to track ownership and breeding)
  • the nature of evaluation (Scriven):

nature_of_evaluation_-_Scriven

  • the logic of evaluation, summarized in 4 steps (a small illustrative sketch in code follows the list):
    1. establish criteria
    2. construct standards
    3. measure performance
    4. compare performance to standards and draw conclusions
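To make the four-step logic concrete, here is a minimal sketch in Python – my own illustration, not something presented at the conference; the criteria, cut-offs, and performance values are all made up, and a real rubric’s standards would usually be qualitative descriptors developed with stakeholders rather than simple numeric thresholds:

```python
# Hypothetical rubric: criteria with standards (ratings attached to thresholds).
# Steps 1 & 2: establish criteria and construct standards.
standards = {
    "participant satisfaction": [  # (minimum score, rating), highest first
        (4.0, "excellent"),
        (3.0, "adequate"),
        (0.0, "poor"),
    ],
    "completion rate": [
        (0.80, "excellent"),
        (0.60, "adequate"),
        (0.00, "poor"),
    ],
}

# Step 3: measure performance (pretend these numbers came from surveys/records).
performance = {"participant satisfaction": 3.4, "completion rate": 0.85}

# Step 4: compare performance to the standards and draw an evaluative conclusion.
for criterion, value in performance.items():
    for threshold, rating in standards[criterion]:
        if value >= threshold:
            print(f"{criterion}: {value} -> {rating}")
            break
```

The point isn’t the code itself – it’s that the standards exist, and are written down, before the performance data are compared to them.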

rubric

  • you compare your performance data to the descriptors to determine whether the standard was achieved, which allows you to draw an evaluative conclusion [I am familiar with rubrics from my work as an instructor, where I often provide grading rubrics to my students so that they know what level of work I am expecting in order to get an A, B, C, D, or F on an assignment. I haven’t yet used a rubric in a program evaluation]
  • by determining the standards before you collect performance data, you are thinking about what does a “good” or “successful” program look like up front; if you only start to think about what is good enough to be considered success after you see the data, you can be swayed by the data (e.g., “Well, this looks good enough”)
  • use the literature and stakeholders to build your rubrics
  • Martens conducted a literature review and interviews and found that few evaluators write about using rubrics in their work (not clear if it’s because people aren’t using them or just aren’t writing about them) and that most people who use them learned through contact with Jane Davidson or her work
  • it was noted that because of the transparency of rubrics, people don’t argue about whether measures are “good enough” (like they did before that person used rubrics)
  • rubrics do need to be flexible to a changing evaluand – it was also noted that sometimes evidence emerges during an evaluation that you hadn’t planned for in your rubric – and it’s OK to add, for example, a new criterion; but you can’t change the rubric after the fact to hide something on which the program did poorly
  • future research is needed on best practices for using rubrics and to investigate the perspectives of funders and evaluation users on rubrics

Implementation Science

  • implementation = “a specific set of activities designed to put into place an activity or program of known dimensions” (Fixsen et al., 2005; cited by Wade)
  • this table provides a nice distinction between what we typically evaluate in program evaluation (i.e., the intervention – “what” gets implemented) vs. implementation (i.e., “how” it gets implemented), and what happens when each of those is or isn’t effective:
    • effective intervention + effective implementation → actual benefits
    • not effective intervention + effective implementation → poor outcomes
    • effective intervention + not effective implementation → inconsistent, not sustained, poor outcomes
    • not effective intervention + not effective implementation → poor outcomes, sometimes harmful


  • the more complex an intervention is, the more attention needs to be paid to implementation
  • the most difficult part of implementation is the changes in roles and relationships (i.e., behavioural changes)
    • change is a process, not an event
    • people don’t like to be told to change – you need to situate new behaviours in relevant history and saliency
    • understand different actors’ motivations for behaviour change
  • when you have multi-site projects/programs, you will have variation in implementation (i.e., how an intervention actually gets implemented at different sites, even though you are implementing the same intervention at the different sites)
  • why don’t people include context-sensitivity in complex intervention design?
    • command and control culture (a lack of trust in the front lines)
    • structural limitations on processing and responding to large amounts of data with nuanced implications
    • epistemologies, especially in the health sector (in the health sector, people often think that they just find a pill/needle/surgery that is proven to work in an RCT and you just need to make that available and people will use it, despite evidence that just pushing out interventions does not actually get people to use them)
  • there are profound differences between program designers and users/implementers of the program – evaluators can be a “translator” between the two.
  • evaluators can ask the challenging questions
  • our worldview is often different from program implementers and we can bring our insights
  • continuous quality improvement (CQI): “the continuous use of data to inform decision making about intervention adaptations and about additional implementation supports (e.g., training, coaching, change to administrative processes, policies, or systems) needed to improve both implementation of the intervention and the outcomes associated with the intervention” (Wade)
    • ignores the idea of “if it ain’t broke, don’t fix it”
    • uses ongoing experimentation to improve processes (not about improving/“fixing” people) – e.g., Plan-Do-Study-Act (PDSA) cycles
    • small incremental changes
    • most effective when it’s a routine part of the way people work
  • CQI evaluation questions:
    • intention to reach:
      • did the program reach its intended population?
      • how well is it reaching those who would most benefit from it? (e.g., high risk groups, location/age/gender/SES)
    • implementation:
      • to what extent is the program delivered as intended? [this assumes that the way the program is designed is actually appropriate for the setting; sometimes programs are shown to be effective in one context but aren’t actually effective in a different context. Similarly, how to implement may work well in one context but not in another context]
    • impact on outcomes:
      • what changes in status, attitudes, skills, behaviours, etc. resulted from the intervention?

Evaluation in Aboriginal Communities

  • There are many similarities between Australia and Canada with respect to Aboriginal people: a history of colonization, systematic discrimination, and ongoing oppression; a history of the imposition of “solutions” on communities via programs, service delivery models, and evaluation methods; these are imposed by privileged white voices and they often harm Aboriginal communities and people rather than helping.
  • Aboriginal communities prefer:
    • self-determination
    • two-way learning
    • participation
    • capacity-building – evaluations should not be about someone coming in and taking from the community
    • inclusion of an Aboriginal worldview
    • a shared understanding of what partnership really means
  • Evaluation should be ongoing
  • Evaluators should be facilitators, should be respectful, should understand community capacity within the context of local values/norms
  • Evaluations should be trauma-informed, as communities have experienced colonial violence
  • Often evaluations do not allow the time needed to do the work that is needed to conduct evaluation in a respectful way that gets useful information
  • Communities need to have confidence in evaluations = confidence that evaluators will hear the whole story and report it ethically, and evaluations will be useful to the community and be done respectfully with the community

Systems Thinking

  • “You don’t have to be a systems theorist to be a systems thinker” (Noga). You can use systems thinking as a paradigm/worldview, without having to go into the whole theory. [This reminded me of Jonathan Morrell’s talk at the CES conference]
  • System = elements + links between the elements + boundary. (A minimal sketch of this definition appears a few bullets below.)
    • Without the links between the elements, it’s just a collection.
    • Boundaries can be physical, political, financial, etc. They may also be contested (not everyone may agree as to what the boundaries of a given system are). Determining the boundaries = deciding what’s in, what’s out, and what’s considered important; it’s an ethical, moral, and political decision.
  • A program always has a mental model (someone believes there is a problem and the program is a way to solve it), even if they haven’t articulated it.
    • Evaluators investigate/describe:
      • the program
      • assumptions
      • mental models
      • stakeholders and their stakes (see Ladder Diagram in the Tips & Tools section below)
    • As an evaluator, look for leverage points the program is using. Are they working? Could there be better ones?
  • Interrelationships are what make a system a system instead of just a collection; they create:
    • outcomes
    • but also barriers and detours
    • function & dysfunction
    • emergent patterns & trends
  • Complex systems are unpredictable (the program can have hoped-for or intended outcomes, but can’t necessarily predict things with any certainty).
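To make the “elements + links + boundary” definition above concrete, here is a minimal sketch – my own illustration with invented elements from a hypothetical after-school program, not anything from the conference sessions – showing how the boundary is just a (contestable) choice about what counts as the system and what counts as context:

```python
# Hypothetical system: elements, links (interrelationships), and a chosen boundary.
elements = {"program staff", "students", "parents", "school board", "local employers"}

links = {  # interrelationships between elements; without these it's just a collection
    ("program staff", "students"),
    ("students", "parents"),
    ("program staff", "school board"),
}

boundary = {"program staff", "students", "parents"}  # what we decide is "in" the system

context = elements - boundary  # elements we treat as context rather than as the system
crossing = [(a, b) for (a, b) in links if (a in boundary) != (b in boundary)]

print("in the system:", boundary)
print("treated as context:", context)
print("links crossing the boundary:", crossing)
```

Drawing the boundary differently (e.g., including the school board) changes what the evaluation pays attention to – which is exactly why boundary-setting was described as an ethical, moral, and political decision.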


  • The Systems Iceberg: Mental Models (what is the problem and how do we think we can solve it?), whether explicit or implicit, cause us to design structures, that in turn influence patterns of systems behaviour, which lead to events, which are what we tend to observe.
    • e.g., you get a cold (event), because you haven’t been eating or sleeping well (behaviour), due to poor access to nutritious food and work stress (structures), work stress affects you because your career is important to your identity (so a threat to your career threatens your identity) and you believe resting = laziness.
    • When you are evaluating a system, start at the top: what events happened? what patterns of behaviour led to those events? what structures led to those patterns? what mental models/assumptions led to those structures being developed in the first place?
    • If you are designing a program, start at the bottom and work up! (Make your mental models explicit so you can make your design more intentional).
    • Can use the iceberg conceptually as you investigate the program – e.g., build it into interview questions (ask about what happened, then ask questions to find patterns, questions to uncover mental models)
      • interviews are a good way to get to mental models
      • artifacts and observations are good ways to get to system structures
      • observations and interviews are good ways to get to patterns of behaviour and events.
  • Complex Adaptive Systems: “self-organization of complex entities, across scales of space, time, and organizational complexity. Process is dynamical and continuously unfolding. System is poised for change and adaptation” (Noga slide deck, 2016)

CAS

  • Think of the above diagram as more of a spiral than a circle.
    • e.g., 1. more women are single mothers and have to work, and more women choose to enter the workforce –> 2. policies re: childcare, tax credits for daycare/employers create daycares in response to more women in the workforce –> 3. supported childcare –> 1. new agent actions (e.g., even more women join the workforce as the new policies make it easier to do so) and so on
  • With CASs, expect surprises – so you need to plan for them (e.g., do you have a process in place to renegotiate what data is being collected in response to changes?)
  • Wicked problems truly resist a solution, and every intervention into a wicked problem has an effect. You can’t just pilot an intervention like you would for a normal problem: the pilot itself changes the situation, so doing that intervention again may not have the same effect because the starting point will be different, and the effect of the next intervention will be affected by the effect of the prior one. Examples include poverty, the obesity epidemic, climate change, and education (e.g., what do children need to know in the 21st century and how do we teach it to them?). Wicked problems also interact with each other, which makes things even more complex (e.g., the effects of climate change on those in poverty).
  • Take home messages from Jan on Systems Thinking:
    • be intentional – use systems thinking when it makes sense, use tools when they make sense
    • document boundaries and perspectives
    • our job as evaluators is to surface the story that the system is telling

Complexity

  • some common approaches to complexity that don’t work
    • careful planning doesn’t work in situations of uncertainty (because how can you plan for the unknown?)
    • reliance on monitoring & evaluation (M&E) plans, with high-level annual reviews to guide implementation and oversight by higher-ups who don’t have understanding of, or influence at, the front lines
    • emphasis on celebrating successes rather than learning from failures
    • use of short timeframes and rigid budgets to reduce risk (it actually increases risk of ineffective interventions)
  • instead we need:
    • more regular reviews/active monitoring (which requires lots of resources; and we need to make sure it doesn’t become excessively structured)
    • determine where the bottlenecks for adoption are and delegate responsibility to that level – give lots of local autonomy, provide coaching, and foster self-organization (decision making is needed at the lower levels)
    • learn from good pilots and then use that to inform expansion (but also need to study how that expansion goes – can’t assume it will go the same way as the pilot)
    • payment by results gives responsibility to the implementing agencies/communities to do what they want to do, but:
      • the outcomes need to be the correct ones
      • the outcomes need to be verifiable [because this can easily be gamed, where people work to change the measured outcomes, not necessarily the underlying thing you are trying to change]
    • modeling likely scenarios during design and at critical junctures using:
      • human-centred design
      • agent-based modeling
      • complex system modeling
    • all approaches need insight from evaluation
  • often when higher ups look at indicators, things seem simple (indicators alone do not reveal the complexity that occurs on the ground)

Innovation

  • In the session by Plant, Cooper, & Warth, they discussed innovation in healthcare in BC and New South Wales. In the BC context, “innovation” is usually focused on something that “creates value”, whereas in NSW it’s more about “something new” (even if it’s just new to you)
  • a lively group discussion brought up some interesting points:
    • innovation happens on the ground, so a top down approach to “mandate” innovation doesn’t really work
    • innovation is a process, so the evaluation of innovation should be an evaluation of the process (rather than the product of the innovation) [though wouldn’t this depend on the evaluation question? e.g., if the evaluation question is “was the outcome of this program to support innovation worth the investment?”]
    • innovation is challenging in a risk-averse setting like healthcare, as innovation requires risk taking as you don’t know if it’s going to work
    • evaluation can have a role in supporting innovation when:
      • proximity – there is a clear line of sight between activities and outcomes
      • purpose – when a learning purpose is valued by the client
      •  evaluation is embedded in the planning cycle (using evaluative thinking/an evaluative mindset to inform planning)
    • evaluator skills needed for evaluation to drive/support innovation:
      • political nous (a.k.a. political savvy) – situational/reflexive practice competencies
      • context knowledge – i.e., knows the health system
      • content knowledge – i.e., specific to the area of innovation
    • factors that limit evaluation’s role:
      • political/leadership barriers & decision cycles
      • innovation overload
      • time frames
      • a “KPI mindset” – i.e., inappropriate outcome measurement; the use of evaluation resources for measurement rather than doing deep dives and garnering nuanced understanding
        • how do we counter the “KPI mindset”? The evaluation approach is different – e.g., you start with a question and then ask what data will provide the evidence required to answer that question, rather than starting with indicators and assuming you know the right indicators to monitor. (And that data might be qualitative!)

Cognitive Bias

  • cognitive bias = “habits of thought that often lead to erroneous findings and incorrect conclusions” (McKenzie)
    • e.g., framing effect: how you frame data affects how people react to it. E.g., if people are told a procedure has a 90% survival rate they are more likely to agree to it than if you say it has a 10% mortality rate. Thus, even though these mean the same thing, the way it’s framed affects the decision people make based on the evidence.
    • e.g., anchoring effect: naming a number can affect what people expect. E.g., if you ask one group “Did Gandhi die before or after the age of 5?” and a second group “Did Gandhi die before or after the age of 140?”, and then you ask people to guess what age he actually died, the second group will guess higher than the first group. This happens even though 5 and 140 are obviously wrong – but they “anchor” a person’s thoughts to be closer to that first number they heard.
    • there are tonnes more cognitive biases [Wikipedia has a giant list!]
  • even when we are doing a conscious reasoning process, we are still drawing on our subconscious, including biases
  • we like to believe that we make good and rational decisions, so it can be hard to accept that our thoughts are biased like this (and hard to see past our biases, even when we are aware of them)
  • there is not much research on cognitive bias in evaluators, but research does show that evaluators:
    • vary in the decisions they make
    • vary in the processes they use to make decisions
    • tend to choose familiar methods
    • are influenced by their attitudes and beliefs
    • change their decision making with experience (becoming more flexible)
    • write reports without showing their evaluative reasoning
  • some strategies to address bias:
    • conduct a “pre-mortem” –  during planning, think of all the ways that the evaluation could go wrong (helps to reduce planning bias)
    • take the outside view (try to be less narrowly focused from only your own perspective)
    • consult widely (look for disconfirming evidence, because we all have confirmation bias – i.e., paying more attention to those things that support what we already believe than those things that go against it)
    • mentoring (it’s hard to see our own biases, even for people who are experts in bias!, but we can more easily see other people’s biases)
    • make decisions explicit (i.e., explain how you decided something – e.g., how did you decide what’s in scope or out of scope? how did you decide what’s considered good vs. bad performance? This can help surface bias)
  • reflecting on our own practice (e.g., deep introspection, cultural awareness, political consciousness, thinking about how we think, inquiring into our  own thought patterns) needs to happen at points of decision and needs to be a regular part of our practice
  • 10 minutes of mindfulness per day can decrease bias (compare that with the hours per day of mindfulness for many weeks that are required to get the brain changes needed for health benefits)
  • some audience suggestions/comments:
    • have other evaluators look at our work to look for bias (it’s easier to see other people’s bias than our own)
    • we are talking about bias as if there is “one truth”, but there are multiple realities, so sometimes we misuse the word bias

Design Thinking


  • model from the Stanford Design School: empathize → define → ideate → prototype → test → learn
    • empathize – understand the experience of the user
    • define – define the problem from the user’s perspective
    • ideate – explore lots of ideas (divergent thinking) and then narrow them
    • prototype – reduce options to best options → experience them
    • test – test best ideas; observe & feedback to refine
    • learn – can scale your learnings (e.g., to other projects, other users, other geographies) [the speaker added this stage to the model]
  • the model is shown as linear, but it is really iterative

Misc.

  • In a session on evaluation standards, there was some good discussion on balancing the benefits of professionalizing evaluation (e.g., helps provide some level of confidence in evaluation if we have standards to adhere to and helps prevent someone who really doesn’t know what they are doing from going around claiming to do evaluation and making a bad name for the field when they do poor work) with the disadvantages (e.g., it can be elitist by keeping out people who have valuable things to contribute to evaluation but don’t have the “right” education or experience; can stifle innovation; can lead to evaluators working to meet the needs of peer reviewers rather than the needs of the client). There was also discussion about how commissioners of evaluation can lead to some issues with the quality of an evaluation by their determinations of scope, schedule, and/or budget.
  • John Owen gave an interesting “history of evaluation” in his keynote talk on Day 2. An abridged version:
    • pre-1950 – evaluation as we know it didn’t exist
    • pre-1960: Tyler, Lewin, Lazarsfeld in USA (If you had objectives for your program and measured them, then you could say if a program “worked” or not)
    • 1960s: with the “Great Society” in the US, there was more government intervention to meet the needs of the people and the government wanted to know if their interventions worked/was their money being spent wisely (accountability purpose of evaluation).
    • 1967 – academics had become interested in evaluation. Theory of Evaluation as being a judgement of merit/worth. Michael Scriven (an Australian) contributed the notion of “valuing”, which isn’t necessarily part of other social sciences.
    • 1980s onward – an expansion of the evaluation landscape (e.g., to influence programs being developed/underway; to inform decision making)
    • currently – a big focus on professionalization
  • Kathryn Newcomer also presented a brief summary of evaluation history:
    • 1960s: “effectiveness”
    • 1980s: outcomes
    • 1990s: results-based
    • 2000s: evidence-based
  • Words:
    • Newcomer notes that Scriven advocates the use of the term “program impactees” rather than “beneficiaries” because we don’t know if the program recipients will actually receive benefits [though to me “impactees” suggests anyone who might be affected by the program, not just the program users (which is usually what people are talking about when they say “beneficiaries”). But I can totally appreciate the point that saying “program beneficiaries” is biased in that it pre-supposes the program users get a benefit. I usually just say “program users”.]
    • “Pracademic”- a practical (or practitioner) academic (Newcomer)
  • In discussing evaluating a complex initiative, Gilbert noted that they chose to focus their evaluation on only some areas and no one has criticized them for prioritizing some parts over other parts [I found this an interesting comment, as I’m concerned on my project that if some parts aren’t evaluated, it would be easy to criticize that the project as a whole was not evaluated]. She also noted that they had really rich findings and that there was a triangulation where findings on one focus area complemented findings on another focus area.

Tips and Tools

Throughout the conference, a number of speakers shared different tips and tools that might be useful in evaluation.

Ladder diagram for mapping stakeholders and stakes:

  1. list all the stakeholders
  2. ask them each “what is the purpose of this program?” (those are the “stakes”)
  3. draw lines between the stakeholders and the stakes
  • allows you to see:
    • stakes that are held by multiple groups
    • stakes that only have one stakeholder (sometimes these outliers are really important! e.g., Noga did this for an after-school program that was experiencing poor attendance/high drop-out rates, and the kids were the only stakeholders who noted “fun” as a purpose of the program. That was the missing ingredient in why kids weren’t showing up – the program planners and deliverers were focused on things like safety and nutrition, but hadn’t thought about making it fun!)

ladder_diagram
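Here’s a toy sketch of the ladder-diagram bookkeeping – my own illustration with invented stakeholders and stakes, loosely echoing the after-school example above – showing how inverting the stakeholder-to-stakes mapping surfaces both the shared stakes and the single-stakeholder outliers:

```python
# Hypothetical ladder diagram: each stakeholder's answer(s) to
# "what is the purpose of this program?"
stakes_by_stakeholder = {
    "funders": {"safety", "nutrition"},
    "staff":   {"safety", "nutrition", "attendance"},
    "parents": {"safety", "homework help"},
    "kids":    {"fun"},
}

# Invert the mapping: for each stake, who holds it? (These are the "lines" in the diagram.)
holders_by_stake = {}
for stakeholder, stakes in stakes_by_stakeholder.items():
    for stake in stakes:
        holders_by_stake.setdefault(stake, set()).add(stakeholder)

shared = {s: h for s, h in holders_by_stake.items() if len(h) > 1}
outliers = {s: h for s, h in holders_by_stake.items() if len(h) == 1}

print("stakes held by multiple groups:", shared)
print("stakes with a single stakeholder (worth a closer look):", outliers)
```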
Program Models (e.g., logic models)

  • A model is a representation of the program:
    • at a certain time
    • from a certain perspective
  • Can look at the model over time, reviewing what has changed or not changed (and what insights does that give us about the program?)

Causal Loop Diagrams

  • A diagram that shows connections and interrelationships among elements
  • Difficult to make and to use (would probably want a systems dynamics expert to assist with this if you were to make/use one)
  • Here’s an example of one (from Wikipedia):
    Causal Loop Diagram of a Model

“Low Tech Social Networking”

  • an ice breaker activity that you can use to see the mental models people are working with and to start to see connections in the network
  • ask participants to do the following on a sheet of paper:

low_tech_social_networking

Exploring Boundaries Activities

  • a bunch of toy animals were provided and every participant was told to pick any 4 they wanted
Little toys used for an activity in the pre-conference workshop I went to
  • in table groups, participants were asked to find how many different ways they can be grouped
    • e.g., some of the groups we came up with were grouping by biological taxonomy (e.g., amphibians, reptiles, mammals, birds), by land-based/water-based animals, by number of legs, by colour, in order of size
  • this allows you to illustrate how boundaries are constructed by our choices (within certain constraints) – how and why people chose the boundaries they do are interesting questions to think about

Postcard Activity

  • Participants are all asked to pick 3 postcards from a pile
  • Groups asked to make up a story using their cards. Each group tells their story to the whole crowd.
  • Debrief with the groups:
    • You are constrained by the cards each person brought in (and a perspective was brought by each person choosing the cards)
    • You find links
    • You make choices
      • Did you fit the cards to a narrative you wanted?
      • Did the story emerge from the cards?
      • There is no one right way to do it
      • A different group could come up with a totally different story from the same cards (different perspectives)
    • When you are evaluating, the program is the story. You want to understand how the story came to be. What was the process? What perspectives are reflected?
  • Bonus idea: You could use this postcard activity as a data collection tool – ask people to write the anticipated story before you start, then again midway through, then at the end. Which of the things you expected held? Which didn’t? Why did things change? What was surprising?

Snowcarding

  • ask a question (e.g., what are we going to do about juvenile delinquency in this town?)
  • everyone writes as many ideas as they can think of on sticky notes (one idea per sticky note) and covers the wall with them
  • group then themes the ideas together
  • then ask the group “What are you going to do with these ideas?”

Game to Demo the Concept of Self-Organization

  • each person is assigned a “reference person” and they are told they are not allowed to be within 3 ft of that person
  • everyone is told to go mingle in the room
  • some people are assigned the same “reference person” – they will end up clumping together as they all try to avoid that person – this is an example of an emerging, self-organized pattern (a bunch of individual agents acting on their own reasons end up forming patterns)
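Here’s a toy simulation of this game – my own illustration with arbitrary parameters, not something shown at the conference – in which each simulated person follows only the local rule of keeping away from their assigned reference person while otherwise wandering around a bounded “room”; a stable pattern emerges with no central coordination:

```python
import random

N, STEPS, AVOID_DIST, ROOM = 30, 300, 3.0, 20.0
positions = [[random.uniform(0, ROOM), random.uniform(0, ROOM)] for _ in range(N)]
# Several people share the same reference person (drawn from persons 0-4), as in the exercise.
reference = [random.randrange(5) for _ in range(N)]

def dist(a, b):
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

for _ in range(STEPS):
    for i, me in enumerate(positions):
        ref = positions[reference[i]]
        d = dist(me, ref)
        if 0 < d < AVOID_DIST:              # too close: step directly away
            me[0] += (me[0] - ref[0]) / d
            me[1] += (me[1] - ref[1]) / d
        else:                               # otherwise just mingle randomly
            me[0] += random.uniform(-0.5, 0.5)
            me[1] += random.uniform(-0.5, 0.5)
        me[0] = min(max(me[0], 0.0), ROOM)  # stay inside the room
        me[1] = min(max(me[1], 0.0), ROOM)

still_too_close = sum(dist(positions[i], positions[reference[i]]) < AVOID_DIST for i in range(N))
print(f"people still within {AVOID_DIST} of their reference person: {still_too_close}/{N}")
```

No one in the simulation is told to form a pattern; the spacing (and, in the live version of the game, the clumping of people who share a reference person) simply emerges from everyone following their own simple rule.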

Creating Personas

  • a tool commonly used in marketing where you craft a character to represent market segments (or, in the case of evaluation, stakeholder groups)
  • can use this to help with your stakeholder mapping and evaluation planning
  • e.g., create a persona of Max the Manager, Freda the front-line staff, Clarence the Client, etc. – what are their needs/wants/constraints/etc.? how can these characters help inform your planning?
  • avoid stereotyping (base on real data/experience as much as possible) and avoid creating “elastic” personas (i.e., contorting the character to incorporate everything you’d want in someone in that role)
  • design for the majority, but don’t forget the outliers

Participant Journey Mapping

  • a visual representation of an individual’s perspectives on their interactions and relationships with an organization/service/product
  • can use this to map out the “activities” for a logic model
  • focus on the user’s experience (what the user experiences can be quite different from what the program designer/administrator thinks the experience is)
  • think about:
    • emotional side – high points/low points; happiness/frustration; pain points
    • time/phases – before/during/after the experience
    • touch points and formats – e.g., online/offline; phone/F2F; real person/robot
  • it’s about understanding the experience
  • useful way to identify areas for improvement
  • can be used during design or during implementation
  • can be a communication tool
  • can be an evaluation tool – e.g., map a user’s journey to see what was good/bad from their perspective and identify places to review more deeply/improve

Compendium Software

Things to Read:

  • John Owen’s Book: Program Evaluation: Forms and Approaches is considered a seminal evaluation text in Australia. I haven’t read it yet, so I should really check it out!
  • Peunte & Bender (2015) – Mindful Evaluation: Cultivating our ability to be reflexive and self-aware. Journal of Multidisciplinary Evaluation. Full-text available online.
  • Moneyball for Government – a book, written by a bipartisan group, that “encourages government to use data, evidence and evaluation to drive policy and funding decisions”
  • Mayne (2010). Building an evaluative culture: the key to effective evaluation and results management. The Canadian Journal of Program Evaluation. 24(2):1–30. Full-text available online.
  • David Snowden’s Cynefin framework. http://cognitive-edge.com/ [I’ve read this before, but think I should re-read it]

Sessions I Presented:

  • Snow, M.E, Snow, N.L. (2016). Interactive logic models: Using design and technology to explore the effects of dynamic situations on program logic (presentation).
  • Snow, M.E, Cheng, J., Somlai-Maharjan, M. (2016). Navigating diverse and changing landscapes: Planning an evaluation of a clinical transformation and health information system implementation (poster).

Sessions I Attended:

  • Workshop: Connecting Systems Thinking to Evaluation Practice by Jan Noga. Sept 18, 2016.
  • Opening Keynote: Victoria Hovane – “Learning to make room”: Evaluation in Aboriginal communities
  • Concurrent session: Where do international ‘evaluation quality standards’ fit in the Australasian evaluation landscape? by Emma Williams
  • Concurrent session: Evaluation is dead. Long live evaluative thinking! by Jess Dart, Lyn Alderman, Duncan Rintoul
  • Concurrent session: Continuous Quality Improvement (CQI): Moving beyond point-in-time evaluation by Catherine Wade
  • Concurrent session: A Multi-student Evaluation Internship: three perspectives by Luke Regan, Ali Radomiljac, Ben Shipp, Rick Cummings
  • Concurrent session: Relationship advice for trial teams integrating qualitative inquiry alongside randomised controlled trials of complex interventions by Clancy Read
  • Day 2 Keynote: The landscape of evaluation theory: Exploring the contributions of Australasian evaluators by John Owen
  • Concurrent session: Evolution of the evaluation approach to the Local Prevention Program by Matt Healey, Manuel Peeters
  • Concurrent session: The landscape of using rubrics as an evaluation-specific methodology in program evaluation by Krystin Martens
  • Concurrent session: Beyond bias: Using new insights to improve evaluation practice by Julia McKenzie
  • Concurrent session: Program Logics: Using them effectively to create a roadmap in complex policy and program landscapes by Karen Edwards, Karen Gardner, Gawaine Powell Davies, Caitlin Francis, Rebecca Jessop, Julia Schulz, Mark Harris
  • AES Fellows’ Forum: Ethical Dilemmas in Evaluation Practice
  • Day 2 Closing Plenary: Balance, color, unity and other perspectives: a journey into the changing landscape in evaluation, Ziad Moussa
  • Day 3 Opening Keynote: The Organisational and Political Landscape for Evidence-informed Decision Making in Government by Kathryn Newcomer
  • Concurrent session: Effective Proactive Evaluation: How Can the Evidence Base Influence the Design of Complex Interventions? by John Owen, Ann Larson, Rick Cummings
  • Concurrent session: Applying design thinking to evaluation planning by Matt Healey, Dan Healy, Robyn Bowden
  • Concurrent session: A practical approach to program evaluation planning in the complex and changing landscape of government by Jenny Crisp
  • Concurrent session: Evaluating complexity and managing complex evaluations by Kate Gilbert, Vanessa Hood, Stefan Kaufman, Jessica Kenway
  • Closing Keynote: The role of evaluative thinking in design by John Gargani

Image credits:

  • Causal Loop Diagram is from Wikipedia.
  • The other images are ones I created, adapting from slides I saw at the conference, or photos that I took.


Canadian Evaluation Society 2016 conference recap

I recently spent a week at the Canadian Evaluation Society (CES)’s 2016 national conference in St. John’s, NL. I’ve been to the CES national conference twice before – 2010 in Victoria, BC and 2014 in Ottawa, ON – as well as the CES BC & Yukon chapter’s provincial conference for the past two years, and in all cases I’ve learned a tonne and had a great time. There’s something very special about spending time with people who do the thing you do, so I was glad to have a chance to engage with fellow evaluators, both in the formal workshops and presentations and in the informal networking and socializing times.

I’m one of the program co-chairs for the CES 2017 national conference being held in Vancouver next year, so I thought it was extra important for me to go this year and I certainly saw the conference through a different lens, jotting down notes about conference organization and logistics along with the notes I was taking on content throughout the sessions. I took a tonne of notes, as I generally do, but for this blog posting I’m going to summarize some of my insights, in addition to cataloguing all the sessions that I went to 1There were a lot of presentations being held at every session, so I didn’t get to go to half of the ones that I wanted to, but that seems to be the case with every conference and I’m not sure how any conference organizer could solve that problem, short of recording and posting every session, which would be prohibitively expensive.. So rather than present my notes by session that I went to, I’m going to present them by topic area, and then present the new tools I learned about 2Or old tools that I know of but I haven’t thought about using in an evaluation context.. Where possible 3Where by “possible” I mean, when (a) I wrote down who said something in my notes, and (b) I can read my own writing. I’ve included the names of people who said the brilliant things that I took note of, because I think it is important to give credit where credit is due, but I apologize in advance if my paraphrasing of what people said is not as elegant as when the people actually said them.

Evaluation

There isn’t a single definition of evaluation. Some of the ones mentioned throughout the conference included:

  • Canadian Evaluation Society’s definition: “Evaluation is the systematic assessment of the design, implementation or results of an initiative for the purposes of learning or decision-making.” 4See the source of this for further elaboration on the pieces of this definition
  • Carol Weiss’s definition: “Evaluation is the systematic assessment of the operation and/or outcomes of a program policy, compared to a set of explicit or implicit standards, as a means of contributing to the improvement of the program or policy” 5I googled to get this definition, which was alluded to the in the workshop I went to, and found it on this site.
  • Australasian Evaluation Society’s definition: “systematic collection and analysis of information to make judgements, usually about the effectiveness, efficiency and/or appropriateness of an activity […including…] many types of initiatives, not just programs, but any set of procedures, activities, resources, policies and/or strategies designed to achieve some common goals or objectives.” 6Source: AES Ethical Guidelines [pdf]

Evaluative Thinking

  • Emma Williams used a lot of interesting analogies in her workshop on Evaluative Thinking, one of which was the meerkat. People these days are working with their noses to the grindstone – like a meerkat down on all fours running like the wind – but it’s important every so often to make like the meerkat, who stops, stands up, and looks around to see what’s going on. We as evaluators can encourage people to stop, look around, and reflect. I like this image of the meerkat as a reminder of that.
  • Also from Emma Williams: evaluators are like the worst of a 4 year old (always asking “Why? Why? Why?”) and the worst of a skeptical teenager (think: arms folded, saying, “That’s what you think! Prove it!”).

Evaluation Reasoning

  • Evaluation is about judging the merit or worth of a program. Researchers tend to be uncomfortable with making judgements, whereas that is what evaluators do.
  • Evaluation reasoning involves deciding:
    • what criteria you will use to judge the program
    • what standards will let you decide whether the program is good enough to be judged as good or not
    • collecting data to make those judgments
    • having a “warranted” argument to link evidence to claims
  • If you have a program theory, use that to develop your criteria and compare your evidence to your theory.

The Evaluand

The Evaluand is the thing that you are evaluating. When you say that “it worked” or “it didn’t work”, the evaluand is the “it”.

  • Evaluating strategy. A strategy is a group of (program and/or policy) interventions that are all meant to work towards a common goal. We don’t learn in evaluation education how to evaluate a strategy. Robert Schwartz gave an interesting talk on this topic – he suggested that strategies are always complex (including, but not limited to, there being multiple interventions, multiple partners, interactions and interactions among those interactions, feedback loops, non-linearity, and subpopulations) and we don’t really have a good way of evaluating all of this stuff. He said he wasn’t even sure it is possible to evaluate strategies, “but can we get something from trying?” I thought this was an interesting way to approach the topic and I did think we learned some things from his work.
  • Evaluating complexity. Jonathan Morrell did an interesting expert lecture on this topic 7His slide deck, which was from a longer workshop that he did previously (so he didn’t cover all of this in his lecture) is available here.. Some of the key points I picked up from his talk:
    • Our program theories tend to just show the things that are in the program being evaluated (e.g., inputs, activities), but there are many things around the program that affect it as well, and some of those things we do not and cannot know.
    • We can draw on complexity science (a) instrumentally and (b) metaphorically.
    • Science cares about what is true, while technology cares about what works. If we think of evaluators as technologists (which it seems Morrell does), then he’s in favour of invoking complexity in any way that works (e.g., if using it metaphorically helps us think about our program/situation, then do that and don’t worry if you aren’t using “complexity science” as a whole). He notes that “science begins to matter when technology stops working”.
    • Some of the concepts of complexity include:
      • small changes can lead to big changes
      • small changes can cascade through a system
      • there can be unintended outcomes, both positive and negative, of a system
      • attractors – “properties toward which a system evolves, regardless of starting conditions”
    • NetLogo Model Library contains many different models of agent-based social behaviours.
    • We might not even evaluate/measure things that seem “simple” (e.g., if we don’t understand that feedback loops can cause unpredictable things, then we won’t look for or measure things).
    • There is no grand unified theory of complexity – it comes from many roots 8Check out this map of the roots of “complexity” science and it’s a very different way of looking at the world (compared to thinking about things as being more linear (input –> activity –> output)).
    • Program designers tend to design simple programs – it’s very hard to align with all the other programs out there that all have their own cultures/process/etc. – would take so long to do that that no one would ever get anything done. (People know that system-level interventions are needed, but they can only do what’s in their scope to do)
    • Implications for evaluation – need to be close to the program to observe small changes, as they can lead to large effects; and because you can’t always predict what feedback loops there may be, you need to be there to observe them.
    • Even if the program doesn’t recognize the complexity of their situation, evaluators can use complexity concepts to make a difference.

Data Collection, Analysis, and Interpretation

  • “Data literacy” = “the ability to understand and use data effectively to inform decisions (e.g., why to collect data, how to collect data, how to interpret data, how to turn data into information)”
  • Anytime you expect program staff (as opposed to evaluators) to collect data (or even to participate in things like being observed by an evaluator for data collection), you have to remember that collecting data takes away time and attention from direct service provision. A staff member will think “I can fill out this data collection sheet or I can save another life.” You have to make sure that staff understand the importance of the data and what it is going to be used for (e.g., to improve the program or to secure future funding for the program/help insulate the program against potential funding cuts by having evidence that the program is having an effect) if you expect them to put effort towards collecting it.
  • Anyone who is going to be entering data (even if it’s data that’s collected as part of providing service but which will also be used for evaluation) needs to understand the importance of data quality. For example, do staff understand that if they put a “0” when they actually mean that the data is not available, that 0 will erroneously decrease the average you calculate from that data set?
    • Make sure data entry protocols are very clear about what exactly the data collector needs to do and *why* they need to do it, and that you include a data dictionary – you’d be surprised how differently people can interpret things. (A tiny numeric illustration of the “0 vs. missing” issue follows this list.)
  • What the data “says” vs. what the data “means”? It is very possible to misinterpret data, so it’s important to think about your data, your methods, and their limitations. For example, if you have survey data that tells you everyone loves your program, but the survey response rate was 5% or the survey questions were all biased, the data may “say” that everyone loves your program, but it just means that the 5% who responded love your program or that the answers to the biased questions gave you positive results – you don’t actually know what people thought about your program. Another example: if rates of errors went up after an intervention (what the data says), does it mean that more errors actually occurred, or that the new system is better at detecting errors?
  • Campbell’s Law: “The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.” 9Source
  • Unanticipated consequences – we all talk about them, but few evaluations explicitly included looking for them (including budgeting for and doing the necessary open inquiry on site, which is the only way to get at unintended consequences)
  • Consequential validity – everything we do has consequences. In terms of questions/measures, consequential validity = “the aftereffects and possible social and societal results from a particular assessment or measure. For an assessment to have consequential validity it must not have negative social consequences that seem abnormal. If this occurs it signifies the test isn’t valid and is not measuring things accurately.”10Source – e.g., if a test shows that a subgroup consistently scores lower, it could be the result of the test being biased against them (and thus the test is not validly measuring what it purports to be measuring).
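To illustrate the “0 entered when the data is actually missing” point above with hypothetical numbers: if one of three sites didn’t report and that gap gets entered as a 0, the calculated average drops even though nobody meant to say that the site served zero clients:

```python
# Hypothetical monthly client counts from three sites; the third site's data is unavailable.
counts_with_zero_for_missing = [12, 15, 0]   # missing value (incorrectly) entered as 0
counts_with_missing_dropped  = [12, 15]      # missing value left out of the calculation

print(sum(counts_with_zero_for_missing) / len(counts_with_zero_for_missing))  # 9.0
print(sum(counts_with_missing_dropped) / len(counts_with_missing_dropped))    # 13.5
```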

Implementation

  • “Implementation” = “a specific set of activities designed to put into practice an activity or program of known dimensions” (Cunning et al)
  • effective interventions X effective implementation X enabling contexts = socially significant outcomes (that is, you need interventions that work, and you need to implement them well, and the context has to enable that)
  • there is a growing evidence base of ‘what works’ in implementation – we should be evidence-based in our attempts to implement things

Quality Improvement

  • Hana Saab discussed working as an evaluator in a healthcare environment where people tend to make assumptions like: PDSA cycle = evaluation (even though quality improvement projects are rarely rigorously evaluated and reasons for achieving results are often not understood); better knowledge = improved practice (even though there are many steps between someone attending an education session and actually using what they learned in actual practice); and that contexts are homogeneous (which they aren’t!). She also noted that sometimes people conclude a program “didn’t work” without differentiating between “the program was implemented as intended but didn’t lead to the intended outcomes” and “the program wasn’t even implemented as intended” [and, I would add, you could also conclude a program “worked” when it actually worked because they didn’t implement it as intended, but rather adapted it into something that did work (but if you didn’t note that, you’d think the original program design worked); or maybe the program is fine in other contexts, but not in this one].
  • Realist evaluation allows the integration of process & outcome evaluation and focuses on “what works for whom and in what circumstances”

Aboriginal Evaluation

  • The CES has issued a statement on Truth & Reconciliation in response to the Truth & Reconciliation Commission of Canada’s report in which they resolved to:
    • include reconciliation in the existing CES value of “inclusiveness”
    • include reconciliation explicitly in the evaluation competencies
    • strengthen promotion of and support for culturally responsive evaluation,
    • implement consideration for reconciliation in its activities
  • Culturally-responsive evaluation:
    • There are many ways of knowing – Western ways of knowing are privileged, but all ways of knowing have strengths and limitations.
    • It’s important to recognize power differentials and inequities.
    • A bottom up, strength-based approach is advocated
    • The 4 Rs (Kirkness & Barnhardt, 1991):
      • Respect
      • Relevance
      • Reciprocity
      • Responsibility
    • Reciprocal Consulting, who presented this in one of their presentations that I attended, provides a great description of the 4 Rs on their website.

Creativity

  • The opening keynote speaker showed a video clip of an activity where they had a group of people line up by birthday without talking. People tend to go with the first right answer they find, which is why we end up with incremental improvements, rather than going on to find other right answers, some of which could be truly innovative.
  • We need spaces to be creative. Cubicles and offices and meeting rooms with whiteboard markers that don’t work are not conducive to being spontaneous, to rapid prototyping, or to other ways of being creative. It doesn’t cost that much to set up a creative space – room for people to get together, put up some foam boards or flip chart papers that you can write on or put sticky notes on, have a stock of markers and random prototyping supplies.

Communication

  • “The great enemy of communication, we find, is the illusion of it.” – William H. Whyte 11The keynote speaker, Catherine Courage, had a slide with a similar quote that she attributed to George Bernard Shaw (“The single biggest problem in communication is the illusion that it has taken place.”), but when I just googled to confirm that – because I know that most people don’t actually look for sources of quotes – I found out that George Bernard Shaw never said this. Shame though – I like the wording of the way people think – erroneously – Shaw said it better than the way that Whyte actually did say it. Don’t assume that because you wrote a report and sent it to someone that (a) it has been read, or (b) that it has been understood.
  • We make meaning from stories. Stories + stats are far more memorable, and more likely to drive people to action, than stats alone.

The Credential Evaluation designation

The CES created a professional designation program for evaluators – the only one in the world, in fact. Of the 1696 members of CES, 319 people (19%) currently hold this designation 12Full disclosure: I am one of these Credentialed Evaluators., with a further 140 in the process of applying. The society has put a lot of work into creating the designation, getting it off the ground, and optimizing the infrastructure to sustain it 13e.g., the process by which you apply, as well as developing educational materials to ensure that CEs have options for education as they have to do a certain number of education hours to maintain their credential. But the CE designation, I learned at this conference, is not without controversy.

  • Kim van der Woerd asked an astute question in a session I was in 14She asked many, in fact, but in this instance I’m talking about a specific one that struck me. on quality assurance for evaluation. The idea being discussed was that one might include “having Credentialed Evaluator(s) working on the evaluation” as a criterion for a high quality evaluation. Kim pointed out that doing that would privilege and give power to those people holding a CE designation, as well as the ways of knowing and evaluating that are dominant and thus included in the evaluation credentialing process. What about other evaluators? 15I don’t think she mentioned this specifically in that session, but I was thinking about evaluations I’ve seen with community members as part of the evaluation team, where they were specifically included because they had lived experiences and relationships within the community that were invaluable to the project, but they did not have the things deemed necessary by CES to get CE designation. I would think their inclusion in the evaluation would make for a higher quality evaluation than if they had been excluded.

Meta-Evaluation

Meta-evaluation is evaluation of evaluation. How do we know if we are doing good quality evaluations? Moreover, how do we know if our evaluations are making a difference?

    • One study in New Zealand found that only 8 of 30 evaluations they assessed met their criteria for a “good” evaluation. A study in the UK National Audit Office found only 14 of 34 evaluations were sufficient to draw conclusions about the effects of the intervention 16Source.
    • The Alberta Centre for Child, Family, and Community Research is working on a quality assurance framework for evaluation. It’s not done yet, but when it is it will be published on their website, so I’ve made a mental note to go look for it later.
    • We don’t actually have a good evidence base that evaluation makes a difference. A project by EvalPartners contributed to that evidence base by showcasing 8 stories of evaluations that truly did make a difference. They provided a visual that I found helpful in thinking about this (I’ve recreated the image and annotated it with the key points).

[Image: evaluations making a difference]

  • [Image: 8 ways to enhance evaluation impact]
  • One audience member in a presentation I was in used an analogy of auditors for accounting – an auditor doesn’t *do* your accounting for you, but rather they come in and verify that you did your accounting well (according to accounting standards). But imagine if an auditor came in and you hadn’t done any accounting at all! That’s like bringing in an external evaluator to a program and saying “evaluate the program” when you have not set up anything for evaluation!
  • Meta-evaluation isn’t just something we should do at the end of an evaluation to see if that was a good evaluation we did. You should engage in meta-evaluation throughout the project, while you still have the opportunity to strengthen the evaluation!

Miscellaneous:

  • Several people referred to Eval Agenda2020, the global agenda for evaluation for 2016-2020, created by EvalPartners.
  • The Canadian Evaluation Society has a new(ish) strategic plan.

  • Context-driven evaluation approach 17Cunning et al – having an overarching evaluation framework and tools (e.g., shared theory of change, outcome measures, reporting structure, database), but with the ability to adapt to local, organizational, & community contexts (as people adapt their programs at local sites)
  • “Deliverology” was the new buzzword this year – it was defined in one presentation as an approach to public services that prioritizes delivering results to citizens. Apparently it’s been talked about a lot in the federal public service.
  • Several people also mentioned that the Treasury Board Secretariat has a new evaluation-related policy on the horizon.
  • In his closing keynote, relating to the conference theme of “Evaluation On the Edge”, Michael Quinn Patton asked the audience to reflect on “Where is your edge?” My immediate thought on this was a reflection I’ve had before – that when I look back on the things I’ve done in my life so far that I thought were really amazing accomplishments – doing a PhD, playing a world record 10-day long game of hockey, doing a part-time MBA while working full-time – I started each one of them feeling “Oh my god! What am I doing? This is too big, too much, I won’t be able to do it!” I felt truly afraid that I’d gotten too close to the edge and was going to fall – not unlike how I felt when I did the CN Tower Edgewalk. But in each case, I’d decided to “feel the fear and do it anyway!” 18That’s the definition of courage, right? and while all of those things were really hard, I did accomplish them and they are some of the best things I’ve ever done. I also remember having that same feeling when I took on my current job to evaluate a very big, very complex, very important project – “oh my god! It’s too big, it’s too much, what if I can’t figure it out??” But I decided to take the plunge and I think I’m managing to do some important work 19I guess only time will really tell!. I think the lesson here is that we have to push ourselves to the edge – and have the courage to walk there – to make great breakthroughs.

Tips and Tools

  • Use the 6 Thinking Hats to promote evaluative thinking. I’ve used this activity in a teaching and learning context, and seen it used in an organization development/change management context – which, now that I think of it, were examples of evaluative thinking being applied in those contexts. I’ve usually seen it done where the group is split up so that some people are assigned the blue hat perspective, some the red hat perspective, etc., but Emma suggested that the way it is intended to be used is that *everyone* in the group is supposed to use each hat, together in turn.
  • Don’t use evaluation jargon when working with stakeholders. You don’t need to say “logic model” or “program theory” when you can just say “we are going to draw a diagram that illustrates how you think the program will achieve its goals” or “let’s explain the rationale for the program.” Sometimes people use professional jargon to hide gaps in their knowledge – if you really understand a concept, you should be able to explain it in plain English.
  • Backcasting: Ask stakeholders what claims they would like to be able to make/what they want to “prove” at the end of the evaluation and then work backwards: “What evidence would you need to be able to make that claim?” and then “How would we collect that evidence?” (A small sketch of this idea follows after this list.)
  • Think about “known contribution” vs. “expected contribution” in your program theory. Robert Schwartz raised this when talking about IPCA for evaluating strategy, but I think this is useful for program logic models as well. I’ve thought about this before, but never actually represented it on any of my logic models.
  • Wilder Collaboration Factors Inventory, a “free tool to assess how your collaboration is doing on 20 research-tested success factors”
  • Adaptation Framework to adapt existing survey tools for use in rural, remote, and Aboriginal communities available from Reciprocal Consulting.
  • Treasury Board Secretariat’s Infobase – “a searchable online database providing financial and human resources information on government operations”
  • “Between Past and Future” by Hannah Arendt – has six exercises on critical thinking.
  • The Mountain of Accountability
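
To make the backcasting tip above concrete, here is a minimal sketch of how the claims, evidence, and collection methods could be captured in a simple worksheet structure. This is my own toy illustration, not something shown at the conference, and every claim, evidence item, and method in it is an invented example.

```python
# Minimal backcasting worksheet: start from the claims stakeholders want to be
# able to make at the end of the evaluation, then work backwards to the
# evidence needed and how it would be collected. All entries are invented.
from dataclasses import dataclass, field

@dataclass
class BackcastItem:
    claim: str                     # what stakeholders want to be able to say
    evidence_needed: list = field(default_factory=list)
    collection_methods: list = field(default_factory=list)

worksheet = [
    BackcastItem(
        claim="The program reduced wait times for clients",
        evidence_needed=["baseline and follow-up wait-time data"],
        collection_methods=["administrative data pull", "review of intake logs"],
    ),
    BackcastItem(
        claim="Participants found the workshops useful",
        evidence_needed=["participant ratings", "examples of applying what they learned"],
        collection_methods=["post-workshop survey", "follow-up interviews"],
    ),
]

for item in worksheet:
    print(f"Claim: {item.claim}")
    print(f"  Evidence needed: {', '.join(item.evidence_needed)}")
    print(f"  How to collect:  {', '.join(item.collection_methods)}")
```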

Sessions I Presented:

  • Workshop: Accelerating Your Logic Models: Interactivity for Better Communication by Beth Snow & Nancy Snow
  • Presentation: Quick wins: The benefits of applying evaluative thinking to project development by M. Elizabeth Snow & Joyce Cheng

Sessions I Attended:

  • Workshop: Building Capacity in Evaluative Thinking (How and Why It is Different from Building Evaluation Capacity) by Emma Williams, Gail Westhorp, & Kim Grey.
  • Keynote Address: Silicon Valley Thinking for Evaluation by Catherine Courage
  • Presentation: Evaluating the Complex with Simulation Modeling by Robert Schwartz
  • Presentation: Blue Marble Evaluators as Change Agents When Complexity is the Norm by Keiko Kuji-Shikatani
  • Presentation: Organizational Evaluation Policy and Quality Assessment Framework: Learning and Leading by Tara Hanson & Eugene Krupa.
  • Presentation: Exemplary Evaluations That Make a Difference by Rachel Zorzi
  • CES Annual General Meeting
  • Presentation: Evaluation: Pushing the boundaries between implementing and sustaining evidence-based practices and quality improvement in health care by Sandra Cunning et al.
  • Presentation: Indigenous Evaluation: Time to re-think our edge by Benoit Gauthier, Kim van der Woerd, Larry Bremner.
  • Presentation: Drawing on Complexity to do Hands-on Evaluation by Jonathan Morell.
  • Presentation: Navigating the Unchartered Waters of INAC’s Performance Story: Where program outcomes meet community impacts by Shannon Townsend & Keren Gottfried.
  • Presentation: Supporting decision-making through performance and evaluation data by Kathy Gerber & Donna Keough.
  • Presentation: Utilizing Change Management and Evaluation Theory to Advance Patient Safety by Hana Saab & Rita Damignani
  • Closing Keynote: The Future: Beyond here there be dragons. Or are those just icebergs? by Michael Quinn Patton


Footnotes

1. There were a lot of presentations being held at every session, so I didn’t get to go to half of the ones that I wanted to, but that seems to be the case with every conference and I’m not sure how any conference organizer could solve that problem, short of recording and posting every session, which would be prohibitively expensive.
2. Or old tools that I know of but I haven’t thought about using in an evaluation context.
3. Where by “possible” I mean, when (a) I wrote down who said something in my notes, and (b) I can read my own writing.
4. See the source of this for further elaboration on the pieces of this definition
5. I googled to get this definition, which was alluded to the in the workshop I went to, and found it on this site.
6. Source: AES Ethical Guidelines [pdf]
7. His slide deck, which was from a longer workshop that he did previously (so he didn’t cover all of this in his lecture) is available here.
8. Check out this map of the roots of “complexity” science
9. Source
10. Source
11. The keynote speaker, Catherine Courage, had a slide with a similar quote that she attributed to George Bernard Shaw (“The single biggest problem in communication is the illusion that it has taken place.”), but when I just googled to confirm that – because I know that most people don’t actually look for sources of quotes – I found out that George Bernard Shaw never said this. A shame, though – I like the wording that people erroneously attribute to Shaw better than the way that Whyte actually said it.
12. Full disclosure: I am one of these Credentialed Evaluators.
13. e.g., the process by which you apply, as well as developing educational materials to ensure that CEs have options for education as they have to do a certain number of education hours to maintain their credential
14. She asked many, in fact, but in this instance I’m talking about a specific one that struck me.
15. I don’t think she mentioned this specifically in that session, but I was thinking about evaluations I’ve seen with community members as part of the evaluation team, where they were specifically included because they had lived experiences and relationships within the community that were invaluable to the project, but they did not have the things deemed necessary by CES to get CE designation. I would think their inclusion in the evaluation would make for a higher quality evaluation than if they had been excluded.
16. Source.
17. Cunning et al
18. That’s the definition of courage, right?
19. I guess only time will really tell!
Posted in evaluation, evaluation tools, event notes, notes

One week until the 2016 Canadian Evaluation Society conference

One week from today, I’ll be on the opposite side of Canada, attending the Canadian Evaluation Society’s 2016 conference.

I’m presenting in two sessions at the conference: one pre-conference workshop and one conference presentation.

On June 5, my sister and I are giving a workshop based on a project we’ve been working on:

Accelerating Your Logic Models: Interactivity for Better Communication by Beth Snow and Nancy Snow

Logic models are commonly used by evaluators to illustrate relationships among a program’s inputs, activities, outputs, and outcomes. They are useful in helping intended users develop programs, communicate a program’s theory of change, and design evaluations. However, a static logic model often does not allow us to convey the complexity of the interrelationships or explore the potential effects of altering components of the model.

In this workshop, we will explore and create interactive logic models that will allow you to more easily demonstrate the logic within a complex model and to explore visually the implications of changes within the model. In addition, participants will be introduced to information design principles that can make their logic models – even complex ones – easier for intended users to understand and use.

Bring a logic model of your own that you would like to work on or work with one of ours to get some hands-on practice at accelerating your logic model.

You will learn:

  • to create an interactive logic model in a virtual environment

  • to speak and write in a more informative way about the visual representations in your logic models

  • to apply information design-based principles when generating logic models

On June 6, I’ll be giving a presentation based on my main project at work:

Quick wins: The benefits of applying evaluative thinking to project development by Beth Snow and Joyce Cheng

The Clinical & Systems Transformation (CST) project aims to transform healthcare in Vancouver by standardizing clinical practice and creating a shared clinical information system across 3 health organizations. Ultimately, the system will be used by 40,000 users at 40 hospitals, residential care homes, etc. The project includes an evaluation team tasked with answering the question “Once implemented, does CST achieve what it set out to achieve?” By being engaged early in the project, the evaluation team has been able to use evaluative thinking and evaluation tools to influence non-evaluators to advance the project, long before “the evaluation” itself is implemented. This presentation will explore the ways in which the early work of the evaluation team has influenced the development of the project — including facilitating leadership to articulate goals and helping the project use those goals to guide decisions — at the levels of individuals, project subteams, and the project as a whole.

There’s still time to register if you are interested!

Posted in evaluation, event notes

Reflections from the Canadian Evaluation Society 2014 Conference

I had the good fortune of being able to attend the 35th annual conference of the Canadian Evaluation Society that was held at the Ottawa Convention Centre from June 16-18, 2014. I’d only been to one CES conference previously, when it was held in Victoria, BC in 2010, and I was excited to be able to attend this one as I really enjoyed the Victoria conference, both for the things I learned and for the connections I was able to make. This year’s conference proved to be just as fruitful and enjoyable as the Victoria one and I hope that I’ll be able to attend this conference more regularly in the future.

Disappointingly, the conference did not have wifi in the conference rooms, which made the idea of lugging my laptop around with me less than appealing (I’d been intending to tweet and possibly live blog some sessions, but without wifi, my only option would have been my phone and it’s just not that easy to type that much on my phone). So I ended up taking my notes the old fashioned way – in my little red notebook – and will just be posting highlights and post conference reflections here 1Which, in truth, is probably better for any blog readers than the giant detailed notes that would have ended up here otherwise!.

Some of the themes that came up in the conference – based on my experience of the particular sessions that I attended – were:

  • The professionalization of evaluation. The Canadian Evaluation Society has a keen interest in promoting evaluation as a profession and has created a professional designation called the “Credentialed Evaluator”, which allows individuals with a minimum of two years of full-time work in evaluation and at least a Master’s degree to complete a rigorous process of self-reflection and documentation to demonstrate that they meet the competencies necessary to be an evaluator. Upon doing so, one is entitled to put the letters “CE” after their name. Having this designation distinguishes you as qualified to do the work of evaluation – as otherwise, anyone can call themselves an evaluator – and so it can help employers and agencies wishing to hire evaluators to identify competent individuals. I am proud to say that I hold this designation – one of only ~250 people in the world at this point. At the conference there was much talk about the profession of evaluation – in terms of CES’s pride that they created the first – and practically the only 2Apparently there is a very small program for evaluation credentialing in Japan, but it’s much smaller than the Canadian one. – designation of this type in the world, as well as distinguishing between research and evaluation 3Which is a very hot topic that leads to many debates, which I’ve experienced both at this conference and elsewhere..
  • Evidence-based decision making as opposed to opinion-based policy making or “we’ve always done it this way” decision making 4Or, as a cynical colleague of mine once remarked she was involved in: decision-based evidence making.. This brought up topics such as: the nature of knowledge, what constitutes “good” or “appropriate” evidence, the fallacy of the hierarchy of evidence 5Briefly, there is a hierarchy of evidence pyramid that is often inappropriately cited as being an absolute – that higher levels of the hierarchy are absolutely and in all cases better than lower levels – as opposed to the idea that the “best” evidence depends on the question being asked, not to mention the quality of the specific studies (e.g., a poorly done RCT is not the same as a properly done RCT). I’ve also had this debate more than once..
  • Supply side and demand side of evaluation. The consensus I saw was that Canada is pretty good at the supply side – evaluators and professional development for them – but could stand to do more work on the demand side – getting more decision makers to understand the importance of evaluations and the evidence they can provide to improve decision making.
  • “Accountability vs. Learning” vs. “Accountability for Learning”. One of the purposes for evaluation is accountability – to demonstrate to funders/decision makers/the public that a program is doing what it is intended to do. Another purpose is to learn about the program, with the goal of, for example, improving the program. But some of the speakers at the conference talked about re-framing this to be about programs being “accountable for learning”. A program manager should be accountable for noticing when things aren’t working and for doing something about it.
  • If you don’t do it explicitly, you’ll do it implicitly. This came up for me in a couple of sessions. First, in a thematic breakfast where we were discussing Alkin & Christie’s “evaluation theory tree”, which categorizes evaluation theories under “use,” “methods” or “valuing”, we talked about how each main branch was just an entry point, but all three areas still occur. For example, if you are focused on “use” when you design your evaluation (as I typically do), you still have to use methods and there are still values at play. The question is, will you explicitly consider those (e.g., ask multiple stakeholder groups what they see as the important outcomes, to get at different values) or will you not (e.g., you just use the outcomes of interest to the funder, thereby only considering their values and not those of the providers or the service recipients)? The risk, then, is that if you don’t pay attention to the other categories, you will miss out on opportunities to make your evaluations stronger. The second time this theme came up for me was in a session distinguishing evaluation approach, design, and methods. The presenter, who was from the Treasury Board Secretariat and evaluated evaluations conducted in government, noted that many discussed approach and methods, but not design. They still had a design, of course, but without having explicitly considered it, they could easily fall into the trap of assuming that a given approach must use a particular design and discount the possibility of other designs that might have been better for the evaluation. “Rigorous thinking about how we do evaluations leads to rigorous evaluations.”

One of the sessions that particularly resonated for me was “Evaluating Integration: An Innovative Approach to Complex Program Change.” This session discussed the Integrated Care for Complex Patients (ICCP) program – an initiative focused on integrating healthcare services provided by multiple healthcare provider types across multiple organizations, focused on providing seamless care to those with complex care needs. The project was remarkably similar to one that I worked on – with remarkably similar findings. Watching this session inspired me to get writing, as I think my project is worth publishing.

As an evaluator who takes a utilization-focused approach to evaluation (i.e., I’m doing an evaluation for a specific purpose(s) and I expect the findings to be used for that purpose(s)), I think it’s important to have a number of tools in my tool kit so that when I work on an evaluation I have at my fingertips a number of options of how best to address a given evaluation’s purpose. At the very least, I want to know about as many methods and tools as possible – their purposes, strengths, weaknesses, and the basic idea of what it takes to use the method or tool, as I can always learn about the specifics of how to do it when I get to a situation where a given method would be useful. At this year’s conference, I learned about some new methods and tools, including:

  • Tools for communities to assess themselves:
    • Community Report Cards: a collaborative way for communities to assess themselves 6The presentation from the conference isn’t currently available online – some, but not all presenters, submitted their slide decks to the conference organizers for posting online – but here’s a link to the general idea of community report cards. The presentation I saw focused on building report cards in collaboration with the community..
    • The Fire Tool: a culturally grounded tool for remote Aboriginal communities in Australia to assess and identify their communities’ strengths, weaknesses, services and policies. 7Again, the presentation slide deck isn’t online, but I found this link to another conference presentation by the same group which describes the “fire tool”, in case anyone is interested in checking it out..
  • Tools for Surveying Low Literacy/Illiterate Communities:
    • Facilitated Written Survey: Options are read aloud, respondents provide their answer on an answer sheet that has either very simple words (e.g., Yes, No) or pictures (e.g., frowny face, neutral face, smiley face) on it that they can circle or mark a dot beside. You may have to teach the respondents what the simple words or pictures mean (e.g., in another culture, a smiley face may be meaningless).
    • Pocket Chart Voting: Options are illustrated (ideally photos) and pockets are provided to allow each person to put their vote into the pocket (so it’s anonymous). If you want to disaggregate the votes by, say, demographics, you can give different coloured voting papers to people from different groups.
  • Logic Model That Allows You To Dig Into the Arrows: the presenters didn’t actually call it that, but since they didn’t give it a name, I’m using that for now. In passing, some presenters from the MasterCard Foundation noted that they create logic models where each of the arrows – which represent the “if, then” logic in the logic model – is clickable, and when you click it, it takes you to a page summarizing the evidence that supports the logic for that link. It’s a huge pet peeve for me that so many people create lists of activities, outputs, and outcomes with no links whatsoever between them and call that a logic model – you can’t have a logic model without any logic represented in it, imho. One where you actually summarize the evidence for the link would certainly hammer home the importance of the logic needing to be there. Plus it would be a good way to test out if you are just making assumptions as you create your logic model, or if there is good evidence on which to base those links. (A rough sketch of this idea appears after this list.)
  • Research Ethics Boards (REB) and Developmental Evaluation (DE). One group noted that when they submitted a research ethics application for a developmental evaluation project, they addressed the challenge that REB’s generally want a list of focus group/interview/survey questions upfront, but DE is emergent. To do this, they created a proposal with a very detailed explanation of what DE is and why it is important, and then creating very detailed hypothetical scenarios and how they would shape the questions in those scenarios (e.g., if participants in the initial focus groups brought up X, we would then ask questions like Y and Z). This allowed the reviewers to have a sense of what DE could look like and how the evaluators would do things.
  • Reporting Cube. Print out key findings on a card stock cube, which you can put on decision makers’ desks. A bit of an attention-getting way of disseminating your findings!
  • Integrated Evaluation Framework [LOOK THIS UP! PAGE 20 OF MY NOTEBOOK]
  • Social Return on Investment (SROI) is about considering not just the cost of a program (or the cost savings you can generate), but to focus on the value created by it – including social, environmental, and economic. It seemed very similar to Cost-Benefit Analysis (CBA) to me, so I need to go learn more about this!
  • Rapid Impact Evaluation: I need to read more about this, as the presentation provided an overview of the process, which involves expert and technical groups providing estimates of the probability and magnitude of effects, but I didn’t feel like I really got enough out of the presentation to see how this was more than just people’s opinions about what might happen. There was talk about the method having high reliability and validity, but I didn’t feel I had enough information about the process to see how they were calculating that.
  • Program Logic for Evaluation Itself: Evaluation → Changes Made → Improved Outcomes. We usually ask “did the recommendations get implemented?”, but need to ask “if yes, what effect did that have? Did it make things better?” (and more challengingly, “Did it make things better compared to what would have happened had we not done the evaluation?”)
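
As a rough sketch of the “clickable arrows” idea mentioned above – my own toy illustration, not the MasterCard Foundation’s actual tool – a logic model can be stored as a set of nodes plus links, with each link carrying the evidence that supports its “if, then” logic. All node names and evidence entries below are invented.

```python
# Toy logic model in which each arrow (link) carries the evidence supporting
# its "if, then" logic. Everything here is invented for illustration.
logic_model = {
    "nodes": ["Training workshops", "Staff use new skills", "Better client outcomes"],
    "links": [
        {
            "from": "Training workshops",
            "to": "Staff use new skills",
            "evidence": "Hypothetical example: post-training observation study",
        },
        {
            "from": "Staff use new skills",
            "to": "Better client outcomes",
            "evidence": "Assumption only - flagged for a literature check",
        },
    ],
}

def show_evidence(model, source, target):
    """'Clicking' a link is just looking up the evidence summary stored on it."""
    for link in model["links"]:
        if link["from"] == source and link["to"] == target:
            return link["evidence"]
    return "No link between these components - is the logic missing?"

print(show_evidence(logic_model, "Training workshops", "Staff use new skills"))
```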

A few other fun tidbits:

  • A fun quote on bias: “All who drink of this remedy recover in a short time, except those whom it does not help, who all die. Therefore, it is obvious that it fails only in incurable cases.” – Galen, ancient Greek physician
  • Keynote speaker Dan Gardner mentioned that confidence is rewarded and doubt is punished (e.g., people are more likely to vote for someone who makes a confident declaration than one who discusses nuances, etc.). An audience member asked what he thought about this from a gender lens, as men are more often willing to confidently state something than women. Gardner’s reply was that we know that people are overconfident (e.g., when people say they are 100% sure, they are usually right about 70-80% of the time), so whenever he hears people say “What’s wrong with women? How can we make them be more confident?”, he thinks “How can we make men be less confident?” (A small calibration sketch appears after these tidbits.)
  • A great presentation from someone from the Treasury Board Secretariat provided a nice distinction between:
    • evaluation approach: high-level conceptual model used in undertaking evaluation in light of evaluation objectives (e.g., summative, formative, utilization-focused, goal-free, goal-based, theory-based, participatory) – not mutually exclusive (you can use more than one approach)
    • evaluation design: tactic for systematically gathering data that will assist evaluators in answering evaluation questions
    • evaluation methods: actual techniques used to gather & analyze data (e.g., survey, interview, document review)
    • In summary: the approach is strategic and tied to the evaluation objectives; the design is tactical and tied to the evaluation questions; the methods are operational and tied to the evaluation data.
  • In addition to asking “are we doing the program well?”, we should also ask “are we doing the right thing?” Relevance is a question that the Treasury Board seems to focus on, but one that I haven’t given much thought. Something to consider more explicitly in future evaluations.
  • Ask not just “how can I make my evaluations more useful?” but also, “how can I make them more influential?”
  • In a presentation on Developmental Evaluation, the presenter showed a diagram something like this (I drew it in my notebook and have now reproduced it for this blog), which I really liked as a visual:

[Image: making decisions through feedback loops]
It shows how we are always making decisions on what actions to take based on a combination of knowledge and beliefs (we can never know *everything*), but we can test out our beliefs, feed that back in, and repeat, and over time we’ll be basing our actions more on evidence and less on beliefs. (A toy numerical sketch of this loop follows below.)
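
Here is a toy numerical version of that feedback loop – my own invention, not the presenter’s model: each cycle, a fraction of the remaining untested beliefs get tested and, whichever way the test comes out, what you learn gets added to the evidence you act on. The starting split and the 25% testing rate are arbitrary.

```python
# Toy model of the knowledge/beliefs feedback loop: start out acting mostly on
# beliefs, and each cycle test a fraction of them, converting what is learned
# into evidence. The starting split and the 25% testing rate are arbitrary.
knowledge, beliefs = 0.2, 0.8
test_rate = 0.25  # share of remaining untested beliefs tested per cycle

for cycle in range(1, 9):
    tested = beliefs * test_rate
    beliefs -= tested
    knowledge += tested
    print(f"Cycle {cycle}: {knowledge:.0%} evidence-based, {beliefs:.0%} untested beliefs")
```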
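
And on the overconfidence point from Gardner’s keynote, here is a small toy calibration check – my own made-up numbers, not his data: group predictions by how sure people said they were, and compare that to how often they were actually right.

```python
# Toy calibration check: compare stated confidence to the share of predictions
# that were actually correct. The data below are invented for illustration.
from collections import defaultdict

predictions = [
    # (stated confidence, was the prediction correct?)
    (1.0, True), (1.0, True), (1.0, False), (1.0, True), (1.0, False),
    (0.7, True), (0.7, False), (0.7, True), (0.7, True), (0.7, False),
]

by_confidence = defaultdict(list)
for stated, correct in predictions:
    by_confidence[stated].append(correct)

for stated, results in sorted(by_confidence.items(), reverse=True):
    hit_rate = sum(results) / len(results)
    print(f"Said {stated:.0%} sure -> right {hit_rate:.0%} of the time")
# A well-calibrated person's hit rate would match their stated confidence;
# here the "100% sure" predictions are right only 60% of the time.
```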

Footnotes

1. Which, in truth, is probably better for any blog readers than the giant detailed notes that would have ended up here otherwise!
2. Apparently there is a very small program for evaluation credentialing in Japan, but it’s much smaller than the Canadian one.
3. Which is a very hot topic that leads to many debates, which I’ve experienced both at this conference and elsewhere.
4. Or, as a cynical colleague of mine once remarked she was involved in: decision-based evidence making.
5. Briefly, there is a hierarchy of evidence pyramid that is often inappropriately cited as being an absolute – that higher levels of the hierarchy are absolutely and in all cases better than lower levels – as opposed to the idea that the “best” evidence depends on the question being asked, not to mention the quality of the specific studies (e.g., a poorly done RCT is not the same as a properly done RCT). I’ve also had this debate more than once.
6. The presentation from the conference isn’t currently available online – some, but not all presenters, submitted their slide decks to the conference organizers for posting online – but here’s a link to the general idea of community report cards. The presentation I saw focused on building report cards in collaboration with the community.
7. Again, the presentation slide deck isn’t online, but I found this link to another conference presentation by the same group which describes the “fire tool”, in case anyone is interested in checking it out.
Posted in evaluation, evaluation tools, event notes, methods, notes

Intro to Philosophy – Week 7 – Time Travel

  • this module focused on the paradoxes of time travel and some ways to defend the logical possibility of backwards time travel (mostly from a David Lewis paper)
  • time travel involves:
    • external time = “time as it is registered by the world at large” – e.g., movement of times, rotation of the Earth; “time as it is registered by the majority of the non-time-travelling universe”
    • personal time = ” time as it is registered by a particular person or a particular travelling object” – e.g., your hair greying, the accumulation of your digestive products
    • normally, external time = personal time
    • but for time travel, the two diverge
  • forward time travel – “external time and personal time share the same direction, but have different measures of duration”
  • backward time travel – “external time and personal time diverge in direction” and duration (in that you are travelling, for example, -50 years of external time while personal time goes forward)
  • Einstein’s Special Theory of Relativity says that if you travel fast enough, forward time travel does occur (because of time dilation – there’s a worked example after these notes)
  • backward time travel is more speculative – it’s debated whether physics supports the notion of backward time travel, though the General Theory of Relativity “seems to predict that under certain circumstances” (e.g., enormous mass, enormous speed of mass) “it is possible to create circumstances where personal time and external time diverge in duration and direction”
  • Lewis provides an argument that backward time travel is “logically possible” – not that it is physically possible
  • the grandfather paradox is basically that backward time travel is not possible because:
    • “if it was possible to travel in time it would be possible to create contradictions.
    • it is not possible to create contradictions.
    • Therefore, it is not possible to travel backwards in time”
  • e.g., if you could travel backwards in time, you could kill your grandfather before he fathered your parent, which would prevent you from ever being born, but if you didn’t exist, how could you go back in time to kill your grandfather?
  • another example: you can’t go back in time to kill Hitler in 1908 because you already know that Hitler lived until 1945, so if you did travel into the past, you are guaranteed not to succeed in killing Hitler. So your actions in the past are restricted, but that’s not the same as saying it’s impossible that you travelled back in time.
  • Lewis agrees that contradictions can’t occur, but argues that time travel need not necessarily create contradictions
  • compossibility: what is possible relative to one set of facts may not be possible relative to another set of facts.
    • e.g., it’s compossible that I speak Gaelic, in the sense that I have a functioning voice box, but I can’t actually speak it because I’ve never learned it
  • so it’s “compossible” to kill Hitler in the past (he was mortal, I am physically capable of shooting a gun), but relative to the fact that Hitler was alive in 1945, it’s not “possible” for him to be killed in 1908
  • two senses of change:
    • replacement change: e.g., if I knock a glass off a table, I’d replace the whole glass with a pile of glass fragments
    • counterfactual change: “the impact that you have assessed in terms of what would have happened (counterfactually) if you hadn’t been present”
      • e.g., my alarm clock going off this morning changed the course of my day (relative to if it hadn’t gone off)
  • Lewis thinks replacement changes can happen to concrete objects, but not to time
  • he also says that time travellers could cause a counterfactual change – i.e., the time traveller can affect things in the past (compared to if they hadn’t been there) – but they don’t cause a replacement change (i.e., it’s not like the past happened one way and then it changed to another way – it always happened only one way)
  • causal loops are “a chain of events such that an event is among its own causes”; they aren’t paradoxes, but they do “pose a problem for the intelligibility of backward time travel”
  • e.g., imagine you travel back in time with a 2012 copy of Shakespeare’s complete works and give them to the young Shakespeare, who then claims them as his own – well, the only reason the 2012 copy exists is because you gave it to Shakespeare – but who wrote it? where did the information in it come from?
  • [Or you could become your own grandfather by sleeping with your grandmother in the past – but you could only do that if you existed, and you couldn’t exist unless you’d fathered your own parent, which you couldn’t do if you didn’t exist first.]
  • Lewis agrees that causal loops are strange, but they aren’t impossible
  • there are 3 possible chains of events:
    • infinite linear chain: every event has a prior cause, so you can never get an answer of what the first cause was because you can always ask “but what caused that?”
    • finite linear chain: the first event in the chain has no cause – e.g., the Big Bang wasn’t just the first event in time, it was the beginning of time – no time existed before that (as Hawking says, asking “what happened before the Big Bang?” is like asking “what’s north of the North Pole?”) – so you still have the problem of “where does the information come from?”
    • finite non-linear chain: (causal loops) – again, we still have no explanation of where the information originally came from, but it’s no more problematic than the other two
  • there are other questions that philosophers think about with respect to time travel:
    • how can you bilocate? i.e., how can the you from the future be standing next to the you from the present?
    • what physical laws govern time travel?
  • there’s also the idea of branching histories – you could go back to the past and kill Hitler, but you’d have killed Hitler in one version of history, while the version of history you came from still has a Hitler who lived until 1945 (which raises the question: is this really time travel if you travelled to what is really a different history?)
  • another “interesting question is whether the mechanisms from time travel that general relativity may permit, and the time travel mechanisms that quantum mechanics may permit, will survive the fusion of general relativity and quantum mechanics into quantum gravity”
  • Hawking has posed another challenge to the “realistic possibility of time travel” – if time travel is possible, where are all the time travellers? Why haven’t we seen them?
  • “closed time-like curve is a path through space and time that returns to the very point whence it departed, but that nowhere exceeds the local speed of light. It’s a pathway that a physically possible object could take, that leads backward in time.” – it’s debated if this is realistic
  • but if it’s true, you could only access history once a closed time-like curve has been generated (e.g., if it is generated in 2017, then people in the future can travel back only as far as 2017) – so perhaps we haven’t seen time travellers yet because no one has yet generated a closed time-like curve
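
For the forward time travel point above, here is the standard special-relativity time dilation relation with a quick worked number. This is standard physics rather than anything from the course module, and the 0.99c figure is just an illustration.

```latex
% A clock moving at speed v runs slow relative to external time by the
% Lorentz factor gamma:
\Delta t_{\text{external}} = \gamma \, \Delta t_{\text{personal}},
\qquad
\gamma = \frac{1}{\sqrt{1 - v^{2}/c^{2}}}

% Example: at v = 0.99c, gamma = 1/\sqrt{1 - 0.99^{2}} \approx 7.1, so one
% year of personal (on-board) time corresponds to roughly 7.1 years of
% external time, i.e., forward time travel of about six years.
```
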
Posted in notes, online module notes, philosophy

Intro to Philosophy – Week 6 – Are Scientific Theories True?

  • this module was not what I expected – I was expecting to learn about the philosophy of science (e.g., positivism, post-positivism, etc.), but instead the whole module was about the debate between scientific realism vs. scientific anti-realism – a debate about the aims of science (rather than a debate on a specific scientific topic)
  • two main aims of science seem to be:
    • science should be accurate and provide us with a good description and analysis of the available experimental evidence in a given field of inquiry. We want “scientific theories to save the phenomena”
    • science is not just about providing an accurate account of the available experimental evidence and saving the phenomena, but should also “tell us a story about those phenomena, how they came about, what sort of mechanisms are involved in the production of the experimental evidence, etc.”
    • [I don’t fully understand what “save the phenomena” means – the instructor in the lecture just says it like we should understand it. Some further elaboration was given in the explanation of the quiz that appeared in the first lecture, where the course instructor wrote that “saving the phenomena” is also known as “saving the appearances”: providing a good analysis of scientific phenomena as they appear to us, without any commitment to the truth of what brings about those phenomena or appearances.]
  • Ptolemaic astronomers described the motion of planets as being along small circles that were rotating along larger circles; they didn’t necessarily believe this to be what was actually happening, but rather it was a mathematical contrivance that “saved the phenomena” – that is, as long as the calculations agreed with observations, it didn’t matter if they were true (or even likely)
  • Galileo, however, “replaced the view that science has to save the appearances, with the view that science should in fact tell us a true story about nature”
  • scientific realism = the “view that scientific theories, once literally construed, aim to give us a literally true story of the way the world is.”
    • a semantic aspect to this idea: “once literally construed” means that “we should assume that the terms of our theory have referents in the external world” (e.g., planets are planets. Electrons are electrons.)
    • an epistemic aspect to this idea: “literally true story” – “we should believe that our best scientific theories are true, namely that whatever they say about the world, or better about those objects which are the referents of their terms, is true, or at least approximately true”
  • the “No Miracles Argument” suggests that unless we believe that scientific theories are at least approximately true, the success of science at “making predictions later confirmed by observation, explaining phenomena, etc.” would be very unlikely – it would seem like a miracle
  • constructive empiricism agrees with the semantic aspect of scientific realism (i.e., we should take the language of science at face value), but disagrees with scientific realism on the epistemic aspect (i.e., it thinks that a theory does not need to be true to be good). They think “Models must only be adequate to the observable phenomena, they are useful tools to get calculations done, but they do not deliver any truth about the unobservable entities” (e.g., atoms, protons, etc. that we cannot observe with the naked eye) – so the theory does not need to be “true” – it just needs to be “empirically adequate”. They think that science is successful because the theories that survive turned out to be the “fittest” (survival of the fittest) – the ones that best “saved the phenomena” over time.
  • Constructive empiricists view the “metaphysical commitment” necessary for scientific realism to be “risky”. If we discover later that something in our theory was non-existent, it would make scientific realism wrong, but not constructive empiricism.
  • The scientific realist would counter that the theories that survive do so because they are true (and those that fail do so because they are false).
  • Another issue is the distinction between the observable and the unobservable – e.g., observing with the “naked eye” vs. observing with scientific instruments. Why should we believe one more than the other?
  •  Philip Kitcher and Peter Lipton say that “we are justified to believe in atoms, electrons, DNA and other unobservable entities because the inferential path that leads to such entities is not different from the inferential path that leads to unobserved observables”
  • e.g., we know about dinosaurs from fossil evidence – we didn’t observe the dinosaurs ourselves, but can infer from the fossils. Similarly, we can infer Higgs Bosons from the evidence we get from the Large Hadron Collider.
  • “inference to the best explanation” = “we infer the hypothesis which would, if true, provide the best explanation of the available evidence”
Posted in Uncategorized