I recently spent a week at the Canadian Evaluation Society (CES)’s 2016 national conference in St. John’s, NL. I’ve been to the CES national conference twice before – 2010 in Victoria, BC and 2014 in Ottawa, ON – as well as the CES BC & Yukon chapter’s provincial conference for the past two years, and in all cases I’ve learned a tonne and had a great time. There’s something very special about spending time with people who do the thing you do, so I was glad to have a chance to engage with fellow evaluators, both in the formal workshops and presentations and in the informal networking and socializing times.
I’m one of the program co-chairs for the CES 2017 national conference being held in Vancouver next year, so I thought it was extra important for me to go this year, and I certainly saw the conference through a different lens, jotting down notes about conference organization and logistics alongside the notes I was taking on content throughout the sessions. I took a tonne of notes, as I generally do, but for this blog posting I’m going to summarize some of my insights rather than catalogue every session I went to. So instead of presenting my notes session by session, I’m going to present them by topic area, and then present the new tools I learned about. Where possible I’ve included the names of the people who said the brilliant things I took note of, because I think it is important to give credit where credit is due, but I apologize in advance if my paraphrasing is not as elegant as what people actually said.
There isn’t a single definition of evaluation. Some of the ones mentioned throughout the conference included:
- Canadian Evaluation Society’s definition: “Evaluation is the systematic assessment of the design, implementation or results of an initiative for the purposes of learning or decision-making.”
- Carol Weiss’s definition: “Evaluation is the systematic assessment of the operation and/or outcomes of a program or policy, compared to a set of explicit or implicit standards, as a means of contributing to the improvement of the program or policy.”
- Australasian Evaluation Society’s definition: “systematic collection and analysis of information to make judgements, usually about the effectiveness, efficiency and/or appropriateness of an activity […including…] many types of initiatives, not just programs, but any set of procedures, activities, resources, policies and/or strategies designed to achieve some common goals or objectives.”
- Emma Williams used a lot of interesting analogies in her workshop on Evaluative Thinking, one of which was the meerkat. People these days work with their noses to the grindstone – like a meerkat down on all fours running like the wind – but it’s important every so often to make like the meerkat, who stops, stands up, and looks around to see what’s going on. We as evaluators can encourage people to stop, look around, and reflect. I like this image of the meerkat as a reminder of that.
- Also from Emma Williams: evaluators are like the worst of a 4 year old (always asking “Why? Why? Why?”) and the worst of a skeptical teenager (think: arms folded, saying “That’s what you think! Prove it!”).
- Evaluation is about judging the merit or worth of a program. Researchers tend to be uncomfortable with making judgements, whereas that is what evaluators do.
- Evaluative reasoning involves:
- deciding what criteria you will use to judge the program
- deciding the standards by which you will determine whether the program is good enough to be judged as good
- collecting data to make those judgments
- having a “warranted” argument to link evidence to claims
- If you have a program theory, use that to develop your criteria and compare your evidence to your theory.
The Evaluand is the thing that you are evaluating. When you say that “it worked” or “it didn’t work”, the evaluand is the “it”.
- Evaluating strategy. A strategy is a group of (program and/or policy) interventions that are all meant to work towards a common goal. We don’t learn in evaluation education how to evaluate a strategy. Robert Schwartz gave an interesting talk on this topic – he suggested that strategies are always complex (including, but not limited to, there being multiple interventions, multiple partners, interactions and interactions among those interactions, feedback loops, non-linearity, and subpopulations) and we don’t really have a good way of evaluating all of this stuff. He said he wasn’t even sure it is possible to evaluate strategies “but can we get something from trying?” I thought this was an interesting way to approach the topic and I did think we learned some things from his work.
- Evaluating complexity. Jonathan Morell did an interesting expert lecture on this topic. Some of the key points I picked up from his talk:
- Our program theories tend to just show the things that are in the program being evaluated (e.g., inputs, activities), but there are many things around the program that affect it as well, and some of those things we do not and cannot know.
- We can draw on complexity science (a) instrumentally and (b) metaphorically.
- Science cares about what is true, while technology cares about what works. If we think of evaluators as technologists (which it seems Morell does), then he’s in favour of invoking complexity in any way that works (e.g., if using it metaphorically helps us think about our program/situation, then do that and don’t worry that you aren’t using “complexity science” as a whole). He notes that “science begins to matter when technology stops working.”
- Some of the concepts of complexity include:
- small changes can lead to big changes
- small changes can cascade through a system
- there can be unintended outcomes, both positive and negative, of a system
- attractors – “properties toward which a system evolves, regardless of starting conditions”
- NetLogo Model Library contains many different models of agent-based social behaviours.
- We might not even evaluate/measure things that seem “simple” (e.g., if we don’t understand that feedback loops can cause unpredictable things, then we won’t look for or measure those things).
- There is no grand unified theory of complexity – it comes from many roots and it’s a very different way of looking at the world (compared to thinking about things as being more linear: input -> activity -> output).
- Program designers tend to design simple programs – it’s very hard to align with all the other programs out there that all have their own cultures/process/etc. – would take so long to do that that no one would ever get anything done. (People know that system-level interventions are needed, but they can only do what’s in their scope to do)
- Implications for evaluation – need to be close to the program to observe small changes, as they can lead to large effects; and because you can’t always predict what feedback loops there may be, you need to be there to observe them.
- Even if the program doesn’t recognize the complexity of their situation, evaluators can use complexity concepts to make a difference.
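To make the “small changes can cascade through a system” idea above concrete, here is a minimal, hypothetical sketch (my own toy example, not one of the NetLogo Model Library models mentioned above): agents sit in a line, and each adopts a behaviour once an immediate neighbour has adopted it, so flipping a single agent can ripple through the whole system.

```python
# Toy agent-based cascade: a small change (one seed agent) can
# propagate through the entire system via local interactions.

def run_cascade(n_agents: int, seeds: set) -> int:
    """Count how many agents end up adopting, starting from `seeds`."""
    adopted = set(seeds)
    changed = True
    while changed:
        changed = False
        for i in range(n_agents):
            if i in adopted:
                continue
            # An agent adopts if an immediate neighbour has adopted.
            if (i - 1) in adopted or (i + 1) in adopted:
                adopted.add(i)
                changed = True
    return len(adopted)

print(run_cascade(20, set()))   # no initial change: 0 agents adopt
print(run_cascade(20, {10}))    # one "small change": all 20 agents adopt
```

The point of the sketch is the disproportion: the difference between the two runs is a single agent, but the difference in outcomes is the whole population – exactly the kind of non-linearity that makes strategies and complex programs hard to evaluate.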
Data Collection, Analysis, and Interpretation
- “Data literacy” = “the ability to understand and use data effectively to inform decisions (e.g., why to collect data, how to collect data, how to interpret data, how to turn data into information)”
- Anytime you expect program staff (as opposed to evaluators) to collect data (or even to participate in things like being observed by an evaluator for data collection), you have to remember that collecting data takes away time and attention from direct service provision. A staff member will think “I can fill out this data collection sheet or I can save another life.” You have to make sure that staff understand the importance of the data and what it is going to be used for (e.g., to improve the program or to secure future funding for the program/help insulate the program against potential funding cuts by having evidence that the program is having an effect) if you expect them to put effort towards collecting it.
- Anyone who is going to be entering data (even if it’s data that’s collected as part of providing service but which will also be used for evaluation) needs to understand the importance of data quality. For example, do staff understand that if they put a “0” when they actually mean that the data is not available, that 0 will erroneously decrease the average you calculate from that data set?
- Make sure data entry protocols are very clear about what exactly the data collector needs to do and *why* they need to do it, and that you include a data dictionary – you’d be surprised how differently people can interpret things.
- What the data “says” vs. what the data “means”: it is very possible to misinterpret data, so it’s important to think about your data, your methods, and their limitations. For example, if you have survey data that tells you everyone loves your program, but the survey response rate was 5% or the survey questions were all biased, the data may “say” that everyone loves your program, but it just means that the 5% who responded love your program or that the answers to the biased questions gave you positive results – you don’t actually know what people thought about your program. Another example: if rates of errors went up after an intervention (what the data says), does it mean that more errors actually occurred, or that the new system is better at detecting errors?
- Campbell’s Law: “The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.”
- Unanticipated consequences – we all talk about them, but few evaluations explicitly include looking for them (including budgeting for and doing the necessary open inquiry on site, which is the only way to get at unintended consequences)
- Consequential validity – everything we do has consequences. In terms of questions/measures, consequential validity = “the aftereffects and possible social and societal results from a particular assessment or measure. For an assessment to have consequential validity it must not have negative social consequences that seem abnormal. If this occurs it signifies the test isn’t valid and is not measuring things accurately.” – e.g., if a test shows that a subgroup consistently scores lower, it could be the result of the test being biased against them (and thus the test is not validly measuring what it purports to be measuring).
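The data-quality point above – that a “0” entered in place of a missing value silently distorts an average – can be shown with a tiny sketch (the numbers here are hypothetical, just for illustration):

```python
# Illustration of the "0 vs. missing" data-entry problem: entering 0
# for "not available" drags the average down, while omitting the
# missing value keeps the average honest.

def mean(values):
    return sum(values) / len(values)

# Five client visits; the measurement was unavailable for one of them.
recorded_with_zero = [72, 68, 0, 75, 70]   # "0" entered for the missing value
recorded_as_missing = [72, 68, 75, 70]     # missing value simply omitted

print(mean(recorded_with_zero))   # 57.0 - erroneously low
print(mean(recorded_as_missing))  # 71.25 - true average of observed values
```

This is exactly why a data dictionary and clear entry protocols matter: the person entering the data needs to know that “unavailable” and “zero” are not the same thing.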
- “Implementation” = “a specific set of activities designed to put into practice an activity or program of known dimensions” (Cunning et al)
- effective interventions × effective implementation × enabling contexts = socially significant outcomes (that is, you need interventions that work, and you need to implement them well, and the context has to enable that)
- there is a growing evidence base of ‘what works’ in implementation – we should be evidence-based in our attempts to implement things
- Hana Saab discussed working as an evaluator in a healthcare environment where people tend to make assumptions like: PDSA cycle = evaluation (even though quality improvement projects are rarely rigorously evaluated and reasons for achieving results are often not understood); better knowledge = improved practice (even though there are many steps between someone attending an education session and actually using what they learned in practice); and that contexts are homogeneous (which they aren’t!). She also noted that sometimes people conclude a program “didn’t work” without differentiating between “the program was implemented as intended but didn’t lead to the intended outcomes” and “the program wasn’t even implemented as intended” [and, I would add, you could also conclude a program “worked” when it actually worked because staff adapted it into something that did work (if you didn’t note that, you’d think the original program design worked), or maybe the program is fine in other contexts, but not in this one].
- Realist evaluation allows the integration of process & outcome evaluation and focuses on “what works for whom and in what circumstances”
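The multiplicative formula above (interventions × implementation × context) can be sketched as a toy model – the scores below are purely illustrative, not a validated scoring scheme – to show why the factors multiply rather than add: a near-zero score on any one factor collapses the overall outcome, no matter how strong the others are.

```python
# Toy sketch of "effective interventions x effective implementation
# x enabling contexts = socially significant outcomes". Because the
# factors multiply, weakness in any one of them dominates.

def outcome(intervention: float, implementation: float, context: float) -> float:
    """Each factor is scored from 0.0 (absent) to 1.0 (fully effective)."""
    return intervention * implementation * context

print(round(outcome(0.9, 0.8, 0.9), 3))  # strong on all three: 0.648
print(round(outcome(0.9, 0.1, 0.9), 3))  # effective intervention, poorly implemented: 0.081
print(outcome(1.0, 0.0, 1.0))            # not implemented at all: 0.0
```

This is the same point Hana Saab made from the other direction: a program that “didn’t work” may have had an effective intervention (a high first factor) that was simply never implemented as intended (a near-zero second factor).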
- The CES has issued a statement on Truth & Reconciliation in response to the Truth & Reconciliation Commission of Canada’s report in which they resolved to:
- include reconciliation in the existing CES value of “inclusiveness”
- include reconciliation explicitly in the evaluation competencies
- strengthen promotion of and support for culturally responsive evaluation,
- implement consideration for reconciliation in its activities
- Culturally-responsive evaluation:
- There are many ways of knowing – Western ways of knowing are privileged, but all ways of knowing have strengths and limitations.
- It’s important to recognize power differentials and inequities.
- A bottom-up, strengths-based approach is advocated.
- The 4 Rs (Kirkness & Barnhardt, 1991): respect, relevance, reciprocity, and responsibility.
- Reciprocal Consulting, who presented this in one of their presentations that I attended, provides a great description of the 4 Rs on their website.
- The opening keynote speaker showed a video clip of an activity where they had a group of people line up by birthday without talking. People tend to go with the first right answer they find, which is why we end up with incremental improvements, rather than going on to find other right answers, some of which could be truly innovative.
- We need spaces to be creative. Cubicles and offices and meeting rooms with whiteboard markers that don’t work are not conducive to being spontaneous, to rapid prototyping, or to other ways of being creative. It doesn’t cost that much to set up a creative space – room for people to get together, put up some foam boards or flip chart papers that you can write on or put sticky notes on, have a stock of markers and random prototyping supplies.
- “The great enemy of communication, we find, is the illusion of it.” – William H. Whyte. Don’t assume that because you wrote a report and sent it to someone that (a) it has been read, or (b) it has been understood.
- We make meaning from stories. Stories + stats are far more memorable, and more likely to drive people to action, than stats alone.
The Credentialed Evaluator designation
The CES created a professional designation program for evaluators – the Credentialed Evaluator (CE) – the only one in the world, in fact. Of the 1696 members of CES, 319 (19%) currently hold this designation, with a further 140 in the process of applying. The society has put a lot of work into creating the designation, getting it off the ground, and optimizing the infrastructure to sustain it. But the CE designation, I learned at this conference, is not without controversy.
- Kim van der Woerd asked an astute question in a session I was in on quality assurance for evaluation. The idea being discussed was that one might include “having a credentialed evaluator(s) working on the evaluation” as a criterion for a high-quality evaluation. Kim pointed out that doing that would privilege and give power to those people holding a CE designation, as well as the ways of knowing and evaluating that are dominant and thus included in the evaluation credentialing process. What about other evaluators?
Meta-evaluation is evaluation of evaluation. How do we know if we are doing good quality evaluations? Moreover, how do we know if our evaluations are making a difference?
- One study in New Zealand found that only 8 of 30 evaluations they assessed met their criteria for a “good” evaluation. A study in the UK National Audit Office found only 14 of 34 evaluations were sufficient to draw conclusions about the effects of the intervention.
- The Alberta Centre for Child, Family, and Community Research is working on a quality assurance framework for evaluation. It’s not done yet, but when it is it will be published on their website, so I’ve made a mental note to go look for it later.
- We don’t actually have a good evidence base that evaluation makes a difference. A project by EvalPartners contributed to that evidence base by showcasing 8 stories of evaluations that truly made a difference. They provided a visual that I found helpful in thinking about this (I’ve recreated the image and annotated it with the key points).
- One audience member in a presentation I was in used an analogy of auditors for accounting – an auditor doesn’t *do* your accounting for you, but rather comes in and verifies that you did your accounting well (according to accounting standards). But imagine if an auditor came in and you hadn’t done any accounting at all! That’s like bringing in an external evaluator to a program and saying “evaluate the program” when you haven’t set up anything for evaluation!
- Meta-evaluation isn’t just something we should do at the end of an evaluation to see if that was a good evaluation we did. You should engage in meta-evaluation throughout the project, while you still have the opportunity to strengthen the evaluation!
- Several people referred to Eval Agenda2020, the global agenda for evaluation for 2016-2020, created by EvalPartners.
- The Canadian Evaluation Society has a new(ish) strategic plan.
- Context-driven evaluation approach – having an overarching evaluation framework and tools (e.g., shared theory of change, outcome measures, reporting structure, database), but with the ability to adapt to local, organizational, & community contexts (as people adapt their programs at local sites)
- “Deliverology” was the new buzzword this year – it was defined in one presentation as an approach to public services that prioritizes delivering results to citizens. Apparently it’s been talked about a lot in the federal public service.
- Several people also mentioned that the Treasury Board Secretariat has a new evaluation-related policy on the horizon.
- In his closing keynote, relating to the conference theme of “Evaluation On the Edge”, Michael Quinn Patton asked the audience to reflect on “Where is your edge?” My immediate thought on this was a reflection I’ve had before – that when I look back on the things I’ve done in my life so far that I thought were really amazing accomplishments – doing a PhD, playing a world record 10-day long game of hockey, doing a part-time MBA while working full-time – I started each one of them feeling “Oh my god! What am I doing? This is too big, too much, I won’t be able to do it!” I felt truly afraid that I’d gotten too close to the edge and was going to fall – not unlike how I felt when I did the CN Tower Edgewalk. But in each case, I’d decided to “feel the fear and do it anyway!” and while all of those things were really hard, I did accomplish them and they are some of the best things I’ve ever done. I also remember having that same feeling when I took on my current job to evaluate a very big, very complex, very important project – “oh my god! It’s too big, it’s too much, what if I can’t figure it out??” But I decided to take the plunge and I think I’m managing to do some important work. I think the lesson here is that we have to push ourselves to the edge – and have the courage to walk there – to make great breakthroughs.
Tips and Tools
- Use the 6 Thinking Hats to promote evaluative thinking. I’ve used this activity in a teaching and learning context, and seen it used in an organization development/change management context, which now that I think of it were examples of evaluative thinking being applied in those contexts. I’ve usually seen it done where the group is split up so that some people are assigned the blue hat perspective, some the red hat perspective, etc., but Emma suggested that the way it is intended to be used is that *everyone* in the group is supposed to use each hat, together in turn.
- Don’t use evaluation jargon when working with stakeholders. You don’t need to say “logic model” or “program theory” when you can just say “we are going to draw a diagram that illustrates how you think the program will achieve its goals” or “let’s explain the rationale for the program.” Sometimes people use professional jargon to hide gaps in their knowledge – if you really understand a concept, you should be able to explain it in plain English.
- Backcasting: Ask stakeholders what claims they would like to be able to make/what they want to “prove” at the end of the evaluation and then work backwards: “What evidence would you need to be able to make that claim?” and then “How would we collect that evidence?”
- Thinking about “known contribution” vs. “expected contribution” in your program theory. Robert Schwartz talked about this when talking about IPCA for evaluating strategy, but I think this is useful for program logic models as well. I’ve thought about this before, but never actually represented it on any of my logic models.
- Wilder Collaboration Factors Inventory, a “free tool to assess how your collaboration is doing on 20 research-tested success factors”
- Adaptation Framework to adapt existing survey tools for use in rural, remote, and Aboriginal communities available from Reciprocal Consulting.
- Treasury Board Secretariat’s Infobase – “a searchable online database providing financial and human resources information on government operations”
- “Between Past and Future” by Hannah Arendt – has six exercises on critical thinking.
- The Mountain of Accountability
Sessions I Presented:
Workshop: Accelerating Your Logic Models: Interactivity for Better Communication by Beth Snow & Nancy Snow
Presentation: Quick wins: The benefits of applying evaluative thinking to project development by M. Elizabeth Snow & Joyce Cheng
Sessions I Attended:
Workshop: Building Capacity in Evaluative Thinking (How and Why It is Different from Building Evaluation Capacity) by Emma Williams, Gail Westhorp, & Kim Grey.
Keynote Address: Silicon Valley Thinking for Evaluation by Catherine Courage
Presentation: Evaluating the Complex with Simulation Modeling by Robert Schwartz
Presentation: Blue Marble Evaluators as Change Agents When Complexity is the Norm by Keiko Kuji-Shikatani
Presentation: Organizational Evaluation Policy and Quality Assessment Framework: Learning and Leading by Tara Hanson & Eugene Krupa.
Presentation: Exemplary Evaluations That Make a Difference by Rachel Zorzi
CES Annual General Meeting
Presentation: Evaluation: Pushing the boundaries between implementing and sustaining evidence-based practices and quality improvement in health care by Sandra Cunning et al.
Presentation: Indigenous Evaluation: Time to re-think our edge by Benoit Gauthier, Kim van der Woerd, Larry Bremner.
Presentation: Drawing on Complexity to do Hands-on Evaluation by Jonathan Morell.
Presentation: Navigating the Unchartered Waters of INAC’s Performance Story: Where program outcomes meet community impacts by Shannon Townsend & Keren Gottfried.
Presentation: Supporting decision-making through performance and evaluation data by Kathy Gerber & Donna Keough.
Presentation: Utilizing Change Management and Evaluation Theory to Advance Patient Safety by Hana Saab; Rita Damignani
Closing Keynote: The Future: Beyond here there be dragons. Or are those just icebergs? by Michael Quinn Patton
The other images are ones I created, adapting from slides I saw at the conference.