What is the point of this blog?


I recently had coffee with a new friend and fellow evaluator, Meagan Sutton. We were introduced by a mutual friend who knew that Meagan was interested in chatting with evaluators who write blogs and that I am an evaluator who writes a blog! We had a great chat and it got me thinking about why I have this blog and how I might grow what I do with it.


I originally started this blog as a place to keep notes on work-related stuff I was reading. I have a pretty terrible memory and I find my personal blog a great way to remember stuff that I did – it’s easy to search through and accessible anywhere with an Internet connection – so I figured that rather than having notes in various notebooks and jotted down in the margins of printed copies of journal articles, I could use this blog as my brain dump for the various things I learn.[1] So whenever I went to a conference, attended a webinar, or read a book or article where I wanted to record what I was learning, I dumped it on this blog. I am an external processor, so writing things down helps me to remember and understand them. For webinars, I tend to take notes directly into my blog and publish them. For conferences, I usually write notes on paper during the conference – partly because that helps keep me awake and attentive during sessions, partly because I don’t like lugging my laptop around, and partly because I find it helpful to look at all the notes I’ve taken and sort or synthesize them for the whole conference. If I type my notes during the conference, I find it harder to remove the superfluous stuff, whereas if I’m deciding what’s worth typing out from a pile of handwritten notes, I find it easier to be succinct, as I’ll select just the main points to blog about. The downside is that this often takes me quite a while, and I can end up posting my conference summary blog posting many months later.[2]

Meagan asked me how I promote this blog and honestly, I don’t. Since I saw the blog as mostly just an externalization of my memory, I didn’t think anyone else would ever want to read it. I have had a few people contact me after reading something on my blog that they found through Google – and actually have had some interesting conversations result – but it’s pretty rare.

Occasionally, I add some reflection into these blog postings – like thoughts about how what I was reading or learning at a conference might relate to work that I do, but that’s been pretty minimal.


At the same time, I’ve been working on improving my reflective practice, mostly through reflective writing that I’m doing privately rather than in a public forum like this. Part of that is because the reflections I’ve been writing are part of the data I am using in the evaluation I’m working on, so I need it documented where the rest of the data (including my team’s reflections) are. And part of it is because some of what I write about is confidential or politically sensitive, so is not for sharing publicly.

And this is where blogging as an evaluator can get sticky. Sometimes there are things you want to reflect on and process, and maybe even start a conversation with fellow evaluators about, but that you aren’t able to make anonymous for discussion in a public forum. Or you have conflicts with clients that you want to reflect on, but can’t do that publicly either. How does one navigate this? I honestly don’t know the answer, but as I think about expanding this blog to become more reflective, it’s something I’ll need to think more about.

I guess the flip side of this is: why do I want to put my reflections out into the world? I guess because I see it as an opportunity to engage with others. As I mentioned above, without even sharing my blog postings beyond just posting them here, I’ve had some interesting interactions with other evaluators who stumbled on my blog – imagine what could happen if I tweeted out these blog postings (like I do my personal blog postings with my personal Twitter account) and actually wrote some reflective stuff – things I’m thinking about/struggling with/wanting to know more about? Perhaps I could connect with others facing similar issues and get different perspectives on the things I’m thinking about.

Image credits:

  • Coffee – posted on Flickr by Jen with a Creative Commons license
  • Blog – posted on Flickr by Xiaobin Liu with a Creative Commons license
  • Twisty Water Looking Thing – posted on Flickr by Mario with a Creative Commons license
  • Megaphone – from Pixabay by OpenClipart-Vectors with a free for commercial use license

Footnotes

1. I briefly co-opted this blog for blog postings I was required to do during an Internet marketing class that I took in my MBA, but then switched it back to stuff related to my work.
2. Though I made it a priority to do it more quickly from the last conference I attended and actually got it posted just two weeks after the conference instead of months and months later.
Posted in evaluation, reflection

Webinar Notes: Shifting Mental Models to Advance Systems Change

Title: Shifting Mental Models to Advance Systems Change

Offered by: FSG, New Profit, and the Collective Impact Forum.

-Tammy Heinz, Program Officer, Hogg Foundation
-Hayling Price, Senior Consultant, FSG
-Darrell Scott, Founder, PushBlack
-Julie Sweetland, Vice President for Strategy and Innovation, Frameworks Institute
-Rick Ybarra, Program Officer, Hogg Foundation

Hayling Price:

  • “Systems change is about shifting conditions that are holding a problem in place”
  • “It’s not about getting more young people to beat the odds. It’s about changing the odds”
  • 6 conditions of systems change
    • structural change: policies, practices, resource flows (who gets funding and why? how are human resources allocated?) [explicit – easiest to find and to change]
    • relationships & connections (not just having someone on your LinkedIn, but actually engaging), power dynamics (who is getting funded and why? some people have a leg up, some people are dealing with a history of oppression) [semi-explicit]
    • transformative change (mental models) [implicit]
  • mental models: deeply held beliefs, assumptions, etc.
  • the policies, practices, and resource flows are not handed to us by nature – they are created by humans based on our mental models

Darrell Scott

  • PushBlack – nation’s largest nonprofit media platform for Black people
  • 4 million subscribers with emotionally-driven stories about Black history, culture, and current events
  • through Facebook Messenger – meeting people where they are at
  • Go to Facebook Messenger and search “PushBlack” to sign up!
  • ran the largest get-out-the-vote campaign on social media in history in 2018
    • got subscribers to contact their friends (relates to relationships and connections part of the conditions of system change)
  • giving subscribers tools to work at the local level (e.g., to be heard when Black people are killed by police, to free innocent Black people)
  • test their messages with a small subset of the audience before sending only the best-performing messages to the broader audience

Julie Sweetland

  • uses the phrase “cultural models”, which is a similar concept from anthropology
  • “cultural models are cognitive short cuts created through years of experience and expectation. They are largely automatic assumptions, and can be implicit”
  • “People rely on cultural models to interpret, organize and make meaning out of all sorts of stimuli, from daily experiences to social issues”
  • believes that understanding mental/cultural models helps you to understand which mental models are holding a problem in place
  • e.g., Google image search “ocean” and the top hits are pictures of “beautiful blue expanse” – this is a mental model that Americans hold of the ocean – this holds implications for policy:
    • people think it is so big, that it’s invincible
    • people think it’s water and think about the surface – not thinking about what’s underneath, about how it’s an ecosystem, it produces oxygen, it affects weather, etc.
  • it’s not that the ocean isn’t blue or isn’t big, but that’s just a piece of the picture
  • e.g., some people’s mental model of “teenager”, is about “risk and rebellion” – people defying expectations from adults. Again, not a complete picture.
  • 3 models are consistently barriers to productive conversations on social issues (especially in American context, but they’ve also seen them internationally):
    • individualism: assumption that problems, solutions, and consequences happen at the personal level
    • us vs. them: assumption that another social group is distinct, different, and problematic (beyond people – can be human vs. animals; environment vs. economy)
    • fatalism: assumption that social problems are too big, too bad, or too difficult to fix
  • there are also mental models that are specific to a given situation, but the above three tend to show up in lots of areas
  • one thing that doesn’t work: correcting their mistakes
    • “myth busters” – they don’t work! A study of the myth-fact structure found that people misremembered the myths as true, got worse over time, and attributed the false information to the CDC (Skurnik et al. (2005), JAMA)
    • mental models are there because we’ve heard them so many times. When you restate a “bad” mental model, you reinforce it (e.g., if you state “Myth: Flu vaccines cause the flu”, you reinforce their mental model that flu vaccines cause the flu – it doesn’t matter that you said it was a “myth”)
    • never remind people of things you wish they’d forget
  • another thing that doesn’t work: giving people more information
    • it’s not that you shouldn’t use facts
    • but if people have a particular mental model, stacking data on top does not change their mental model
    • you need to help them build a new mental model
  • another thing that doesn’t work: leaving causation to the public imagination
    • leaving people with their bad mental models won’t help
  • instead of trying to rebut people’s misunderstanding – try to redirect attention to what is true and how things do work

Tammy Heinz and Rick Ybarra

  • Hogg Foundation for Mental Health
  • historically funded lots of programs and research
  • mental health care has been focused on diagnosis and treatment, with the end goal of symptom reduction
  • now moving their work upstream
  • traditionally, there has been a medical/disease model of health
  • in the 1970s, people started asking whether mental illness was really chronic or whether people could get better
  • shifting a mental model is not something that can happen quickly
  • in the past 20 years, there’s been some deliberate work to shift the thinking around mental health
  • huge shift towards peers helping in mental health care teams
  • thinking about “recovery” – it’s not an expectation of only symptom control


  • there are multiple mental models on an issue – you can call up a more productive one (e.g., maybe “fatalism” is the first thing that comes to mind, but you can call up a more productive mental model)
  • how do you figure out what mental models people are using?
    • Hayling: we are constantly testing out models through our work
    • Julie: ask people “what are ideas you wish you’d never hear again?” and you’ll get a pretty good idea of the mental models that are being a problem
  • how do you change mental models around emotionally charged issues?
    • Rick: listening. Figure out what mental models are driving things. Really learn and understand where people are coming from.
    • Tammy: being clear about where you want to go
    • Hayling: make things plain
    • Julie: call people in rather than calling them out

Update: Here’s a link to the recording of the webinar.

Posted in event notes, notes, webinar notes

Recap of the 2019 Canadian Evaluation Society conference

This year’s conference was in Halifax and, as always, it was a wonderful opportunity to reconnect with my evaluation friends, make some wonderful new ones, pause and reflect on my practice, and learn a thing or two. And I think this is quite possibly the fastest I’ve ever put together my post-conference recap here on ye old blog! (The conference ended on May 29 and I’m posting this on June 14!)

Student Case Competition

The highlight of the conference for me this year was the Student Case Competition finals. In this competition, student teams from around the country, each coached by an experienced evaluator, compete in round 1 where they have 5 hours to review a case (typically a nonprofit organization or program) and then complete an evaluation plan for that program. Judges review all the submissions and the top 3 teams from round 1 move on to the finals, where they get to compete live at the conference. They are given a different case and have 5 hours to come up with a plan, which they then present to an audience of conference goers, including representatives from the organization and three judges. After all three teams present, the judges deliberate and a winning team is announced!

I had the honour of coaching a team of amazing students from Simon Fraser University. The competition rules do not allow teams to talk to their coaches when they are actually working on the cases, so my role was to work with them before the round, talking about strategies for approaching the work, as well as chatting with them about evaluation in general. Most of the students on the team had not yet taken an evaluation course, so I also provided some resources that I use when I teach evaluation.

I will admit that I was a bit nervous watching the presentations – not because I didn’t think my team would do well, as I know they worked really hard and are all exceptionally intelligent, enthusiastic and passionate, but because it’s a huge challenge to come up with a solid evaluation plan and a presentation in such a short period of time, and because they were competing among the best in the country!

But I need not have worried. They came up with such a well-thought-through, professional plan that was appropriate to the organization, and they presented it with all the enthusiasm, professionalism, grace, and passion that I have come to know they possess. I was definitely one proud evaluation mama watching my team do that presentation and so very, very proud of them when they won! Congratulations to Kathy, Damien, Stephanie, Manal, and Cassandra! And to Dasha, who was part of the team that won round 1, but wasn’t able to join us in Halifax for the finals.

Kudos also go to the two other teams who competed in the finals – students from École nationale d’administration publique (ENAP) and Memorial University of Newfoundland (MUN). Great competitors and, as I had the pleasure of learning when we all went out to the pub afterwards, as well as chatting at the kitchen party the next night, all very lovely people!

Conference Learnings

As usual, I took a tonne of notes throughout the conference and, as usual for my post-conference recaps, I will:

  • summarize some of my insights, by topic (in alphabetical order) rather than by session as I went to some different sessions that covered similar things
  • where possible, include the names of people who said the brilliant things that I took note of, because I think it is important to give credit where credit is due. Sometimes I missed names (e.g., if an audience member asked a question or made a statement, as audience members don’t always state their name or I don’t catch it)
  • apologize in advance if my paraphrasing of what people said is not as elegant as the way they actually said it.

Anything in [square brackets] is a thought I’ve added upon reflecting on what the presenter was talking about.

Federal Government

  • every time I go to CES, I find I learn a little bit more about how the federal government works (since so many evaluators work there!). This time I learned that Canada Revenue Agency (CRA) doesn’t report up to Treasury Board – they report to Finance

Indigenous Evaluation

  • the conference was held on Mi’kma’ki, the ancestral and unceded territory of the Mi’kmaq People.
  • the indigenous welcome to the conference was fantastic and it was given by a man named Jude. I didn’t catch his full name and I couldn’t find his name in the conference program or on Twitter. [Note to self: I need to do better at catching and remembering names so I can properly give credit where credit is due]. He talked about how racism, sexism, ableism, transphobia, and other forms of oppression are at play in the world today. He also talked about how there is a difference between guilt and responsibility. We need to take responsibility for making things better now, not just feel guilty about the way things are.
  • Nan Wehipeihana talked about an evaluation of a sports participation program and how they moved from sports participation “by” Māori to sports participation “as” Māori. They talked about what it would look like to participate “as” Māori – e.g., the Māori language is used, Māori structures (tribal, subtribal, kin groups) are embedded in the activity, and activities occur in places that are meaningful to Māori people (e.g., kayaking on our rivers, activities on our mountains). They developed a rubric in the shape of a five-point star (it took a year to develop).
  • I went to a Lightning Roundtable session hosted by Larry Bremner, Nicole Bowman, and Andrealisa Belizer where they were leading a discussion on Connecting to Reconciliation through our Profession and Practice. One of the things that Larry mentioned that struck me was the importance of not just indigenous approaches to evaluation, but indigenous approaches to program development. It doesn’t make sense to design a program without indigenous communities as equal partners and then to say you are going to take an indigenous approach to evaluation – the horse has left the barn by that point.
  • They also talked about how evaluators are culpable for the harm that is still happening because we haven’t done right in our work. They talked about how the CES needs to hold the government’s feet to the fire on the Truth and Reconciliation Commission’s (TRC) Calls to Action. Really, after the Commission, there should have been a TRC implementation committee who could go around the country and help get the Calls to Action implemented (Larry Bremner).
  • They talked about not only what can CES do at the national level, but what can we do at the chapter level. As the president of one of the chapters, this is something I need to reflect on and speak to the council about. I also need to revisit the Truth and Reconciliation Commission’s Calls to Action (as it was a while ago that I read that report) and read “Reclaiming Power and Place: The Final Report of the National Inquiry into Missing and Murdered Indigenous Women and Girls“, which was released the week after the CES national conference.
  • I also went to a concurrent session where the panelists were discussing the TRC Calls to Action. They pointed out that CBC has a website where they are tracking progress on the 94 Calls to Action: Beyond 94.
  • CES added a competency about indigenous evaluation in its recent updating of the CES competencies:
    • 3.7 Uses evaluation processes and practices that support reconciliation and build stronger relationships among Indigenous and non-Indigenous peoples.
  • Many evaluators saw this new competency and said “I don’t work with indigenous populations, so how can I relate to this competency?” [I will admit, I had that thought as well when the new competencies were announced. Not that I don’t think this is an important competency for evaluators to have – but more that I didn’t know how to apply it in the work I am currently doing or where to start in figuring out what I should do.]. The CES is trying to provide examples to support evaluators. (Linda Lee) E.g.:
Presentation slide
  • I also learned that EvalIndigenous is open to indigenous and non-indigenous people – anyone who wants to move forward indigenous worldviews and want indigenous communities to have control of their own evaluations. So I joined their Facebook group! (Nicole Bowman and Larry Bremner)
  • Evaluators typically use a Western European approach and many use an “extractive” evaluation process, where they take stuff out of the community and leave (I can’t remember if this slide was from Larry Bremner or Linda Lee).
Presentation slide
  • I also found this discussion of indigenous self-identification helpful (Larry Bremner):
Presentation slide
  • There is still so much work to do and so much harm being inflicted on indigenous people:
    • there are more indigenous kids in care today than were in residential schools – this is the new residential schools. (Larry Bremner)
    • During the discussion with the audience, some audience members mentioned “trauma tourism” – it can be re-traumatizing for indigenous people to share traumas they have experienced, and non-indigenous people, in their attempts to learn more about the experiences of indigenous people, need to be mindful of this and not further burden indigenous people.
    • If you google “indigenous women”, all the results you get are about missing and murdered indigenous women and girls. Where is the focus on the strengths in the community?


Learning

  • evaluators are learners (Gail Barrington)
  • Bloom’s Taxonomy is a hierarchy of cognitive processes that we go through when we do an evaluation – notice that evaluation is at the top – it’s the hardest part (Gail Barrington)

Bloom taxonomy.jpg
By Xristina la – Own work, CC BY-SA 3.0, Link

  • single loop learning is where you repeat the same process over and over again, without ever questioning the problem you are trying to fix (sort of like the PDSA cycle). There’s no room for growth or transformation. (Gail Barrington)

By Xjent03 – Own work, CC BY-SA 3.0, Link

  • in contrast, double loop learning allows you to question if you are really tackling the correct problem (sometimes the way that the problem is defined is causing problems/making things difficult to solve) and the decision making rules you are using, allowing for innovation/transformation/growth. (Gail Barrington)

By Xjent03 – Own work, CC BY-SA 3.0, Link

Pattern Matching

  • “Pattern matching is the underlying logic of theory-based evaluation” – specify a theory, collect data based on it, and see if they match (Sebastian Lemire)
  • Trochim wrote about both verification AND falsification, but in practice most people just come up with a theory and try to find evidence to support it (confirmation bias) (Sebastian Lemire)
  • humans are wired to see patterns, even when they aren’t there and we tend to focus on evidence in support of the patterns (Sebastian Lemire)
  • having more data is not the solution! (Sebastian Lemire)
    • e.g., when people were given more information on horses and then made bets, they didn’t get any more accurate in their bets, but they did get more confident in them
  • evaluators need to do reflective practice – e.g., to look for our biases (Sebastian Lemire)
  • structured analytic techniques (see slide below) – not a recipe, but a structured process (Sebastian Lemire)
Presentation slide
  • pay attention to alternative explanations – in the context of commissioned evaluations, it can be hard to get commissioners to agree to you spending time looking at alternative explanations, and we often go into an evaluation assuming that the program is the cause (bias) (Sebastian Lemire)
  • falsification: specify what data you would expect to see if your hypothesis was wrong (Sebastian Lemire)

Power and Privilege

  • since we have under-served, under-represented, and under-privileged people, we must also have over-served, over-represented, and over-privileged people (Jude, who gave the indigenous welcome. I didn’t catch his last name and I can’t find it on the conference website)
  • recognize your power and privilege, recognize your biases and think about where they come from, and work to prevent your biases from affecting your work (Jude)
  • and speaking of power and privilege, the opening plenary on the Tuesday morning was a manel. For the uninitiated, a “manel” is a panel of speakers who are all male. It’s an example of bias – men being more often recognized as experts and given a platform as experts when there are many, many qualified women. I called it out on Twitter:
  • a male friend of mine re-tweeted this, saying he was glad to see that someone had called it out, and when I spoke to him later, he told me that people were giving him kudos for calling it out and he had to point out that it was actually a woman who had called it out. So another great example of women being made invisible and men getting credit.
  • I do regret, however, that I neglected to point out that it was a “white manel” specifically. There’s so much more to diversity than just “men” and “women”!

Realist Evaluation

  • Michelle Naimi (who I know from the BC evaluation scene) gave a great presentation on a realist evaluation project she’s been working on related to violence prevention training in emergency departments. My notes on realist evaluation don’t do it justice, but I think my main learning here is that this is an approach that I can learn more about. I’m definitely inviting her as a guest speaker the next time I teach evaluation!
Michelle Naimi gives a presentation at the Canadian Evaluation Society conference

Reflective Practice

  • I took a pre-conference workshop, led by Gail Barrington, on reflective practice. This is an area that I’ve identified that I want to improve in my own work and life, and a pre-conference workshop where I got to learn some techniques and actually try them out seemed like a perfect opportunity for professional development.
  • Gail talked about:
    • how she doesn’t see her work and her self as separate – they are seamless
    • if you don’t record your thoughts, they don’t endure. (How many great ideas have you had and lost?) [I’d add, how many great ideas have you had, forgotten about, and then been reminded of later when you read something you wrote?]
    • evaluators are always serving others – we need to take care of ourselves too
  • The best part of the workshop was that we got to try out some techniques for reflective practice as we learned them.

Warm up activity: In this activity, we took a few minutes to answer the following questions:

-Who am I?
-What do I hope to get out of this workshop?
-To get the most out of this workshop, I need to ____

Then we re-read what we wrote and answered this:

-As I read this, I am aware that __________

  • and that is an example of reflection!
  • [Just had an idea! I could use that at the start of class to introduce the notion of reflective practice from the beginning of class. If I turn my class into more of a flipped classroom approach, I could have more in-class time to do fun, experiential things like this than listening to lecture 🙂 ]
Resistance Exercise: Another quick writing exercise:

-What are the personal barriers that hold me back from reflection?
-What are the lifestyle/family barriers that hold me back from reflection?
-What barriers at work are holding me back from being transformative?

Then we re-read what we wrote and answered this:

-As I read this, I am aware that __________
The Morning Pages:

Write three pages of stream of consciousness first thing in the morning in a journal that you like writing in. Before you’ve done anything else – and before your inner critic has woken up. If you can’t think of anything to write, just write “I can’t think of anything to write” over and over again until something comes to you.

All sorts of things will pop up – might be ideas for a project you are working on, or “to do” items to add to your list. You can annotate in margins, transfer things to your main to do list later, or some of it might not be useful to you now and you don’t have to look at it again.
  • Gail said it’s very different writing first thing in the morning compared to later in the day. I know that I’m unlikely to get up an extra half hour earlier than I already do, but I could give this a try on a weekend morning when I’m not feeling rushed to get to work, to see if it’s different for me too.
Start Now Activity:

-The thoughts/ideas that prevent me from journaling now ____

Then we re-read what we wrote and answered this:

-As I read this, I am aware that __________
  • for some people, writing is not for them. An alternative is using a voice memo app. We gave it a try in the workshop and I was kind of meh on it, but I used it two more times during the conference when I had a quick thought I wanted to capture. I think the challenge will be that if I want to retrieve those ideas, I’ll need to listen to the recordings, which seems like a big time sink, depending on how much I say (as I can be verbose).
  • we also talked about meditation and went on a meditative walk. (Gail put up the quotation “solvitur ambulando”, citing St. Augustine, and noting that it is Latin for “solved by walking”. But when I googled it, it turns out that it was actually from the philosopher Diogenes, and actually refers to something that is solved by a practical experiment.) For our walk, we set an intention (to think about one thing that I’ll change at my work), then forgot about it and went for a mindful walk – paying attention to the sensations of walking (e.g., the feeling of your feet on the ground as you step, the colours and shapes and sounds and smells you encounter). It was a rainy day, but I was definitely struck by all the beauty around me, and was reminded of how beneficial mindfulness can be.
  • My take home from all my reflections in this workshop was:
    • taking time to do things like reflective practice and mindfulness meditation is a choice. I say that I don’t have enough time to do these things, but it’s actually that I have been choosing not to spend my time doing these things. There are a variety of reasons for those choices (which I did reflect on and got some valuable insights about). Remembering that this is a choice – and being more mindful of what choices I’m making – is going to be my intention as I return back to work after my conference/holiday.


Rubrics

  • I’ve been to sessions on rubrics by Kate McKegg, Nan Wehipeihana, and their colleagues at a number of conferences and I always learn useful things. This year was no exception. The stuff in this section is all from McKegg and Wehipeihana (and a couple of collaborators who weren’t there but “presented” via video).
  • rubrics are a way to make our evaluative reasoning explicit
  • just evaluating whether goals are met is not enough. Rubrics can help us with situations like:
    • what counts as “meeting targets”? (e.g., what if you meet an unimportant target but don’t meet an important one? Or you way exceed one target and miss another by a little bit? etc.)
    • what if you meet targets but there are some large unintended negative consequences?
    • do the ends justify the means? (what if you meet targets but only by doing unethical things?)
    • whose values do you use?
  • 3 core parts of a rubric:
    • criteria (e.g., reach of a program, educational outcomes, etc.)
    • levels (standards) (e.g., bad, poor, good, excellent; could also include “harmful”)
      • some people don’t like to see “harmful” as a level, but e.g., when we saw inequities, we needed a way to be able to say that it was beyond poor and actually causing harm
    • importance of each criterion (e.g., weighting)
      • sometimes all criteria are equally important and sometimes not
  • rubrics can be used to evaluate emerging strategies:
    • evaluation can be used in situations of complexity to track evolving understanding
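To make the three core parts concrete, here is a minimal sketch of how criteria, levels, and weights can combine into a single score. The criteria, level scores, and weights below are all hypothetical examples I made up to illustrate the structure – none of the numbers are from the presentation:

```python
# Levels (standards), including "harmful" as a level below "poor"
LEVELS = {"harmful": 0, "poor": 1, "good": 2, "excellent": 3}

# Each criterion gets an assessed level and an importance weight;
# sometimes all criteria are equally weighted, and sometimes not.
rubric = {
    "reach":                ("good", 0.5),
    "educational outcomes": ("excellent", 0.3),
    "equity":               ("poor", 0.2),
}

def weighted_score(rubric, levels):
    """Combine per-criterion judgments into one weighted score (0-3 scale)."""
    # sanity check: weights should sum to 1
    assert abs(sum(w for _, w in rubric.values()) - 1.0) < 1e-9
    return sum(levels[level] * weight for level, weight in rubric.values())

print(round(weighted_score(rubric, LEVELS), 2))  # 2.1 on the 0-3 scale
```

Of course, the arithmetic matters less than the conversation – the point is that writing down the levels and weights makes the evaluative reasoning explicit and traceable.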
Presentation slide
  • in all systems change, there is no final “there”
    • in situations of complexity, cause and effect are only really coherent in retrospect [they are not predictable] and do not necessarily repeat
    • we only know things in hindsight and our knowledge is only partial – we must be humble
    • need to be looking out continually for what emerges
  • in complexity thinking, we are only starting to see what indigenous communities have long known
    • our reality is created in relation; it is interpretive
    • Western knowledge dismissed this
Presentation slide
  • need to bring things together to make sense of multiple lines of evidence
    • “weaving diverse strands of evidence together” in the sensemaking process
  • we have to make judgments and decisions about what to do next with limited/patchy information. Rubrics give us a traceable method to make our reasoning explicit
  • having agreed on values at the start helps to navigate complexity
  • break-even analysis flips return-on-investment:
Presentation slide
  • when you can’t do a full cost-benefit analysis (e.g., you don’t have information on ALL costs and ALL benefits), you can check whether the benefits are at least greater than the costs
  • think about how rubrics are presented – e.g., minirubrics with red/yellow/green
Presentation slide
  • but that might not be appropriate in some contexts – e.g., if a program is just developing and it’s unreasonable to expect that certain criteria would be at a good level yet
  • a growing flower as a metaphor for different stages of different parts of a program may be more appropriate for a developing program. It may also be more appropriate in an indigenous context
Presentation slide
  • it’s important to talk about how the criteria relate to each other (not in isolation)
    • they do each analysis separately (e.g., analyze the survey; analyze the interviews)
    • then map that to the rubric
    • then take that to the stakeholders for sensemaking; stakeholders can help you understand why you saw what you saw (e.g., when you see what might seem like conflicting results)
  • like with other evaluation stuff, you might not say “we are building a rubric” to stakeholders at the start (it’s jargon). Instead, ask questions like “what is important to you?” or “If you were participating *as* Māori, what would that look/sound/feel like to you?”

Theory of Change

  • to be a theory of change (TOC) requires a “causal explanation” (i.e., a logic model on its own is not a TOC – we need to talk about why those arrows would lead to those outcomes) (John Mayne) [This also came up as a question to my case competition team – and my team gave a great answer! Did I mention I’m so proud of them?]
  • complexity affects the notion of causation – in complexity, there isn’t “a” cause, there are many causes (John Mayne)
  • people assume you have to have a TOC that can fit on one page – but that doesn’t always work – can do nested TOCs (John Mayne)
  • interventions are aimed at changing the behaviour of groups/institutions, so TOCs should reflect that (John Mayne)
    • there is lots of research on behaviour change, such as on Bennett’s hierarchy, or the COM-B model (John Mayne):
Presentation slide
  • causal link assumptions – what conditions are needed for that link to work? (John Mayne) (e.g., could label the arrows on a logic model with these assumptions – Andrew Koleros)
Presentation slide


As with pretty much any conference I go to, I came home with a reading list:

And some to dos:

  • re-read the TRC Calls to Action and figure out which things I can take action on! And then take action!
  • try writing “the Morning Pages”
  • listen to the audiorecorded reflections that I made during the conference and document any insights I want to capture
  • read all the books on the above list!

Sessions I Attended


CHSPR Conference Poster References

Some colleagues and I are presenting a poster at the Centre for Health Services & Policy Research conference on March 7-8, 2019. Rather than cluttering up our poster with a reference list, we are putting our references online here and our poster will have a QR code linked to this page. So if you’ve come looking for the references from our poster, you’ve come to the right place!

  1. American Society for Quality. (2017). What is audit? Retrieved from American Society for Quality: http://asq.org/learn-about-quality/auditing/
  2. Baily, M. A., Bottrell, M., Lynn, J., & Jennings, B. (2006). The Ethics of Using QI Methods to Improve Health Care Quality and Safety. RAND Corporation. Retrieved from http://www.thehastingscenter.org/wp-content/uploads/The-Ethics-of-Using-QI-Methods.pdf
  3. Benjamin, A. (2008). Audit: how to do it in practice. BMJ: British Medical Journal, 336(7655), 1241.
  4. Canadian Evaluation Society. (2015, October). What is Evaluation. Retrieved March 3, 2017, from Canadian Evaluation Society: http://evaluationcanada.ca/what-is-evaluation
  5. Cook, P. F., & Lowe, N. K. (2012). Differentiating the Scientific Endeavors of Research, Program Evaluation, and Quality Improvement Studies. Journal of obstetric, gynecologic, and neonatal nursing, 41(1), 1-3.
  6. Council for International Development. (2014, June). Monitoring Versus Evaluation. Retrieved March 3, 2017, from Council for International Development: http://www.cid.org.nz/assets/Key-issues/Good-Development-Practice/Factsheet-17-Monitoring-versus-evaluation.pdf
  7. Hedges, C. (2009). Pulling It All Together: QI, EBP, and Research. Nursing management. Nursing management, 40(4), 10-12.
  8. Hill, S. L., & Small, N. (2006). Differentiating Between Research, Audit and Quality Improvement: Governance Implications. Clinical Governance: An International Journal, 11(2), 98-10
  9. Naidoo, N. (2011). What is Research? A Conceptual Understanding. African Journal of Emergency Medicine, 1(1), 47-48.
  10. Newhouse, R. P., Pettit, J. C., Poe, S., & Rocco, L. (2006). The Slippery Slope: Differentiating between Quality Improvement and Research. Journal of Nursing Administration, 36(4), 211-219.
  11. Shirey, M. R., Hauck, S. L., Embree, J. L., Kinner, T. J., Schaar, G. L., Phillips, L. A., . . . McCool, I. A. (2011). Showcasing Differences Between Quality Improvement, Evidence-Based Practice, and Research. The Journal of Continuing Education in Nursing, 42(2), 57-68
  12. U.S. Department of Health & Human Services. (2009, 01 15). Basic HHS Policy for Protection of Human Research Subjects. Retrieved March 3, 2017, from Office for Human Research Protections – U.S. Department of Health & Human Services: https://www.hhs.gov/ohrp/regulations-and-policy/regulations/45-cfr-46/#46.102
  13. United States Government Accountability Office. (2011, May). Performance Measurement and Evaluation: Definitions and Relationships. Retrieved March 3, 2017, from Program Performance Assessment: http://www.gao.gov/assets/80/77277.pdf

Recap of the 2018 Canadian Evaluation Society conference

This year’s Canadian Evaluation Society (CES) conference was held in Calgary, Alberta and had a theme of Co-Creation. As always, I had a great time connecting with old friends and making new ones, learning a lot, and getting to share some of my own learnings too.

As I usually do at conferences, I took a tonne of notes, but for this blog posting I’m going to summarize some of my insights, by topic (in alphabetical order) rather than by session as I went to some different sessions that covered similar things. Where possible, I’ve included the names of people who said the brilliant things that I took note of, because I think it is important to give credit where credit is due, but I apologize in advance if my paraphrasing of what people said is not as elegant as the way that people actually said them. Anything in [square brackets] is my thoughts that I’ve added upon reflection on what the presenter was talking about.

Complex Systems

  • I didn’t see as many things about complexity as I usually do at evaluation conferences.
  • No one person has their mind around a complex system. You need all the people in the room to understand it. Systems are messy because people are messy. (Patrick Field)


  • How do we get past the view that humans are supreme beings and the environment is just there to serve us (vs. a stewardship view)? This view is deeply embedded in our identities. Even the legal system is set up to prioritize making money (and views environmentalism as a “nuisance”) (Jane Davidson)
  • People have a fear of being evaluated on things they don’t (feel they) have control over: “don’t evaluate me on sustainability stuff! It’s affected by so much else!” Looking at outcomes that are outside the control of the program isn’t meant to be about evaluating a program/organization on their performance, but about identifying the things that are constraining them from achieving the outcomes they are trying to achieve. It’s not as though, if you only look at the things within the box of your program, you can really control all of the things in the box and they aren’t affected by the things outside your program. [No program is a closed system.] (Jane Davidson)
  • We focus on doing evaluations to meet the client’s requests, and maybe we stretch it to cover some other things. Sometimes you can slip in stuff that the client didn’t ask for and then use that to demonstrate its value. People are often limited in what they ask for by what they think is possible, and sometimes you need to be able to demonstrate the possibilities first (Jane Davidson)
  • It’s not just about asking “how good were the outcomes”, but “how good was this organization in making the trade offs?”(Jane Davidson)

Evaluation Approaches and Methods

  • People limit their questions to what they think can be measured (e.g., I want to see indicator X move by Y%). When clients say “we can’t measure that!”, Jane tells them “Look, there are academics who have spent their whole lives studying ‘love’. If they can do that, we can find a way to measure what you are really interested in. And it doesn’t have to be quantitative!” The client isn’t a measurement expert and they shouldn’t be limiting their questions to what they think can be measured. (Jane Davidson)
  • Once upon a time, evaluation was about “did you achieve your objectives?” but now we also think about the side effects too! (Jane Davidson).


  • Bower & Elnitsky talked about having to distinguish between evaluation/quality improvement/data collection/clinical indicators/performance indicators (and how, in their view, these aren’t different things) and to talk to their client about how evaluation adds value. This struck a chord with me as it was similar to some of the things that my co-authors and I talk about in a paper we currently have under review in the Canadian Journal of Program Evaluation.
  • Sarah Sangster felt that evaluation is like research, but more. She described how evaluation requires all the things you need to do research, but also has some things that research doesn’t (e.g., some evaluation-specific methods). She talked about how the ways people sometimes try to differentiate evaluation and research are really shared (e.g., evaluation is often defined as judging “merit/value/worth”, but research does that too – e.g., research judges the “best treatment”). [Some of the things she talked about were things that my co-authors and I grappled with in our paper – such as how research is a lot more varied than people typically give it credit for (e.g., participatory action research or community-based research stretch the boundaries of traditional research in that the questions being explored come from the community instead of from the researchers, and the results are specifically intended to be applied in the community rather than just being knowledge for knowledge’s sake).]

Evaluation Competencies

  • The CES is updating its list of evaluation competencies – those things a person should know and be able to do in order to be a competent evaluator. The competencies are used by the society to assess applicants for the Credentialed Evaluator designation – people have to demonstrate that they’ve met them. The committee is taking comments on the draft until June 30, 2018, and expects to finalize the new competencies in September 2018.

Evaluation Ethics

  • The CES is also looking at renewing its ethics statement, which hasn’t been updated in 20 years! I went to a session where we looked at the existing statement and it clearly needs a lot of work. The society is currently doing an environmental scan (e.g., looking at other evaluation societies’ ethics guidelines/principles/codes/etc.) and consultations with stakeholders (e.g., the session I attended at the conference) and plans to decide by the fall whether to just tweak the existing statement or completely overhaul it. They hope to have a finished product to unveil at next year’s CES conference.
  • During the session that Alec from my team led, which was a lightning round table where people circled through various table discussions, one of the things we talked about while discussing observations was ethics. For example, when we are doing observations, it is understood that if you are in a public place, you might be observed, and it is considered ethical to observe people there. The question arose: “are hospitals public places?”

Evaluation, Use Of

  • There was a fascinating panel of 3 mayors who were invited to the conference to talk about what value evaluation can add for municipalities. None of the mayors had even heard of the Canadian Evaluation Society prior to being invited to the conference, so we definitely have our work cut out for us in terms of advocating for evaluation at the municipal level. There is definitely lots of evaluation work that can be done at the municipal level and it would be worthwhile for the society to educate municipal politicians about what we do and how it can help them. The mayors were open to the idea of using evaluation findings in their decision making. There was a suggestion that there should be a panel of evaluators at the Canadian municipalities conference, just like we had the mayors’ panel at our evaluator conference, and I seriously hope the CES pursues this idea.

Evaluation as intervention

  • Evaluators affect the things they evaluate. The act of observing is well known to affect the behaviour of those being observed. As well, we know that “what gets measured gets managed,” so setting up specific indicators that will be measured will cause people to do things that they might not otherwise have done. This is an important thing that we should be discussing in our evaluation work.

Indigenous Evaluation

There was a lot of discussion around indigenous populations and indigenous evaluation, in keeping with the CES’s commitment “to incorporating reconciliation in its values, principles, and practices.”

  • The opening keynote on Reconciliation and Culturally Responsive Evaluation was introduced with “You will feel uncomfortable and that is by design. Ask yourself why it makes you uncomfortable.” [The history – and present – of indigenous people is a hard thing to grapple with for many reasons. There are so many injustices that have been done – and continue to be done – and each of us participates in a system that perpetuates that injustice. Doing nothing about it is to do harm. And for those of us who are not indigenous, there can be a mixture of ignorance about our own history and ignorance about our actions and inactions that contribute to the injustice, privilege, and lack of knowing what we can do that can all contribute to this discomfort.]
  • Several of the people speaking about indigenous evaluation talked about the need for indigenous-led evaluation. We have a long history of evaluation and research in indigenous communities being led by non-indigenous people where they take from the community, don’t contribute to the community, don’t ask the questions that the community needs answered, don’t understand things from the communities’ perspectives, impose Western view/perspectives/model, and then leave the community no better off than before.
  • “Scientific colonialism”: colonial powers export raw data from communities to “process” it. (Nicole Bowman)
  • Despite all the evaluation that has been going on for years, we still face all the same problems – maybe worse. (Kate McKegg)
  • “How do we ensure evaluation is socially just, as well as true, that it attends to the interests of everyone in society and not solely the privileged” (House, 1991 – cited by Kate McKegg).
  • “Culturally-responsive evaluation seems to be about giving “permission” to colonizers and settlers to do evaluation in indigenous communities” (Kate McKegg).
  • “Sometimes the stories we are telling are not the stories that need to be told.” (Larry Bremner) [Larry was talking about the ways in which evaluation can further perpetuate injustice against, and further ignore and marginalize, indigenous people, through what we do and do not study.] Is our own work maintaining colonial oppression?
  • “Trauma is never far from the surface in indigenous communities.” Larry Bremner.
  • Since many aspects of culture and ceremony have been destroyed by colonialism, how are people supposed to heal, as culture and ceremony are ways of healing? (Nicole Bowman)
  • The lifespan of indigenous people is 15 years less than that of non-indigenous people.
  • There is a lot of diversity among indigenous people in Canada: 617 First Nations, as well as Inuit and Métis; 60 languages.
  • Larry Bremner quoted a few people that he’d worked with: “Everyone is talking about reconciliation, but what happened to the “truth” part?” [in reference to the Truth and Reconciliation commission] and “In my community, reconciliation is about making white people feel less guilty.” [Even work that is supposed to be about dealing with injustice against indigenous people gets turned around to serve white people instead.]
  • Understanding our history is needed to understand the legal and policy world in which we live today. Evaluators need to understand authority and power. (Nicole Bowman)
  • How can non-indigenous people be good allies?
    • We have to be clear on our own identities as settlers and colonizers, recognize our privilege. Our identity is shaped by our history and our present. Colonization is still going on and is nonconsensual and designed to benefit the privileged. (Kate McKegg)
    • We don’t even know our own history, let alone that of indigenous people. (Kate McKegg)
    • It’s not indigenous people’s responsibility to teach us about this – it’s our own job. Only when we understand ourselves can we hear indigenous people. (Kate McKegg)
    • Do your homework. Expand your indigenous networks. Undertake relevant professional development. Build relationships. (Nan Wehipeihana)
    • Advocate for indigenous-led evaluation – indigenous people evaluating as indigenous people:

Slide by Nan Wehipeihana

  • During the opening keynote, an audience member asked how non indigenous people can learn if it’s not indigenous peoples’ responsibility to teach non indigenous people.
    • The panelists noted that indigenous people are a small group whose first priority is to do work to help their communities – expecting them to educate you is to put a burden on them that is not their responsibility.
    • Kate McKegg noted that indigenous people have been trying to talk to non indigenous people for years and we haven’t listened to them. She suggested that we can work with other settlers who want to learn – there is lots available to read, to start.
    • Nicole Bowman noted that observation is how we traditionally learned and it is part of science to observe – do some observing.
    • Larry pointed out that indigenous people have taught their ways to others before and people have taken their protocols and not used them well – why should they give non indigenous people more tools to hurt indigenous people?
  • Lea Bill from the First Nations Information Governance Centre spoke about the OCAP® principles, which refer to Ownership, Control, Access, and Possession of data – First Nations have rights to all of these. [I had learned about OCAP® before, but hadn’t realized until I saw this presentation that it is a registered trademark.]
    • All privacy legislation is about protecting individual privacy rights, but OCAP® is about collective, community rights.

Slide by Lea Bill

  • When you work with indigenous communities, you need to know who the knowledge holders in the community are – they have rights and privileges and if you don’t know, you could offend people and not get good information. (Lea Bill)
  • Indigenous indicators are bicultural – all things are interconnected and human beings are not separate from the environment. (Lea Bill)


  • Something I’ve been interested in lately is how people from different disciplines use words differently. Two disciplines might use the same word to mean different things, or they might use different words to mean the same thing. One of the sessions I attended was about a glossary that had been created to clarify words/phrases that are used by financial/accounting people vs. evaluation people. Check out the glossary here.


  • I attended a thematic breakfast session that was a live taping of an episode of the Eval Cafe podcast. It was a chance for a group of us to reflect on what we’d learned about at the conference. You can check out the podcast here.

Carolyn & Brian doing soundcheck for a live podcast at the Canadian Evaluation Society 2018 conference


  • We need to think bigger than binary thinking – “what is in our control vs. not in our control?”, “Yes/No”, “Good/Bad”, “Pre/Post”. [Few things are really black or white – often things that we think of as binary are really more of a gradient or spectrum. There are fuzzy boundaries between things. It’s one of the reasons I like to start questions with “to what extent…”, like “to what extent did the program achieve its goals?”]

To Dos:

  • Watch “The Doctrine of Discovery – Unmasking the Domination Code”
  • Read “Pagans in the Promised Land” by Steven T. Newcomb
  • Research “are hospitals public places?” for the purposes of observations.

Sessions I Attended:


  • Opening Keynote Panel: Reconciliation and Culturally-Responsive Evaluation: Rhetoric or Reality? with panelists Dr. Nicole Bowman, Larry K. Bremner, Kate McKegg, and Nan Wehipeihana
  • Keynote with panelists Dr. Jane Davidson, Patrick Field, Sean Curry, Dr. Juha I. Uitto
  • Keynote by Lea Bill
  • Mayors’ Panel: A municipal perspective on co-creation and evaluation with panelists Heather Colberg, Mayor of Drumheller, Alberta; Mark Heyck, Mayor of Yellowknife, NWT; Stuart Houston, Mayor of Spruce Grove, Alberta
  • Fellows Panel – Evaluation for the Anthropocene.
  • Closing Keynote Panel: Reflection on Co-Creation Conference 2018 by CES Fellows – Our rapporteurs, realists, and renegades.

Concurrent Sessions:

  • Collaborating to improve wait times for a primary care geriatric assessment and support program by Emily Johnston, Krista Rondeau, Kathleen Douglas-England, Bethan Kingsley, Roma Thomson
  • Surveying an Under-Represented Population: What We Learned by Surveying Great-Grandma by Kate Woodman, Krista Brower
  • Co-Creating Evaluation Capacity in Primary Care Networks: A Case Example of Lessons Learned by Krista Brower, Sherry Elnitsky, Meghan Black
  • Evaluators faced with complexity: presentation of the results of a synthesis of the literature by Marie-Hélène L’Heureux
  • Knowledge translation and impacts – unpacking the black box by Ambrosio Catalla Jr, Ryan Catte
  • Evaluation and research: Two sides of the same coin or different kettles of fish? by Sarah Sangster, D. Karen Lawson
  • Who’s keeping score? A team-based approach to building a performance measurement scorecard by Beth Garner
  • Updating the CES Competencies for Evaluators: A Work in Progress by Gail Vallance Barrington, Christine Frank, Karyn Hicks, Marthe Hurteau, Birgitta Larsson, Linda Lee
  • Help Us Co-create CES’s Renewal Vision of Ethics in Program Evaluation! by CES Ethics Working Group on Ethics, Environmental Scan and Stakeholder Consultation Subcommittees
  • On the Road with the EvalCafe Podcast: Greetings from Calgary! by Carolyn Camman, Brian Hoessler
  • Integrating Social Impact Measurement Practice into Social Enterprises: A Sociotechnical Perspective by Victoria Carlan
  • From Collaboration to Collective Impact; Measuring Large-scale Social Change by Andrea Silverstone, Debb Hurlock, Tara Tharayil
  • The Rosetta Stone of Impact: A Glossary for Investors and Evaluators by David Pritchard, Michael Harnar, Sara Olsen

Presentations I Gave:

  • An inside job: Reflections on the practice of embedded evaluation by Amy Salmon, Mary Elizabeth Snow
  • How is evaluation indicator development like an orchestra? by Mary Elizabeth Snow, Alec Balasescu, Joyce Cheng, Allison Chiu, Abdul Kadernani, Stephanie Parent
  • Can co-creation lead to better evaluation? Towards a strategy for co-creation of qualitative data collection tools by Alec Balasescu, Joyce Cheng, Allison Chiu, Abdul Kadernani, Stephanie Parent, Mary Elizabeth Snow

Recap of the Canadian Evaluation Society’s 2017 national conference

The Canadian Evaluation Society’s national conference was held right here in Vancouver last month! I was one of the program co-chairs for the conference and I have to say that it was pretty awesome to see a year and a half’s worth of work by the organizing committee come to fruition! There were a lot of people involved in putting together the conference – and so many more parts to it than I had realized when I started working on it – and it was incredible to see everything work so smoothly!

As I usually do at conferences, I took a tonne of notes, but for this blog posting I’m going to summarize some of my insights, by topic (in alphabetical order) rather than by session 1Though I’ve listed all the sessions I attended at the bottom of this posting. as I went to some different sessions that covered similar things. Where possible, I’ve included the names of people who said the brilliant things that I took note of, because I think it is important to give credit where credit is due, but I apologize in advance if my paraphrasing of what people said is not as elegant as the way that people actually said them.


  • Damien Contandriopoulos noted that context is often defined by what it is not – it is not your intervention – i.e., it’s whatever is outside your intervention, but not the entire universe outside of your intervention – just what is close enough to be relevant/important to the analysis. He also noted that some disciplines don’t talk about context at all (e.g., they might talk about the culture in which an intervention occurs, but don’t talk about it as separate from the intervention the way we talk about context as being separate from the intervention).
  • Depending on your conceptualization of “context”, you may want to:
    • neutralize the context (e.g., those who think that context “gets in the way” and thus try to measure it and neutralize it so it won’t “interfere” with the results). Contandriopoulos clearly didn’t favour this approach, but noted that it could work if your evaluand was very concrete/clear.
    • adapt to context
    • describe the context
  • In all of the above options, it’s about generalizability/external validity (e.g., if you are trying to neutralize the context, you want to know if the evaluand works and don’t want the context to interfere with your conclusion about whether the evaluand works; if you are adapting to the context, you want to figure out how the evaluand might work in a given context; if you are describing the context, you want to understand the context in order to interpret your evaluation findings)
  • From the audience, AEA president Kathryn Newcomer mentioned a paper by Nancy Cartwright about transferability of findings 2She didn’t say the name of the paper or the journal, but based on her comments about the paper, I believe it is likely this paper. Unfortunately, it’s behind a paywall, so I can’t read more than the abstract., specifically about how Cartwright talks about “support factors” rather than context. Further, she talked about how in the US there is lots of interest in “scaling up” interventions, but rarely do studies document the support factors that allow an intervention to work (e.g., you need to have a pool of highly qualified teachers in the area for program X to work). She suggested:
    • putting the support factors into the theory of change
    • considering: how do we know if the support factors are necessary or sufficient? What if you need a combination of factors that need to be present at the same time and in certain amounts for the program to work? etc.
  • Contandriopoulos mentioned that sometimes people just list “facilitators” and “barriers” as if that’s enough [but I liked Newcomer’s suggestion that “support factors” (or barriers, though she didn’t mention it) could be integrated into the theory of change]


  • Kas Aruskevich showed an image of a river in Alaska viewed from above and noted that if you were standing by the side of that river, you’d never know what the sources of the river are (as they are blocked by mountains). She likened evaluation to taking that perspective from a distance, where you look at the whole picture. I liked this analogy.
  • Kathy Robrigado talked about how the accountability function of evaluation is often seen as an antagonist to learning, but she sees it as a jumping off point for learning.
  • In summarizing the Leading Edge panel, E. Jane Davidson had a few things to say that were very insightful in relation to thinking I’ve been doing lately with my team about what evaluation is (and how it compares/relates to other disciplines that aim to assess program/projects/etc.). With respect to monitoring, she noted that people often expect key performance indicators (KPIs) to be an answer, but they aren’t. Often what’s the easiest to measure is not what’s most important. In evaluation, we need to think about what’s most important (not just what’s strong or weak, but what really matters).

Evaluation, History of the Field

  • Every time I go to an evaluation conference, someone gives a bit of a history of the field of evaluation from their perspective (perhaps one day I’ll compile them all into a timeline). This conference was no different, with closing keynote speaker Kylie Hutchinson talking about what she has seen as “innovations” in evaluation that had a lot of buzz around them and then eventually settled into an appropriate place [her description made me think of the “hype cycle“, which someone had coincidentally shown in one of the sessions that I was in]:
    • 1990s – logic models
    • 2000s – the big RCT debate (i.e., are RCTs really the “best” way to evaluate in all cases?)
    • social return on investment (SROI), Appreciative Inquiry
    • developmental evaluation, systems approaches
    • deliverology

Evaluators, Role of

  • Lyn Shulha noted that as an evaluator, you’ll never have the same context/working conditions from one evaluation to the next, and you’ll never have a “final” practice or theory – they will continue to change.
  • Kathy Robrigado talked about starting an evaluation as an “evaluator as critical friend” (e.g., asking provocative questions to understand the program/context, offering critiques of a person’s work, providing data to be examined through another lens). But after a while, they found this approach to be too resource intensive, as they had ~60 programs to deal with and data collection was cumbersome; they moved from critical friend to “strategic acquaintance” (or, as she put it, “we had to friendzone the programs”).
  • Michel Laurendeau stated that “evaluators are the experts in interpreting monitoring data” as what you see when you look at the data isn’t necessarily what is really going on [this reminded me of something that was discussed at last year’s CES conference: what the data says vs. what the data means]
  • Kylie Hutchinson talked about how many evaluators are talking about the evaluator as a social change agent. People gravitate to this profession because they want to be involved in social change – maybe they are a data geek, but they see how the data can lead to social change. She also talked about how many skills she has needed to build to support her evaluation practice: in grad school she focused on methods and statistics, but when she went on to become a consultant she didn’t find that she needed advanced statistics – she needed skills in facilitation, then data visualization, and now organizational development.

Knowledge Translation

  • Kim van der Woerd described getting knowledge into action as “the long journey from the head to the heart”. I really like this phrase, as just knowing something (with the head) doesn’t necessarily mean we take it to heart and put it into action. I wonder how thinking about how we can get things from the head to the heart could help us think about better ways to promote the translation of knowledge into action.


  • Lyn Shulha talked about learning spirals – as we travel from novice to expert, we can imagine ourselves descending down, say, a spiral staircase. At a given point, we can be at the same place as earlier, but deeper (as well, we are changed from when we were last at this point). She noted that we “need to hold onto our experiences and our truths lightly”, lest we end up traveling linearly rather than in a spiral.

Logic Models

  • One of the sessions I was in generated an interesting discussion about different ways that people use logic models, such as:
    • having the lead agency of a program create a logic model of how they think the program works and then having all the agencies operating the program create logic models of how they think the program works and then compare – if they have different views of how the program works, this can generate important discussions
    • calling the first version of the logic model “strawman #1” to emphasize that the logic model is meant to be challenged and changed.

Reporting

  • Report structure recommended by Julian King in the Leading Edge panel on Rubrics:
    • answer the evaluation question
    • key evidence & reasoning behind how you came up with the answer
    • extra information
      • They summarized this as spoiler, evidence, discussion, repeat.


  • E. Jane Davidson noted that in social sciences, people are often taught how to break things down, but not how to pack it back together again to answer the big picture question. For example, you’ll often see people report the quantitative results, then the qualitative results, but with no actual mixing of the data (so it’s not really “mixed methods” – it’s more just “both methods”).
  • Also from E. Jane Davidson – the length of a section of a report is typically proportional to how long it took you to do the work (which is why literature reviews are so long), but that’s not what’s most useful to the reader. It’s like we feel we have to put the reader through the same pain we went through to do the work; we want them to know we did so much work! And then they get to the end and we say “the results may or may not be… and more research is needed.” Not helpful! Spoilers really are key in evaluation reporting – write it like a headline. Pique their interest with the spoiler and then they want to read the evidence (how did they decide that?).
    • 7 +/- 2 key evaluation questions (KEQ):
      • executive summary: KEQ 1, answer + brief evidence; KEQ 2, answer + brief evidence; KEQ 3, answer + brief evidence
      • and make sure your recommendations are actionable!

Rubrics

  • The Leading Edge Panel on Rubrics was easily my favourite session of the conference. I’ve done a bit of reading about rubrics after going to a session on them at the Australasian Evaluation Society conference in Perth, but found that this panel really brought the ideas to life for me.
  • Kate McKegg mentioned that she asked a group of people in healthcare if they thought that their organization’s key performance indicators (KPIs) reflected the value of what their organization does, and not a single person raised their hand [This resonated with me, as my team and I have been doing a lot of work lately on differentiating, among other things, monitoring and evaluation.]
  • Rubrics:
    • can help clarify what matters and include those things in your evaluation
    • are made of:
      • evaluative criteria – to come up with these, can check out the literature, talk to experts, talk to stakeholders (e.g., people on the front lines); can also think about what would be appropriate for the cultural context (e.g., what would make a program excellent in light of the cultural context?)
      • levels of importance (of the criteria) – remember, things that are easy to measure are not necessarily what’s important
      • rating scale – how to determine the level of performance (e.g., excellent-very good-good-adequate-emerging-not yet emerging-poor); depending on your context, you may choose different words (e.g., may use “thriving” instead of “excellent”)
    • can be:
      • analytic – describe the various performance levels for each criterion
      • holistic – a broad level of description of performance at each level (e.g., describe “excellent” overall (encompassing all the criteria) rather than describing “excellent” for each criterion individually)
    • analytic can provide more clarity, but require more data
  • You should be able to see your theory of change in the rubric. Key evaluation questions (KEQs) often follow the theory of change (e.g., KEQs might be “how well are we implementing?” or “how well are we achieving outcome #1?”). Think about the causal links in the theory of change and their strength; if there is a deal breaker, it should show up there.
  • You can embed cultural values into the process (e.g., for the Maori, the word “rubric” didn’t resonate, so Nan Wehipeihana used a cultural metaphor that did; rather than words like “poor” and “excellent”, can use words that fit better like a “seed with latent potential” and “blooming” and “coming to fruition”)
  • Values are the basis for criteria – they reflect what is valued (and whose values hold sway matters)
  • Once you have a rubric, you need to collect data to “grade” the program using the rubric; data may come from all sorts of places (e.g., previous research, administrative data, photos from the program, interviews/surveys/focus groups)
  • Can make a table of each criterion and data source and use that to optimize your data collection:

                  Admin Data   Interview Staff   Interview Participants   Photos from the Program
    Criterion 1                                           x                         x
    Criterion 2                       x                   x                         x
    Criterion 3        x                                                            x
    Criterion 4        x              x
    Criterion 5        x              x                   x
  • Then you can look at all the things you want to collect from each data source (e.g., you can ask about criteria 2, 4, and 5 in interviews with staff; look for criteria 1, 2, and 3 in the photos from the program) = integrated data collection
  • Make sure that the data collection is designed to answer the evaluation questions.
  • Look to see if you are getting consistent information (i.e., saturation) or if the data is patchy or inconsistent and you need to get more clarity.
  • Bring data to stakeholders as you go along (especially for long evaluations – they don’t want to wait until the end of 3 years to find out how things are going!)
  • 3 steps to making sense of data:
    • analysis – breaking something down into its component parts and examining each part separately (King et al, 2013)
    • synthesis – putting together “a complex whole made up of a number of parts or elements” (OED online); assembling the different sources of data. Sometimes when you are working on data synthesis, you learn that what’s important isn’t what you initially thought was important (so you need to rejig your rubric). Also think about what the deal breakers are (e.g., if no one shows up to the program…)
    • sensemaking: helps to clarify things; one way to do this is to get all the stakeholders together, give them the synthesized data (a rough cut), and go through a process like this:
      • generalization: In general, I noticed…
      • exception: In general…, except….
      • contradiction: On one hand…, but on the other hand…
      • surprise: I was surprised by…
      • puzzle: I wonder…
    • When you think about the exceptions or contradictions – how big of a deal are they? Are they deal breakers?
    • As stakeholders do this, they start to understand the data and to own the evaluation. Often they make harder judgments than the evaluator might have.
    • Typically, the evaluators do the synthesis and bring that to the stakeholders to do sensemaking; but don’t spend a lot of time making the synthesized data look polished/finished – it should look rough, as it is to be worked with. Not everyone will spend time reading the data synthesis in advance, so give them time to do that at the start of the session.
    • Put up the rubric and have the stakeholders grade the program.
    • Often people try to do analysis, synthesis, and sensemaking all at the same time, but you should do them separately.
  • Rubrics “aren’t just a method – they change the whole fabric of your evaluation”. They can help you “mix” methods (rather than just doing “both” methods) and help you make sense of the “constellation of evidence”.
  • I asked how they deal with situations that are dynamic. Their answer was that rubrics can evolve, especially with an innovative program. You create a rubric based on what you imagine the outcome will be, but other things can emerge from the program. You can start with a high-level rubric (you don’t want it so detailed or overspecified that you paint yourself into a corner); it needs to be underspecified enough to be contextualized to the setting. It’s like the concept of “implementation fidelity” – implementing something exactly as specified is not the best; you should be implementing enough of the intent in a way that will work in the setting.
  • Another audience member asked how you would determine if a rubric is valid/reliable. The speakers noted that often people ask “is it a valid tool?” meaning “was it compared to a gold standard/previously validated tool?” But those other tools are often too narrow/miss the mark. The speakers suggested that “construct validity is the mother of all validities” – the most important question is “is it useful for the people for whom it was built?”
  • Another audience member asked about “scaling up” rubrics. The speakers noted examples where they had worked on projects to create rubrics to be used across a broader group than those who created it – e.g., created by the Ministry of Education to be used by many different schools with the help of a facilitator. For these, you need to have a lot more detail/instructions on how to use it (and a good facilitator) since users won’t have the shared understanding that comes from having created it. They have also done “skinny rubrics” to be used by lots of different types of schools (so had to be underspecified), but again, need to provide lots of support to users.
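The criteria-by-data-source table idea above can be sketched in a few lines of code. Here is a minimal Python illustration (all criterion and data-source names are made up for the example): it inverts a criterion → sources matrix into an integrated data-collection plan per source.

```python
# Minimal sketch (hypothetical names): for each rubric criterion, list the
# data sources that can speak to it, then invert the mapping to see
# everything to collect from each source in one integrated pass.

matrix = {
    "Criterion 1": ["Interview Participants", "Photos"],
    "Criterion 2": ["Interview Staff", "Interview Participants", "Photos"],
    "Criterion 3": ["Admin Data", "Photos"],
    "Criterion 4": ["Admin Data", "Interview Staff"],
    "Criterion 5": ["Admin Data", "Interview Staff", "Interview Participants"],
}

def plan_by_source(matrix):
    """Invert criterion -> sources into source -> criteria to collect."""
    plan = {}
    for criterion, sources in matrix.items():
        for source in sources:
            plan.setdefault(source, []).append(criterion)
    return plan

plan = plan_by_source(matrix)
# e.g., everything to ask about in staff interviews:
print(plan["Interview Staff"])  # → ['Criterion 2', 'Criterion 4', 'Criterion 5']
```

Building the interview guide or photo-coding sheet from the inverted plan is the point of “integrated data collection” – each source is visited once, covering all the criteria it can inform.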

Systems Thinking

  • Systems archetypes are common patterns that emerge in systems. This was a concept that was brought up by an audience member in my session on complexity, and is something I want to read more about!
  • Heather Codd talked about three key concepts in using systems thinking (using Donella Meadows’ definition of a system as something with parts, links between parts, and a boundary) in evaluation:
    • interrelationships – understanding the interrelationships and what drives them helps us to understand what’s going on with the program (and she suggested using rich pictures to help focus the evaluation and think about what the consequences of the program might be)
    • boundaries – we need to pick a boundary for the purpose of analysis, but note that it is sensitive because it defines what is in and out of the evaluation. She suggested using critical system heuristics to help describe the program, scope the evaluation, and decide on an evaluation approach.
      [Slide: critical systems heuristics]
    • multiple perspectives – what are the world views being applied and what are the implications of those world views? She suggested you can do a stakeholder analysis, but also a stake analysis; she also suggested “framing” by using an idea from Bob Williams, where you add the words “something to do with…” in front of ideas (e.g., “something to do with a culture of health”, “something to do with managing heart disease”). This tool can help give you a sense of the intervention’s purpose and the evaluation’s purpose.
    • Evaluators are an element in a system and we cannot separate out our effect on the systems [This made me think of “co-evolution” – the evaluation co-evolves along with the rest of the system]
    • There are echoes in a system of what has happened before [e.g., intergenerational trauma]

Truth & Reconciliation

  • Last year, the CES took a position on reconciliation in Canada. Several of the speakers at the conference talked about this topic. For example, Kim van der Woerd talked about a witness as being one who listens with their whole heart and validates a message by sharing it (and that they have a responsibility to share it). She also noted that the Truth and Reconciliation Commission (TRC) wasn’t Canada’s first attempt at trying to build a good relationship between Aboriginal and non-Aboriginal people – the Royal Commission on Aboriginal Peoples put out a report with recommendations in 1996. But when it was evaluated in 2006, Canada received a failing grade, with 76% of the 400+ recommendations not done and no significant progress made. She noted that we shouldn’t wait 10 years before we evaluate how well Canada is doing on the TRC recommendations.
  • Paul Lacerte outlined a set of recommendations:
    • amplify the new narrative (where the old narrative was “the federal government takes care of the natives”)
    • conduct research & develop a reconciliation framework
    • set targets for recruiting and training indigenous evaluators
    • learn about and follow protocol (e.g., how to start a meeting, gift giving)
    • put up a sign in your workspace about the traditional territory on which you are working
    • volunteer for an indigenous non-profit
    • join the Moose Hide Campaign
  • At the start of her closing keynote, Kylie Hutchinson acknowledged that she was speaking on the unceded traditional territory of the Musqueam, Squamish, and Tsleil-Waututh First Nations. And then she said that she’d never done that before a talk before, but that she would be from now on. I thought it was a really cool thing to witness someone learning something new and putting it into practice like that, especially something so meaningful.


  • The best joke I heard in a presentation was when Kathy Robrigado, after a few acronym-filled sentences in her presentation, said, “As you know, government employees are paid by the number of acronyms they use”

To Dos:

Sessions I Attended:

  • Opening Keynote by Kim van der Woerd and Paul Lacerte
  • Short presentation: Causing Chaos: Complexity, theory of change, and developmental evaluation in an innovation institute by Darly Dash, Hilary Dunn, Susan Brown, Tanya Darisi, Celia Laur Cypress
  • Short presentation: Implications of complexity thinking on planning an evaluation of a system transformation by M. Elizabeth Snow, Joyce Cheng [This was one of my own presentations!]
  • Short presentation: Cycles of Learning: Considering the Process and Product of the Canadian Journal of Program Evaluation Special Issue by Michelle Searle, Cheryl Poth, Jennifer Greene, Lyn Shulha
  • Short presentation: Using System Mapping as an Evaluation Tool for Sustainability by Kas Aruskevich
  • Incorporating influence beyond academia data into performance measurement and evaluation projects by Christopher Manuel
  • Exploring Innovative Methods for Monitoring Access to Justice Indicators by Yvon Dandurand, Jessica Jahn
  • A Quasi-Experimental, Longitudinal Study of the Effects of Primary School Readiness Interventions by Andres Gouldsborough
  • What Would Happen If…? A Reflection on Methodological Choices for a Gendered Program by Jane Whynot, Amanda McIntyre, Janice Remai
  • Towards Strategic Accountability: From Programs to Systems by Kathy Robrigado
  • Getting comfortable with complexity: a network analysis approach to program logic and evaluation design by John Burrett
  • Communication in System Level Initiatives: A grounded theory study by Dorothy Pinto
  • Seeing the Bigger Picture: How to Integrate Systems Thinking Approaches into Evaluation Practice by Heather Codd
  • Understanding and Measuring Context: What? Why? and How? by Damien Contandriopoulos
  • A Graphic Designer, an Evaluator, and a Computer Scientist Walk into a Bar: Interdisciplinary for Innovation by M. Elizabeth Snow, Nancy Snow, Daniel J. Gillis [This was another one of my presentations and hands down the best presentation title I’ve ever had]
  • Big Bang, or Big Bust? The Role of Theory and Causation in the Big Data Revolution by Sebastian Lemire, Steffen Bohni Nielsen Seymour
  • Using Web Analytics for Program Evaluation – New Tools for Evaluating Government Services in the Digital Age at Economic and Social Development Canada by Lisa Comeau, Alejandro Pachon
  • The Future of Evaluation: Micro-Databases by Michel Laurendeau
  • Dylomo: Case studies from an online tool for developing interactive logic models by M. Elizabeth Snow, Nancy Snow [This was the last of my presentations]
  • Development and use of an App for Collecting Data: The Facility Engagement Initiative by Neale Smith, Graham Shaw, Chris Lovato, Craig Mitton, Jean-Louis Denis
  • Leading Edge Panel: Evaluative Rubrics – Delivering well-reasoned answers to real evaluative questions by Kate McKegg, Nan  Wehipeihana, Judy Oakden, Julian King, E Jane Davidson
  • Closing Keynote by Kylie Hutchinson

Next CES Conference:

  • Host: Alberta & Northwest Territories
  • May 26-29 – Calgary
  • May 31-June 1 – Yellowknife
  • Theme: Co-creation

Footnotes   [ + ]

1. Though I’ve listed all the sessions I attended at the bottom of this posting.
2. She didn’t say the name of the paper or the journal, but based on her comments about the paper, I believe it is likely this paper. Unfortunately, it’s behind a paywall, so I can’t read more than the abstract.

On Flexibility in Evaluation Design

Been doing some reading as I work on developing an evaluation plan for a complex program that will be implemented at many sites. Here are some notes from a few papers that I’ve read – I think if anything links these three together, it is the notion of the need to be flexible when designing an evaluation – but you also need to think about how you’ll maintain the rigour of your work.

Wandersman et al (2016)’s paper on using an evaluation approach called “Getting to Outcomes (GTO)” discussed the notion that just because an intervention has been shown to be effective in one setting does not necessarily mean it will work in other settings. While I wasn’t interested in the GTO approach per se, I found their introduction insightful.

Some notes I took from the paper:

  • the rationale for using evidence-based interventions is that since research studies show that a given intervention leads to positive outcomes, then if we take that intervention and implement it in the same way it was implemented in the research studies (i.e., fidelity to the intervention) on a broad scale (i.e., at many sites), then we should see those same positive outcomes on a broad scale
  • however, when this is actually done, evaluations often show that the positive outcomes compared to control sites don’t happen or that positive outcomes happen on average, but there is much variability among the sites such that some sites get the positive outcomes and others don’t (or even that some sites get negative outcomes)
  • from the perspective of each individual site, having positive outcomes on average (but not at their own particular site) is not good enough to say that this intervention “works”
  • when you implement complex programs at multiple sites/levels, you “need to accommodate for the contexts of the sites, organizations, or individuals and the complete hierarchies that exist among these entities […] The complexity […] includes multiple targets of change and settings” (pp. 549-550)
  • recommendations:
    • evaluate interventions at each site in which it is implemented
    • examine the quality of the implementation
    • consider the fit of the intervention to the local context
      • “the important question is whether they are doing what they need to do in their own setting in order to be successful” (p. 547)
      • “the relevant evaluation question to be answered at scale is not “does the [evidence-based intervention] result in outcomes?” but rather “how do we achieve outcomes in each setting?” (p. 547)
    • evaluators should “assist program implementers to adapt and tailor programs to meet local needs and provide ongoing feedback to support program implementation” (p. 548)
  • empowerment evaluation: premise is: “if key stakeholders (including program staff and consumers) have the capacity to use the logic and tools of evaluation for planning more systematically, implementing with quality, self-evaluating, and using the information for continuous quality improvement, then they will be more likely to achieve their desired outcomes”

Balasubramanian et al (2015) discussed what they call “Learning Evaluation”, which they see as a blend of quality improvement and implementation research. To me it sounded similar to Developmental Evaluation (DE). For example, they state that:

  • “Two key aspects of this approach set it apart from other evaluation approaches; its emphasis on facilitating learning from small, rapid cycles of change within organizations and on capturing contextual and explanatory factors related to implementation and their effect on outcomes across organizations”  (p. 2 of 11)
  • “assessment needs to be flexible, grounded, iterative, contextualized, and participatory in order to foster rapid and transportable knowledge. This approach integrates the implementation and evaluation of interventions by establishing feedback loops that allow the intervention to adapt to ongoing contextual changes.” (p. 2 of 11)

That sounds a lot like DE to me. And it sounds a lot like how I’m looking to approach the evaluation I’m currently planning.

Principles underlying the “Learning Evaluation” approach (from page 3 of 11):

  • Principle 1: Gather data to describe the types of changes made by healthcare organizations, how changes are implemented, and the evolution of the change process. (Why: to establish initial conditions for implementing innovations at each site and to describe implementation changes over time.)
  • Principle 2: Collect process and outcome data that are relevant to healthcare organizations and to the research team. (Why: to engage healthcare organizations in research and in continuous learning and quality improvement.)
  • Principle 3: Assess multi-level contextual factors that affect implementation, process, outcome, and transportability. (Why: contextual factors influence quality improvement, so we need to evaluate the conditions under which innovations may or may not result in anticipated outcomes.)
  • Principle 4: Assist healthcare organizations in applying data to monitor the change process and make further improvements. (Why: to facilitate continuous quality improvement and to stimulate learning within and across organizations.)
  • Principle 5: Operationalize common measurement and assessment strategies with the aim of generating transportable results. (Why: to conduct internally valid cross-organization mixed methods analysis.)

A point that was made in this paper that resonated with me was that: “Within the context of a multi-site demonstration project conducted in real-world settings, it was not feasible to randomize sites or to specify target patient samples or measures a priori.” (p. 7 of 11) Instead, they incorporated elements to enhance the study’s rigour:

  • rigour in study design
    • considered each site as a “single group pre-post quasi-experimental study”, which is subject to history [1] and maturation [2] threats to internal validity
    • to counteract these threats, they collected qualitative data on implementation events (to allow them to examine if results are related to implementation of the intervention)
    • they also used member checking to validate their findings
  • rigour in analysis
    • rather than analyzing each source of data independently, they integrated findings
    • “triangulating data sources is critical to rigor in mixed methods analysis”
    • qualitative data analysis was conducted first within a given site (e.g., “to identify factors that hindered or facilitated implementation while also paying attention to the role contextual influences played” (p. 7 of 11), then across sites.

A few other points they make:

  • “ongoing learning and adaptation of measurement allows both rigor and relevance” (p. 8 of 11)
  • by “working collaboratively with innovators to develop data collection strategies and routine processes for jointly sharing and reflecting on data to foster continuous learning, improvement, and advocacy for policy changes” the organization can “develop capacity for data collection and monitoring for future efforts” (p. 8 of 11)
  • this approach “may feel to some to be at odds with current standards of rigor, which value fidelity to a priori hypotheses and methods”, but it is “not a ‘canned’ approach to evaluating healthcare innovations, but it involves the flexible application of five general principles” (p. 9 of 11). “This requires [evaluators] to be flexible and nimble in adapting their approach when proposed innovations are modified to fit the local context.” (p. 9 of 11)

Brainard & Hunter (2016) conducted a scoping review with the question “Do complexity-informed health interventions work?” What they found was that although “the lens of complexity theory is widely advocated to improve health care delivery,” there’s not much in the literature to support the idea that using a complexity lens to design an intervention makes the intervention more effective.

They used the term “‘complexity science’ as an umbrella term for a number of closely related concepts: complex systems, complexity theory, complex adaptive systems, systemic thinking, systems approach and closely related phrases” (p. 2 of 11). They noted the following characteristics of systems:

  • “Large number of elements, known and unknown.
  • Rich, possibly nested or looping, and certainly overlapping networks, often with poorly understood relationships between elements or networks.
  • Non-linearity, cause and effect are hard to follow; unintended consequences are normal.
  • Emergence and/or self-organization: unplanned patterns or structures that arise from processes within or between elements. Not deliberate, yet tend to be self-perpetuating.
  • A tendency to easily tip towards chaos and cascading sequences of events.
  • Leverage points, where system outcomes can be most influenced, but never controlled.” (p. 2 of 11)

They also had some recommendations for reporting on/evaluating complexity-informed interventions:

  • results should be monitored over the long term (e.g., more than 12 months) as results can take a long time to occur
  • barriers to implementation should be explored/discussed
  • unintended/unanticipated (including negative) changes should be actively looked for
  • support from the institution/senior staff combined with widespread collaborative effort is needed to successfully implement
  • complexity science or related phrases should be in the title of the article


Balasubramanian, B., Cohen, D.J., Davis, M.M., Gunn, R., Dickinson, L.M., Miller, W.L., Crabtree, B.F., & Stange, K.C. (2015). Learning Evaluation: blending quality improvement and implementation research methods to study healthcare innovations. Implementation Science. 10: 31. (full text)

Brainard, J., & Hunter, P.R. (2016). Do complexity-informed health interventions work? A scoping review. Implementation Science. 11: 127. (full text)

Wandersman, A., Alia, K., Cook, B.S., Hsu, L.L., & Ramaswamy, R. (2016). Evidence-Based Interventions Are Necessary but Not Sufficient for Achieving Outcomes in Each Setting in a Complex World: Empowerment Evaluation, Getting To Outcomes, and Demonstrating Accountability.  American Journal of Evaluation. 37(4): 544-561. [abstract]

Footnotes   [ + ]

1. i.e., how do you know results aren’t due to other events that are occurring concurrently with the intervention?
2. i.e., how do you know the results aren’t just due to naturally occurring changes over time rather than being due to the intervention?

Pragmatic Science

Another posting that was languishing in my drafts folder. Not sure why I didn’t publish it when I wrote it, but here it is now!

  • Berwick (2009) wrote an interesting commentary called “Broadening the view of evidence-based medicine” in which he describes how “scholars in the last half of the 20th century forged our modern commitment to evidence in evaluating clinical practices” (p. 315) and, though it was seen as unwelcome at the time, they brought the scientific method to bear on the clinical world; over time, the randomized controlled trial (RCT) became the “Crown Prince of methods […] which stood second to no other method” (p. 315). And while there has been a huge amount of benefit from this, he says “we have overshot the mark. We have transformed the commitment to “evidence-based medicine” of a particular sort into an intellectual hegemony that can cost us dearly if we do not take stock and modify it” (p. 315). He points out that there are many ways of learning things:
  • “Did you learn Spanish by conducting experiments? Did you master your bicycle or your skis using randomized trials? Are you a better parent because you did a laboratory study of parenting? Of course not. And yet, do you doubt what you have learned?” (p. 315)
  • “Much of human learning relies wisely on effective approaches to problem solving, learning, growth, and development that are different from the types of formal science […and …] some of those approaches offer good defences against misinterpretation, bias, and confounding.” (p. 315).

  • He warns that limiting ourselves to only RCTs “excludes too much of the knowledge and practice that can be harvested from experience, itself, reflected upon” (p. 316)
  • “Pragmatic science” involves:
    • “tracking effects over time (rather than summarizing with stats)
    • using local knowledge in measurement
    • integrating detailed process knowledge into the work of interpretation
    • using small sample sizes and short experimental cycles to learn quickly
    • employing powerful multifactorial designs (rather than univariate ones focused on “summative” questions)” (p. 316)
|  | explanatory trials | pragmatic trials |
| --- | --- | --- |
|  | evaluating efficacy (how well does it work in a tightly controlled setting); clinical trials that test a causal research hypothesis in an ideal setting; high internal validity | evaluating effectiveness (how well does it work in “real life”); trials that help users decide between options; high external validity |
| test sample & setting | focus on homogeneity | focus on heterogeneity |
  • explanatory and pragmatic are not a dichotomy as most trials are not purely one or the other – there is a spectrum between them
  • Thorpe et al (2009) created a tool (called PRECIS) to help people designing clinical trials to distinguish where on that pragmatic-explanatory continuum their trial falls; it involves looking at 10 domains (see table below), with scores on these criteria placed on a 10-spoke wheel (one spoke per domain, to give you a spider-diagram type of picture)
| Criteria | explanatory trials | pragmatic trials |
| --- | --- | --- |
| participant eligibility | strict | everyone with the condition of interest can be enrolled |
| experimental intervention – flexibility | strict adherence to protocol | highly flexible; practitioners have leeway on how to apply the intervention |
| experimental intervention – practitioner expertise | narrow group, highly skilled | broad group of practitioners in a broad range of settings |
| comparison group – flexibility | strict; may use placebo instead of “usual practice”/“best alternative” | “usual practice”/“best alternative”; practitioner has leeway on how to apply it |
| comparison group – practitioner expertise | standardized | broad group of practitioners in a broad range of settings |
| follow-up intensity | extensive follow-up & data collection; more than would routinely occur | no formal follow-up; use administrative databases to collect outcome data |
| primary trial outcome | outcome known to be a direct & immediate result of the intervention; may require specialized training | clinically meaningful to participants; special tests/training not required |
| participant compliance with intervention | closely monitored |  |
| practitioner compliance with study protocol | closely monitored |  |
| analysis of primary outcome | intention-to-treat analysis usually used, but usually supplemented with a “compliant participants” analysis to answer the question “does this intervention work in the ideal situation?”; analysis focused on narrow mechanistic questions | intention-to-treat analysis (includes all patients regardless of compliance); meant to answer the question “does the intervention work in ‘real world’ conditions, ‘with all the noise inherent therein’” (Thorpe et al, 2009) |
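To make the continuum idea concrete, here’s a toy sketch of tallying PRECIS-style domain ratings. The 1–5 rating scale, the specific scores, and the simple mean are my own assumptions for illustration only; the actual PRECIS tool plots each domain on its own spoke of a wheel rather than averaging them into a single number.

```python
# Toy sketch: rating a hypothetical trial on the 10 PRECIS domains.
# Scale (assumed for this example): 1 = very explanatory, 5 = very pragmatic.
domain_scores = {
    "participant eligibility": 4,
    "intervention flexibility": 5,
    "intervention practitioner expertise": 4,
    "comparison flexibility": 3,
    "comparison practitioner expertise": 4,
    "follow-up intensity": 5,
    "primary trial outcome": 4,
    "participant compliance": 5,
    "practitioner compliance": 4,
    "analysis of primary outcome": 5,
}

# A crude overall summary (the real tool keeps the domains separate,
# precisely because a trial can be pragmatic on some and explanatory on others).
mean_score = sum(domain_scores.values()) / len(domain_scores)
leaning = "pragmatic" if mean_score > 3 else "explanatory"
print(f"mean = {mean_score:.1f} -> leans {leaning}")  # mean = 4.3 -> leans pragmatic
```

Keeping the per-domain scores around (rather than just the mean) is what lets you draw the spider diagram the authors describe.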

I also came across this article in Forbes magazine: Why We Need Pragmatic Science, and Why the Alternatives are Dead-Ends. It’s a short read, but it succinctly summarizes an argument I find myself often making: science is a powerful tool for understanding and explaining the world. It’s not the only tool (philosophy and the other humanities, for example, are great tools for different purposes), but it’s certainly the best one for certain purposes and it’s a fantastic one to have in our toolbox!


Berwick, D.M. (2005). Broadening the view of evidence-based medicine. Quality & Safety in Health Care. 14:315-316. (full-text)

Thorpe, K.E., Zwarenstein, M., Oxman, A.D., Treweek, S., Furberg, C.D., Altman, D.G., Tunis, S., Bergel, E., Harvey, I., Magid, D.J., & Chalkidou, K. (2009). A pragmatic-explanatory continuum indicator summary (PRECIS): a tool to help trial designers. Canadian Medical Association Journal. 180(10): E47-E57.


Process Use of Evaluation

Just noticed this in my drafts folder – some notes on process use evaluation from some of the papers I’d been reading on the topic. Figured I should actually publish it.

Definition of process use:

  • “the utility to stakeholders of being involved in the planning and implementation of an evaluation” (Forss et al, 2002, p. 30)
  • Patton describes “process use” as “changes resulting from engagement in the evaluation process and learning to think evaluatively. Process use occurs when those involved in the evaluation learn from the evaluation process itself or make program changes based on the evaluation process rather than findings. Process use also includes the effects of evaluation procedures and operation, for example, the premise that “what gets measured gets done”, so establishing measurements and setting targets affects program operations and management focus.” (Patton, 2008, p. 122) or “individual changes in thinking, attitudes, and behavior, and program or organizational changes in procedures and culture that occur among those involved in evaluation as a result of learning that occurs during the evaluation process.” (Patton, 2008, p. 155)
  • 6 types of process use (pp. 158-9):
    • infusing evaluative thinking into organizational culture
    • enhancing shared understanding
    • supporting and reinforcing program intervention – “the primary principle of intervention-oriented evaluation is to build a program delivery model that logically and meaningfully interjects data collection in ways that enhance achievement of program outcomes, while also meeting evaluation information needs” – while traditional research would view measurement that affects the outcome as contamination, if evaluation is part of the intervention, for the purposes of the evaluation of the program “it does not matter […] how much of the measured change is due to [the data collection] vs actual [program] activities, or both, as long as the instrument items are valid indicators of desired outcomes” (Patton, 2008, p. 166). “A program is an intervention in the sense that it is aimed at changing something. The evaluation becomes part of the programmatic intervention to the extent that the way it is conducted supports and reinforces accomplishing desired program goals” (Patton, 2008, p. 166)
    • instrumentation effects and reactivity
    • increasing engagement, self-determination, and ownership
    • program and organizational development
  • In the very interesting article “Process Use as a Usefulism”, Patton (2007) describes how he thinks of process use as a “sensitizing concept”
  • sensitizing concept (Patton, 2007, p. 102-103):
    • “can provide some initial direction to a study as one inquires into how the concept is given meaning in a particular place or set of circumstances”
    • “Such an approach recognizes that although the specific manifestations of social phenomena vary by time, space, and circumstance, the sensitizing concept is a container for capturing, holding, and examining these manifestations to better understand patterns and implications”
    • “raises consciousness about something and alerts us to watch out for it within a specific context. This is what the concept of process use does. It says things are happening to people and changes are taking place in programs and organizations as evaluation takes place, especially when stakeholders are involved in the process. Watch out for those things. Pay attention. Something important may be happening.”

Types of Use of Evaluation

  • symbolic use (a.k.a., strategic use or persuasive use):
    • “evaluation use to convince others of a political position” (Peck & Gorzalski, 2009, p. 141)
    • “use of knowledge as ammunition in the attainment of power or profit” (Straus et al, 2010)
  • conceptual use:
    • “to change levels of knowledge, understanding, and attitude” (Peck & Gorzalski, 2009, p. 141)
    • process use: “knowledge gained through the course of conducting  program evaluation” (Peck & Gorzalski, 2009, p. 141)
  • instrumental use:
    • “direct use of evaluation’s findings in decision making or problem solving” (Peck & Gorzalski, 2009, p. 141)
    • “to change behaviour or practice” (Straus et al, 2010)
  • Forss et al (2002) cite Vedung (1997) as identifying 7 ways that evaluations can be used: “instrumentally, conceptually, legitimizing, interactively, tactically, ritually, and as a process” (p. 31)
  • Forss et al identify 5 different types of process use:
    • learning to learn
      • “Patton (1998) wrote that the evaluation field has its own particular culture, building on norms and values that evaluators take for granted, but which may be quite alien to people embedded in the culture of another profession. Patton (1998: 226) suggests that these values include ‘clarity, specificity and focusing, being systematic and making assumptions explicit, operationalising programme concepts, ideas and goals, separating statement of fact from interpretations and judgments’.” (Forss et al, 2002, p. 33, emphasis mine)
        • I checked out the original source on this – the direct quotation is: “that evaluation constitutes a culture, of sorts. We, as evaluators, have our own values, our own ways of thinking, our own language, our own hierarchy, and our own reward system. When we engage other people in the evaluation process, we are providing them with a cross-cultural experience. They often experience evaluators as imperialistic, that is, as imposing the evaluation culture on top of their own values and culture—or they may find the cross cultural experience stimulating and friendly. In either case, and all the spaces in between, it is a cross-cultural interaction […] This culture of evaluation, which we as evaluators take for granted in our own way of thinking, is quite alien to many of the people with whom we work at program levels. Examples of the values of evaluation include: clarity, specificity and focusing; being systematic and making assumptions explicit; operationalizing program concepts, ideas and goals; distinguishing inputs and processes from outcomes; valuing empirical evidence; and separating statements of fact from interpretations and judgements. These values constitute ways of thinking that are not natural to people and that are quite alien to many” (Patton, 1998, pp. 225-6, emphasis mine)
      • values of evaluation include “enquiry”, “a structured way of thinking about reality and generating knowledge” (Forss et al, 2002, p. 33)
      • “to engage in evaluation is thus also a way of learning how to learn” (Forss et al, 2002, p. 33)
    • developing networks – evaluation activities can bring together people who don’t usually work together
    • creating shared understanding
      • working together “help[s] people understand each other’s motives, and to some extent also to respect the differences” (Forss et al, 2002, p. 35)
      • note that “the usefulness of evaluation hinges directly upon the quality of the communication in evaluation exercises”  (Forss et al, 2002, p. 35)
    • strengthening the project
      • when the evaluator works to understand the program, it helps stakeholders themselves to get a “clearer understanding of the project and possibly with a new resolve to achieve the project’s aims” (Forss et al, 2002, p. 36)
      • “Patton (1998) calls this ‘evaluation as an intervention’; the evaluation becomes an intentional intervention supporting programme outcomes.” (Forss et al, 2002, p. 36)
      • “The way the team formulates questions, discusses activities and listens to experiences, may influence activities at the project level.” (Forss et al, 2002, p. 36)
    • boosting morale
      • “reminds them of the purposes they work for, and allows them to explore the relationship between their own organization and the […] impact that is expected” (Forss et al, 2002, p. 37)
      • “the fact that attention is shown, the project is investigated, viewpoints are listened to and data are collected could presumably give rise to similar positive effects as […] Hawthorne” (Forss et al, 2002, p. 38) [though I would note that in some organizations, evaluations are only conducted when a program is seen to be failing/in trouble and the evaluator is sent in to figure out why or to decide if the program should be closed – this could de-motivate people. Also, my experience has been that if the people from whom data is collected don’t see what is done with it, they don’t feel listened to and feel like they’ve been asked to do the work (of data collection) for no reason – and that’s demotivating. So it’s really about the organization’s approach to evaluation and how they communicate]


  • because process use means that the evaluation is having an effect on the stakeholders, “an evaluation may become part of the treatment, rather than just being an independent assessment of effects” (Forss et al, 2002, p. 30)
  • “an evaluation is not neutral, it will reinforce and strengthen some aspects of the organization, presumably at an opportunity cost of time and money” (Forss et al, 2002, pp. 38-9)
  • “the report itself will normally provide little new insight. Most discoveries and new knowledge have been consumed and used during the evaluation process. The report merely marks the end of the evaluation process.” (Forss et al, 2002, p. 40)
  • The “merit” of evaluation “lies […] in discovering unknown meanings, which help stakeholders to develop a new self-awareness, and in implementing new connections between people, actions, and thoughts” (Bezzi, 2006, cited in Fletcher & Dyson, 2013)
  • Fletcher & Dyson (2013) describing an evaluation that one of them had done: “The first evaluation challenge facing the first author was in helping the project’s diverse range of partners to develop a shared understanding of what the project would be. As is so often the case in project development, there had been a primary focus on securing funding and not on the real-life details of the project itself. The project logic, its conceptualization of culture change processes and, most importantly, the why and how of this logic and concept, had not been articulated – despite the fact that articulation of such project logic and culture change conceptual framework would, in turn, affect the overall defined aim and anticipated outcomes. As argued by Weiss (1995), when interventions do not make such things clear (either to themselves, or to others), the evaluation task becomes considerably more challenging. Given the already discussed nature of the collaborative research approach, it was fitting for the evaluator to assist in such articulation in order to ensure that the evaluation plan was both coherent with and relevant to such logic and conceptualization.” (p. 425)


  • Fletcher, G., Dyson S. (2013). Evaluation as a work in progress: stories of shared learning and development. Evaluation. 19(4): 419-30.
  • Forss, K, Rebien, C. C., Carlsson, J. (2002). Process use of evaluations: Types of use that precede lessons learned and feedback. Evaluation. 8(1):29-45.
  • Patton, M.Q. (1998). Discovering process use. Evaluation. 4(2):225-233.
  • Patton, M.Q. (2008). Utilization-focused evaluation, 4th edition. Thousand Oaks, CA: Sage.
  • Peck, L. R., Gorzalski, L. M. (2009). An evaluation use framework and empirical assessment. Journal of Multidisciplinary Evaluation. 6(12): 139-156.
  •  Straus, S. E., Tetroe, J., Graham, I. D., Zwarenstein, M., Bhattacharyya, O., Leung, E. (2010). Section 3.6.1: Monitoring Knowledge Use and Evaluating Outcomes of Knowledge Use in Knowledge translation and commercialization. Retrieved from http://www.cihr-irsc.gc.ca/e/41945.html

Australasian Evaluation Society (AES) Conference Recap

In September, I had the fantastic opportunity to attend the Australasian Evaluation Society conference in Perth, Western Australia. As I did with the Canadian Evaluation Society conference, I’m going to summarize some of my insights, in addition to cataloguing all the sessions that I went to. Rather than presenting my notes session by session, I’m going to present them by topic area, and then present the new tools I learned about. Where possible, I’ve included the names of the people who said the brilliant things I took note of, because I think it is important to give credit where credit is due, but I apologize in advance if my paraphrasing is not as elegant as the original. I’ve also made notes of my own thoughts as I went through my notes to make this summary, which I’ve included in [square brackets].


  • Traditionally, evaluation has been defined as being about judging merit or worth; a more contemporary view of evaluation includes it being about the production of knowledge, based on systematic enquiry, to assist decision making. (Owen) [This was interesting to me, as we have been working on elucidating the differences/overlaps among evaluation, research, monitoring, quality improvement, etc. Owen’s take on evaluation further blurs the line between evaluation and research, as research is often defined as producing new knowledge for knowledge’s sake.]
  • Evaluation is “the handmaiden of programs” (Owen) – what really matters is the effective delivery of programs/policies/strategies. Evaluation being involved on the front-end has the most potential to help that happen.
  • I really like this slide from John Gargani, the American Evaluation Association president:

Evaluation. John Gargani.

Theory vs. Practice

  • Practice is about actually doing something vs. theory, which is about having “coherent general propositions used as principles of explanation for a class of phenomena or a particular concept of something to be done or of the method of doing it; a system of rules or principles” (Owen).
  • Praxis: “the act of engaging, applying, and reflecting upon ideas, between the theoretical and the practical; the synthesis of theory and practice without presuming the primacy of either” (Owen).

Evaluative Thinking (ET)

  • ET is a form of higher order thinking: making judgments based on evidence, asking good questions, suspending judgment in the absence of sufficient evidence, etc.
  • “If I know why I believe X, I’m relatively free to change my belief, but if all I know is “X is true”, then I can’t easily change my mind even in the face of disconfirming evidence” (Duncan Rintoul).

Evaluation-Receptive Culture

  • Newcomer, citing Mayne (2010), talked about the features of an “evaluation-receptive culture”:
    • fight the compliance mentality [looking only to see whether people are complying with a stated program/procedure presupposes that it is the “right” program/procedure – evaluation does not make such presuppositions]
    • reward learning from monitoring and evaluation
    • cultivate the capacity to support both the demand for, and supply of, information
    • match evaluation approaches/questions with methods

Evaluation and Program Planning

  • evaluative thinking and evaluation findings can be used to inform program planning (note that this isn’t always what happens; often program planning is not as rational a process as we’d hope!)
  • “proactive evaluation” (according to Owens et al) = we need to know:
    • what works: what interventions –> desired outcomes
    • practice: how to implement a program
    • who to involve
    • about the setting: how contextual factors affect implementation
  • innovation factors affecting program design:
    • implementation is the key process variable, not adoption [where they used “adoption” to mean “choosing the program”. My experience is that this is not how the word “adoption” is always used – e.g., while Owen used “adoption” to refer to choosing a program to implement, I’ve seen others use “adoption” to refer to individuals (i.e., the extent to which individuals “adopt” (or enact) the part of the program they are intended to enact)]
    • the more complex the intervention, the more attention needs to be given to implementation
    • we need to analyze key innovation elements, with some elements needing more attention than others
    • the most difficult elements to implement are changed user roles/role relationships
  • change is a process, not a single event
  • when implementing a program at multiple sites, there will be variation in how it is implemented
  • there must be effective support mechanisms and leadership buy-in is essential
  • evaluation tends to be more context sensitive than research [I’d qualify this with “depending on the type of research”]
  • why do people not include context sensitivity in complex intervention design?
    • command and control culture (with a lack of trust in the front lines)
    • structural limitations of processing and responding to large amounts of data with nuanced implications
    • epistemologies, especially in the health sector (where people tend to think that you can find an intervention (e.g., drug, surgery) that works and then push out that intervention, despite the evidence that you can’t just push out an intervention and expect it will be used)
  • profound differences between designers and intended users – evaluators can “translate” users voices to designers

Evidence-Based Policy

  • the theory of change of evidence-based policy:


  • “evidence-based” policy can refer to any of these levels:


  • some challenges for evidence-based policy:
    • what constitutes “evidence”?
    • is evidence transferrable? (e.g., if it “works” in a given place and time, does that necessarily mean it will work in another place or at another time?)
  • people often overstate the certainty of the evidence they collect – e.g., even if a study concludes that a program played a causal role in the place/time where it was conducted, will it play a broad enough causal role that we can predict it will play a causal role in another time/place (which is what “evidence-based” policy is doing when it takes conclusions from a study/studies as evidence that the program should be applied elsewhere)?


Rubrics

  • problem: to draw evaluative conclusions, you need standards against which to draw them
  • most evaluation reports do not provide specifics about how the evaluation findings are synthesized or the standards by which the conclusions are drawn (often this is done implicitly, but it’s not made explicit)
  • this lack of transparency about how evaluation conclusions are drawn makes people think that evaluation is merely subjective
  • rubric comes from “red earth” (used to mark sheep to track ownership and breeding)
  • the nature of evaluation (Scriven):


  • the logic of evaluation, summarized in 4 steps:
    1. establish criteria
    2. construct standards
    3. measure performance
    4. compare performance to standards and draw conclusions
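As a concrete illustration, the four steps above can be sketched in a few lines of Python. The criteria, the cut-off standards, and the performance numbers here are all hypothetical examples, not from any real evaluation:

```python
# A minimal sketch of the four-step logic of evaluation using a rubric.

# Step 1: establish criteria (what dimensions of the program matter?)
# Step 2: construct standards (what counts as "excellent" or "adequate"?)
rubric = {
    "reach": {"excellent": 0.80, "adequate": 0.50},       # proportion of target population served
    "satisfaction": {"excellent": 4.5, "adequate": 3.5},  # mean rating on a 5-point scale
}

# Step 3: measure performance (data collected during the evaluation)
performance = {"reach": 0.62, "satisfaction": 4.6}

# Step 4: compare performance to the standards and draw a conclusion per criterion
def rate(criterion, value):
    standards = rubric[criterion]
    if value >= standards["excellent"]:
        return "excellent"
    if value >= standards["adequate"]:
        return "adequate"
    return "below standard"

conclusions = {c: rate(c, v) for c, v in performance.items()}
print(conclusions)  # {'reach': 'adequate', 'satisfaction': 'excellent'}
```

The key point the sketch makes is that the standards exist before the data arrive, so the conclusion follows from the comparison rather than from a post hoc judgment of “well, this looks good enough.”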


  • you compare your performance data to the descriptors to determine whether the standard was achieved, which allows you to draw an evaluative conclusion [I am familiar with rubrics from my work as an instructor, where I often provide grading rubrics to my students so that they know what level of work I am expecting in order to get an A, B, C, D, or F on an assignment. I haven’t yet used a rubric in a program evaluation]
  • by determining the standards before you collect performance data, you are thinking about what does a “good” or “successful” program look like up front; if you only start to think about what is good enough to be considered success after you see the data, you can be swayed by the data (e.g., “Well, this looks good enough”)
  • use the literature and stakeholders to build your rubrics
  • Martens conducted a literature review and interviews and found that few evaluators write about using rubrics in their work (not clear if it’s because people aren’t using them or just aren’t writing about them) and that most people who use them learned through contact with Jane Davidson or her work
  • it was noted that because of the transparency of rubrics, people don’t argue about whether measures are “good enough” (like they did before that person used rubrics)
  • rubrics do need to be flexible to a changing evaluand – it was also noted that sometimes evidence emerges during an evaluation that you hadn’t planned for in your rubric, and it’s OK to add, for example, a new criterion; but you can’t change the rubric after the fact to hide something on which the program did poorly
  • future research is needed on best practices for using rubrics and to investigate the perspectives of funders and evaluation users on rubrics

Implementation Science

  • implementation = “a specific set of activities designed to put into place an activity or program of known dimensions” (Fixsen et al, 2005; cited by Wade)
  • this table provides a nice distinction between what we typically evaluate in program evaluation (i.e., the intervention) vs. implementation (i.e., how it gets implemented), and what happens if each of those is effective or not effective:

|  | intervention (“what” gets implemented): effective | intervention: not effective |
| --- | --- | --- |
| implementation (“how” it gets implemented): effective | actual benefits | poor outcomes |
| implementation: not effective | inconsistent; not sustained; poor outcomes | poor outcomes; sometimes harmful |


  • the more complex an intervention is, the more attention needs to be paid to implementation
  • the most difficult part of implementation is the changes in roles and relationships (i.e., behavioural changes)
    • change is a process, not an event
    • people don’t like to be told to change – you need to situate new behaviours in relevant history and saliency
    • understand different actors’ motivations for behaviour change
  • when you have multi-site projects/programs, you will have variation in implementation (i.e., how an intervention actually gets implemented at different sites), even though you are implementing the same intervention at each site
  • why don’t people include context-sensitivity in complex intervention design?
    • command and control culture (a lack of trust in the front lines)
    • structural limitations on processing and responding to large amounts of data with nuanced implications
    • epistemologies, especially in the health sector (where people often think that you just find a pill/needle/surgery that is proven to work in an RCT, make it available, and people will use it, despite evidence that just pushing out interventions does not actually get people to use them)
  • there are profound differences between program designers and users/implementers of the program – evaluators can be a “translator” between the two.
  • evaluators can ask the challenging questions
  • our worldview is often different from program implementers and we can bring our insights
  • continuous quality improvement (CQI): “the continuous use of data to inform decision making about intervention adaptations and about additional implementation supports (e.g., training, coaching, change to administrative processes, policies, or systems) needed to improve both implementation of the intervention and the outcomes associated with the intervention” (Wade)
    • ignores the idea of “if it ain’t broke, don’t fix it”
    • uses ongoing experimentation to improve processes (not about improving/“fixing” people) – e.g., Plan-Do-Study-Act (PDSA) cycles
    • small incremental changes
    • most effective when it’s a routine part of the way people work
  • CQI evaluation questions:
    • intention to reach:
      • did the program reach its intended population?
      • how well is it reaching those who would most benefit from it? (e.g., high risk groups, location/age/gender/SES)
    • implementation:
      • to what extent is the program delivered as intended? [this assumes that the way the program is designed is actually appropriate for the setting; sometimes programs are shown to be effective in one context but aren’t actually effective in a different context. Similarly, how to implement may work well in one context but not in another context]
    • impact on outcomes:
      • what changes in status, attitudes, skills, behaviours, etc. resulted from the intervention?

Evaluation in Aboriginal Communities

  • There are many similarities between Australia and Canada with respect to Aboriginal people: a history of colonization, systematic discrimination, and ongoing oppression; A history of an imposition of “solutions” on community via programs, service delivery models, and evaluation methods; these are imposed by privileged white voices and they often harm Aboriginal communities and people rather than helping.
  • Aboriginal communities prefer:
    • self- determination
    • two-way learning
    • participating
    • capacity-building – evaluations should not be about someone coming in and taking from the community
    • include an Aboriginal worldview
    • develop a shared understanding of what partnership really means
  • Evaluation should be ongoing
  • Evaluators should be facilitators, should be respectful, should understand community capacity within the context of local values/norms
  • Trauma-informed, as communities have experienced colonial violence
  • Often evaluations do not allow the time needed to do the work that is needed to conduct evaluation in a respectful way that gets useful information
  • Communities need to have confidence in evaluations = confidence that evaluators will hear the whole story and report it ethically, and evaluations will be useful to the community and be done respectfully with the community

Systems Thinking

  • “You don’t have to be a systems theorist to be a systems thinker” (Noga). You can use systems thinking as a paradigm/worldview, without having to go into the whole theory. [This reminded me of Jonathan Morrell’s talk at the CES conference]
  • System = elements + links between the elements + boundary.
    • Without the links between the elements, it’s just a collection.
    • Boundaries can be physical, political, financial, etc. They may also be contested (not everyone may agree as to what the boundaries of a given system are). Determining the boundaries = deciding what’s in, what’s out, and what’s considered important; it’s an ethical, moral, and political decision.
  • A program always has a mental model (someone believes there is problem and the program is a way to solve it), even if they haven’t articulated it.
    • Evaluators investigate/describe:
      • the program
      • assumptions
      • mental models
      • stakeholders and their stakes (see Ladder Diagram in the Tips & Tools section below)
    • As an evaluator, look for leverage points the program is using. Are they working? Could there be better ones?
  • Interrelationships are what make a system a system instead of just a collection; they create:
    • outcomes
    • but also barriers, detours
      • function & dysfunction
      • emergent patterns & trends
  • Complex systems are unpredictable (the program can have hoped-for or intended outcomes, but can’t necessarily predict things with any certainty).


  • The Systems Iceberg: Mental Models (what is the problem and how do we think we can solve it?), whether explicit or implicit, cause us to design structures, that in turn influence patterns of systems behaviour, which lead to events, which are what we tend to observe.
    • e.g., you get a cold (event), because you haven’t been eating or sleeping well (patterns of behaviour), due to poor access to nutritious food and work stress (structures); work stress affects you because your career is important to your identity (so a threat to your career threatens your identity) and you believe resting = laziness (mental models).
    • When you are evaluating a system, start at the top: what events happened? what patterns of behaviour led to those events? what structures led to those patterns? what mental models/assumptions led to those structures being developed in the first place?
    • If you are designing a program, start at the bottom and work up! (Make your mental models explicit so you can make your design more intentional).
    • Can use the iceberg conceptually as you investigate the program – e.g., build it into interview questions (ask about what happened, then ask questions to find patterns, questions to uncover mental models)
      • interviews are a good way to get to mental models
      • artifacts and observations are good ways to get to system structures
      • observations and interviews are good ways to get to patterns of behaviour and events.
  • Complex Adaptive Systems: “self-organization of complex entities, across scales of space, time, and organizational complexity. Process is dynamical and continuously unfolding. System is poised for change and adaptation” (Noga slide deck, 2016)


  • Think of the above diagram as more of a spiral than a circle.
    • e.g., 1. more women are single mothers and have to work, and more women choose to enter the workforce –> 2. policies re: childcare, tax credits for daycare/employers create daycares in response to more women in the workforce –> 3. supported childcare –> 1. new agent actions (e.g., even more women join the workforce as the new policies make it easier to do so) and so on
  • With CASs, expect surprises – so you need to plan for them (e.g., do you have a process in place to renegotiate what data is being collected in response to changes?)
  • Wicked problems truly resist a solution, and every intervention into a wicked problem has an effect. So you can’t just pilot an intervention like you would for a normal problem: the pilot itself changes the situation, so doing that intervention again may not have the same effect because the starting point would be different; plus, the effect of the next intervention will be affected by the effect of the prior one. Examples include poverty, the obesity epidemic, climate change, and education (e.g., what do children need to know in the 21st century and how do we teach it to them?). Wicked problems also interact with each other, which makes things even more complex (e.g., effects of climate change on those in poverty).
  • Take home messages from Jan on Systems Thinking:
    • be intentional – use systems thinking when it makes sense, use tools when they make sense
    • document boundaries and perspectives
    • our job as evaluators is to surface the story that the system is telling


  • some common approaches to complexity that don’t work
    • careful planning doesn’t work in situations of uncertainty (because how can you plan for the unknown?)
    • reliance on monitoring & evaluation (M&E) plans with high-level annual reviews to guide implementation, and oversight by higher-ups who don’t have understanding of, or influence at, the front lines
    • emphasis on celebrating successes rather than learning from failures
    • use of short timeframes and rigid budgets to reduce risk (it actually increases risk of ineffective interventions)
  • instead we need:
    • more regular reviews/active monitoring (which requires lots of resources; and we need to make sure it doesn’t become excessively structured)
    • determine where the bottlenecks for adoption are and delegate responsibility to that level, giving lots of local autonomy and coaching, and fostering self-organization (need decision making at the lower levels)
    • learn from good pilots and then use that to inform expansion (but also need to study how that expansion goes – can’t assume it will go the same way as the pilot)
    • payment by results gives responsibility to the implementing agencies/communities to do what they want to do, but:
      • the outcomes need to be the correct ones
      • the outcomes need to be verifiable [because this can easily be gamed, where people work to change the measured outcomes, not necessarily the underlying thing you are trying to change]
    • modeling likely scenarios during design and at critical junctures using:
      • human-centred design
      • agent-based modeling
      • complex system modeling
    • all approaches need insight from evaluation
  • often when higher ups look at indicators, things seem simple (indicators alone do not reveal the complexity that occurs on the ground)


  • In the session by Plant, Cooper, & Warth, they discussed innovation in healthcare in BC and New South Wales. In the BC context, “innovation” is usually focused on something that “creates value”, whereas in NSW it’s more about “something new” (even if it’s just new to you)
  • a lively group discussion brought up some interesting points:
    • innovation happens on the ground, so a top down approach to “mandate” innovation doesn’t really work
    • innovation is a process, so the evaluation of innovation should be an evaluation of the process (rather than the product of the innovation) [though wouldn’t this depend on the evaluation question? e.g., if the evaluation question is “was the outcome of this program to support innovation worth the investment?”]
    • innovation is challenging in a risk-averse setting like healthcare, as innovation requires risk taking as you don’t know if it’s going to work
    • evaluation can have a role in supporting innovation when:
      • proximity – there is a clear line of sight between activities and outcomes
      • purpose – when a learning purpose is valued by the client
      • embedding – evaluation is embedded in the planning cycle (using evaluative thinking/an evaluative mindset to inform planning)
    • evaluator skills needed for evaluation to drive/support innovation:
      • political nous (a.k.a. political savvy) – situational/reflexive practice competencies
      • context knowledge – i.e., knows the health system
      • content knowledge – i.e., specific to the area of innovation
    • factors that limit evaluation’s role:
      • political/leadership barriers & decision cycles
      • innovation overload
      • time frames
      • a “KPI mindset” – i.e., inappropriate outcome measurement; the use of evaluation resources for measurement rather than doing deep dives and garnering nuanced understanding
        • how do we counter the “KPI mindset”? The evaluation approach is different – e.g., you start with a question and then ask what data will provide the evidence required to answer that question (rather than starting with indicators and assuming you know the right indicators to monitor). And that data might be qualitative!

Cognitive Bias

  • cognitive bias = “habits of thought that often lead to erroneous findings and incorrect conclusions” (McKenzie)
    • e.g., framing effect: how you frame data affects how people react to it. E.g., if people are told a procedure has a 90% survival rate they are more likely to agree to it than if you say it has a 10% mortality rate. Thus, even though these mean the same thing, the way it’s framed affects the decision people make based on the evidence.
    • e.g., anchoring effect: naming a number can affect what people expect. E.g., if you ask one group “Did Gandhi die before or after the age of 5?” and a second group “Did Gandhi die before or after the age of 140?”, and then ask both groups to guess what age he actually died, the second group will guess higher than the first. This happens even though 5 and 140 are obviously wrong – but they “anchor” a person’s estimate to be closer to the number they first heard.
    • there are tonnes more cognitive biases [Wikipedia has a giant list!]
  • even when we are doing a conscious reasoning process, we are still drawing on our subconscious, including biases
  • we like to believe that we make good and rational decisions, so it can be hard to accept that our thoughts are biased like this (and hard to see past our biases, even when we are aware of them)
  • there is not much research on cognitive bias in evaluators, but research does show that evaluators:
    • vary in the decisions they make
    • vary in the processes they use to make decisions
    • tend to choose familiar methods
    • are influenced by their attitudes and beliefs
    • change their decision making with experience (becoming more flexible)
    • write reports without showing their evaluative reasoning
  • some strategies to address bias:
    • conduct a “pre-mortem” –  during planning, think of all the ways that the evaluation could go wrong (helps to reduce planning bias)
    • take the outside view (try to be less narrowly focused from only your own perspective)
    • consult widely (look for disconfirming evidence, because we all have confirmation bias – i.e., paying more attention to those things that support what we already believe than those things that go against it)
    • mentoring (it’s hard to see our own biases, even for people who are experts in bias!, but we can more easily see other people’s biases)
    • make decisions explicit (i.e., explain how you decided something – e.g., how did you decide what’s in scope or out of scope? how did you decide what’s considered good vs. bad performance? This can help surface bias)
  • reflecting on our own practice (e.g., deep introspection, cultural awareness, political consciousness, thinking about how we think, inquiring into our  own thought patterns) needs to happen at points of decision and needs to be a regular part of our practice
  • 10 minutes of mindfulness per day can decrease bias (compare that with the hours per day of mindfulness for many weeks that are required to get the brain changes needed for health benefits)
  • some audience suggestions/comments:
    • have other evaluators look at our work to look for bias (it’s easier to see other people’s bias than our own)
    • we are talking about bias as if there is “one truth”, but there are multiple realities, so sometimes we misuse the word bias

Design Thinking


  • model from the Stanford Design School (shown as linear, but it is really iterative):
    • empathize – understand the experience of the user
    • define – define the problem from the user’s perspective
    • ideate – explore lots of ideas (divergent thinking) and then narrow them down
    • prototype – reduce the options to the best ones and experience them
    • test – test the best ideas; observe & gather feedback to refine
    • learn – scale your learnings (e.g., to other projects, other users, other geographies) [the speaker added this step to the model]


  • In a session on evaluation standards, there was some good discussion on balancing the benefits of professionalizing evaluation (e.g., having standards to adhere to provides some level of confidence in evaluation, and helps prevent someone who really doesn’t know what they are doing from claiming to do evaluation and giving the field a bad name with poor work) against the disadvantages (e.g., it can be elitist, keeping out people who have valuable things to contribute to evaluation but don’t have the “right” education or experience; it can stifle innovation; it can lead to evaluators working to meet the needs of peer reviewers rather than the needs of the client). There was also discussion about how commissioners of evaluation can affect the quality of an evaluation through their determinations of scope, schedule, and/or budget.
  • John Owen gave an interesting “history of evaluation” in his keynote talk on Day 2. An abridged version:
    • pre-1950 – evaluation as we know it didn’t exist
    • pre-1960: Tyler, Lewin, Lazarsfeld in the USA (if you had objectives for your program and measured them, then you could say if a program “worked” or not)
    • 1960s: with the “Great Society” in the US, there was more government intervention to meet the needs of the people and the government wanted to know if their interventions worked/was their money being spent wisely (accountability purpose of evaluation).
    • 1967 – academics had become interested in evaluation. Theory of Evaluation as being a judgement of merit/worth. Michael Scriven (an Australian) contributed the notion of “valuing”, which isn’t necessarily part of other social sciences.
    • 1980s onward – an expansion of the evaluation landscape (e.g., to influence programs being developed/underway; to inform decision making)
    • currently – a big focus on professionalization
  • Katheryn Newcomer also presented a brief summary of evaluation history:
    • 1960s: “effectiveness”
    • 1980s: outcomes
    • 1990s: results-based
    • 2000s: evidence-based
  • Words:
    • Newcomer notes that Scriven advocates the use of the term “program impactees” rather than “beneficiaries” because we don’t know if the program recipients will actually receive benefits [though to me “impactees” suggests anyone who might be affected by the program, not just the program users (which is usually what people are talking about when they say “beneficiaries”). But I can totally appreciate the point that saying “program beneficiaries” is biased in that it pre-supposes the program users get a benefit. I usually just say “program users”.]
    • “Pracademic”- a practical (or practitioner) academic (Newcomer)
  • In discussing evaluating a complex initiative, Gilbert noted that they chose to focus their evaluation only on some areas and no one has criticized them for prioritizing some parts over other parts [I found this an interesting comment, as I’m concerned on my project that if some parts aren’t evaluated, it would be easy to criticize the project as a whole as not having been evaluated]. She also noted that they had really rich findings and that there was a triangulation where findings on one focus area complemented findings on another focus area.

Tips and Tools

Throughout the conference, a number of speakers shared different tips and tools that might be useful in evaluation.

Ladder diagram for mapping stakeholders and stakes:

  1. list all the stakeholders
  2. ask them each “what is the purpose of this program?” (those are the “stakes”)
  3. draw lines between the stakeholders and the stakes
  • allows you to see:
    • stakes that are held by multiple groups
    • stakes that only have one stakeholder (sometimes these outliers are really important! e.g., Noga did this for an after-school program that was experiencing poor attendance/high dropout rates, and the kids were the only stakeholders who noted “fun” as a purpose of the program. That was the missing ingredient in why kids weren’t showing up – the program planners and deliverers were focused on things like safety and nutrition, but hadn’t thought about making it fun!)
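
Steps 1–3 can also be sketched as a tiny bit of analysis code. This is my own illustration (the stakeholder names and stakes are hypothetical, loosely inspired by Noga’s after-school program story): invert the stakeholder→stakes mapping to surface shared stakes and single-stakeholder outliers.

```python
# Hypothetical sketch of the ladder-diagram analysis: map each
# stakeholder to their stated stakes, then invert the mapping to
# find shared stakes and single-stakeholder outliers.
from collections import defaultdict

# Illustrative data only (not from the actual program)
stakes_by_stakeholder = {
    "planners": {"safety", "nutrition"},
    "staff": {"safety", "nutrition", "homework help"},
    "kids": {"fun", "homework help"},
}

stakeholders_by_stake = defaultdict(set)
for stakeholder, stakes in stakes_by_stakeholder.items():
    for stake in stakes:
        stakeholders_by_stake[stake].add(stakeholder)

shared = {s for s, who in stakeholders_by_stake.items() if len(who) > 1}
outliers = {s for s, who in stakeholders_by_stake.items() if len(who) == 1}

print(sorted(shared))    # stakes held by multiple groups
print(sorted(outliers))  # → ['fun'] – only the kids listed it
```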

Program Models (e.g., logic models)

  • A model is a representation of the program:
    • at a certain time
    • from a certain perspective
  • Can look at the model over time, reviewing what has changed or not changed (and what insights does that give us about the program?)

Causal Loop Diagrams

  • A diagram that shows connections and interrelationships among elements
  • Difficult to make and to use (would probably want a systems dynamics expert to assist with this if you were to make/use one)
  • Here’s an example of one (from Wikipedia):
    Causal Loop Diagram of a Model
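
As a side note of my own (not from the conference): a causal loop diagram can be represented as a set of signed links, and a loop’s polarity is just the product of its link signs – positive means a reinforcing loop, negative means a balancing loop. A minimal sketch, using the births/deaths population model from the Wikipedia example:

```python
# Represent a causal loop diagram as signed links:
# +1 = the variables change in the same direction, -1 = opposite.
links = {
    ("births", "population"): +1,
    ("population", "births"): +1,
    ("population", "deaths"): +1,
    ("deaths", "population"): -1,
}

def loop_polarity(cycle):
    """Polarity of a closed loop, given as an ordered list of nodes."""
    sign = 1
    for a, b in zip(cycle, cycle[1:] + cycle[:1]):
        sign *= links[(a, b)]
    return "reinforcing" if sign > 0 else "balancing"

print(loop_polarity(["births", "population"]))   # → reinforcing
print(loop_polarity(["population", "deaths"]))   # → balancing
```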

“Low Tech Social Networking”

  • an ice breaker activity that you can use to see the mental models people are working with and to start to see connections in the network
  • ask participants to do the following on a sheet of paper:


Exploring Boundaries Activities

  • a bunch of toy animals were provided and every participant was told to pick any 4 they want
    [Photo: little toys used for an activity in the pre-conference workshop I went to]
  • in table groups, participants were asked to find how many different ways the animals could be grouped
    • e.g., some of the groups we came up with were grouping by biological taxonomy (e.g., amphibians, reptiles, mammals, birds), by land-based/water-based animals, by number of legs, by colour, in order of size
  • this allows you to illustrate how boundaries are constructed by our choices (within certain constraints) – how and why people chose the boundaries they do are interesting questions to think about

Postcard Activity

  • Participants are all asked to pick 3 postcards from a pile
  • Groups asked to make up a story using their cards. Each group tells their story to the whole crowd.
  • Debrief with the groups:
    • You are constrained by the cards each person brought in (and a perspective was brought by each person choosing the cards)
    • You find links
    • You make choices
      • Did you fit the cards to a narrative you wanted?
      • Did the story emerge from the cards?
      • There is no one right way to do it
      • A different group could come up with a totally different story from the same cards (different perspectives)
    • When you are evaluating, the program is the story. You want to understand how the story came to be. What was the process? What perspectives are reflected?
  • Bonus idea: You could use this postcard activity as a data collection tool – ask people to write the anticipated story before you start, then again midway through, then at the end. Which of the things you expected held? Which didn’t? Why did things change? What was surprising?


Sticky Note Brainstorming

  • ask a question (e.g., what are we going to do about juvenile delinquency in this town?)
  • everyone writes as many ideas as they can think of on sticky notes (one idea per sticky note) and covers the wall with them
  • group then themes the ideas together
  • then ask the group “What are you going to do with these ideas?”

Game to Demo the Concept of Self-Organization

  • each person is assigned a “reference person” and they are told they are not allowed to be within 3 ft of that person
  • everyone is told to go mingle in the room
  • some people are assigned the same “reference person” – they will end up clumping together as they all try to avoid that person – this is an example of an emerging, self-organized pattern (a bunch of individual agents acting on their own reasons end up forming patterns)
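
Out of curiosity, I sketched this game as a tiny simulation (my own illustration – the agent count, room size, and movement rule are assumptions): each agent steps directly away from its reference person whenever it is within 3 units of them, and this repulsion rule alone settles everyone into a stable arrangement with no central coordination.

```python
# Minimal sketch of the self-organization game: agents repeatedly
# step away from an assigned "reference person" until no one is
# within the minimum distance anymore (an emergent, settled pattern).
import random

random.seed(42)
N, MIN_DIST = 20, 3.0

# everyone starts mingling at random spots in a 10x10 "room"
pos = [[random.uniform(0, 10), random.uniform(0, 10)] for _ in range(N)]
# each agent is assigned a reference person other than themselves
ref = [random.choice([j for j in range(N) if j != i]) for i in range(N)]

def dist(a, b):
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

stable = False
for _ in range(2000):  # cap the number of mingling rounds
    moved = False
    for i in range(N):
        d = dist(pos[i], pos[ref[i]])
        if d < MIN_DIST:
            # step one unit directly away from the reference person
            scale = 1.0 / (d or 1.0)
            pos[i][0] += (pos[i][0] - pos[ref[i]][0]) * scale
            pos[i][1] += (pos[i][1] - pos[ref[i]][1]) * scale
            moved = True
    if not moved:  # no one needed to move: the pattern has settled
        stable = True
        break

print(stable)  # whether an emergent, self-organized pattern settled
```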

Creating Personas

  • a tool commonly used in marketing where you craft a character to represent market segments (or, in the case of evaluation, stakeholders)
  • can use this to help with your stakeholder mapping and evaluation planning
  • e.g., create a persona of Max the Manager, Freda the front-line staff, Clarence the Client, etc. – what are their needs/wants/constraints/etc.? how can these characters help inform your planning?
  • avoid stereotyping (base on real data/experience as much as possible) and avoid creating “elastic” personas (i.e., contorting the character to incorporate everything you’d want in someone in that role)
  • design for the majority, but don’t forget the outliers

Participant Journey Mapping

  • a visual representation of an individual’s perspectives on their interactions and relationships with an organization/service/product
  • can use this to map out the “activities” for a logic model
  • focus on the user’s experience (what the user experiences can be quite different from what the program designer/administrator thinks the experience is)
  • think about:
    • emotional side – high points/low points; happiness/frustration; pain points
    • time/phases – before/during/after the experience
    • touch points and formats – e.g., online/offline; phone/F2F; real person/robot
  • it’s about understanding the experience
  • useful way to identify areas for improvement
  • can be used during design or during implementation
  • can be a communication tool
  • can be an evaluation tool – e.g., map a user’s journey to see what was good/bad from their perspective and identify places to review more deeply/improve

Compendium Software

Things to Read:

  • John Owen’s Book: Program Evaluation: Forms and Approaches is considered a seminal evaluation text in Australia. I haven’t read it yet, so I should really check it out!
  • Puente & Bender (2015) – Mindful Evaluation: Cultivating our ability to be reflexive and self-aware. Journal of MultiDisciplinary Evaluation. Full-text available online.
  • Moneyball for Government – a book, written by a bipartisan group, that “encourages government to use data, evidence and evaluation to drive policy and funding decisions”
  • Mayne (2010). Building an evaluative culture: the key to effective evaluation and results management. The Canadian Journal of Program Evaluation. 24(2):1–30. Full-text available online.
  • David Snowden’s Cynefin. http://cognitive-edge.com/ [I’ve read this before, but think I should re-read it]

Sessions I Presented:

  • Snow, M.E., Snow, N.L. (2016). Interactive logic models: Using design and technology to explore the effects of dynamic situations on program logic (presentation).
  • Snow, M.E., Cheng, J., Somlai-Maharjan, M. (2016). Navigating diverse and changing landscapes: Planning an evaluation of a clinical transformation and health information system implementation (poster).

Sessions I Attended:

  • Workshop: Connecting Systems Thinking to Evaluation Practice by Jan Noga. Sept 18, 2016.
  • Opening Keynote: Victoria Hovane – “Learning to make room”: Evaluation in Aboriginal communities
  • Concurrent session: Where do international ‘evaluation quality standards’ fit in the Australasian evaluation landscape? by Emma Williams
  • Concurrent session: Evaluation is dead. Long live evaluative thinking! by Jess Dart, Lyn Alderman, Duncan Rintoul
  • Concurrent session: Continuous Quality Improvement (CQI): Moving beyond point-in-time evaluation by Catherine Wade
  • Concurrent session: A Multi-student Evaluation Internship: three perspectives by Luke Regan, Ali Radomiljac, Ben Shipp, Rick Cummings
  • Concurrent session: Relationship advice for trial teams integrating qualitative inquiry alongside randomised controlled trials of complex interventions by Clancy Read
  • Day 2 Keynote: The landscape of evaluation theory: Exploring the contributions of Australasian evaluators by John Owen
  • Concurrent session: Evolution of the evaluation approach to the Local Prevention Program by Matt Healey, Manuel Peeters
  • Concurrent session: The landscape of using rubrics as an evaluation-specific methodology in program evaluation by Krystin Martens
  • Concurrent session: Beyond bias: Using new insights to improve evaluation practice by Julia McKenzie
  • Concurrent session: Program Logics: Using them effectively to create a roadmap in complex policy and program landscapes by Karen Edwards, Karen Gardner, Gawaine Powell Davies, Caitlin Francis, Rebecca Jessop, Julia Schulz, Mark Harris
  • AES Fellows’ Forum: Ethical Dilemmas in Evaluation Practice
  • Day 2 Closing Plenary: Balance, color, unity and other perspectives: a journey into the changing landscape in evaluation, Ziad Moussa
  • Day 3 Opening Keynote: The Organisational and Political Landscape for Evidence-informed Decision Making in Government by Kathryn Newcomer
  • Concurrent session: Effective Proactive Evaluation: How Can the Evidence Base Influence the Design of Complex Interventions? by John Owen, Ann Larson,Rick Cummings
  • Concurrent session: Applying design thinking to evaluation planning by Matt Healey, Dan Healy, Robyn Bowden
  • Concurrent session: A practical approach to program evaluation planning in the complex and changing landscape of government by Jenny Crisp
  • Concurrent session: Evaluating complexity and managing complex evaluations by Kate Gilbert, Vanessa Hood, Stefan Kaufman, Jessica Kenway
  • Closing Keynote: The role of evaluative thinking in design by John Gargani

Image credits:

  • Causal Loop Diagram is from Wikipedia.
  • The other images are ones I created, adapting from slides I saw at the conference, or photos that I took.

Footnotes

1. E.g., while Owen used “adoption” to refer to the “adoption” (or choosing) of a program to implement, I’ve seen others use “adoption” to refer to individuals (e.g., to what extent individuals “adopt” (or enact) the part of the program they are intended to enact).