As I’ve mentioned, I’m currently developing an evaluation for the implementation of an electronic health record (EHR) system. Specifically, this project crosses multiple healthcare organizations, will be implemented in dozens of healthcare facilities run by those organizations, and will be used by tens of thousands of clinicians from all disciplines as they provide care for hundreds of thousands (millions?) of patients. So, you know, easy peasy, right?
Throughout my career as a program evaluator in healthcare, I’ve seen first-hand that healthcare is a complex sector. There are a multitude of interacting actors – clinicians of all sorts of different disciplines (all with their own scopes of practice, standards, regulations, cultures, and expertise), administrators who plan services/allocate resources/run facilities/etc., government mandates to be dealt with, emerging research evidence and health technologies to translate into practice, patients with every sort of health issue imaginable (not to mention the complex social, psychological, economic, and various other facets of their lives), critical privacy and security issues to be managed – and nothing less is at stake than the health and well-being of all of our patients/clients/residents. But this project is, by far, the most complex project I’ve ever seen. Not only is it working to create and deploy a new electronic health information technology (technology is complex) for all of the healthcare professions (another layer of complexity), but it’s bringing together three separate health organizations, each with their own current processes, varying technologies, and different cultures. It’s a multi-year, multi-site project, so from an evaluation perspective, there are considerations around when different sites “go live” with the new system, what systems they had in place prior to “go live”, and what else is going on in the environment during this lengthy implementation period.
So how, exactly, does one go about evaluating such an initiative? Some of the approaches that immediately come to mind for evaluating whether this new program “works” have obvious flaws:
- Randomized controlled trial. Some people will suggest that if you want to know if something works, you need to do an RCT. Now, this is a great way to determine if, say, a new drug works – randomize people to get either the drug or a placebo (or usual care, if a current treatment exists) and see if the drug works better. But a health information technology is not like a drug – it’s far more complex than that. In addition to the complexity, there are pragmatic reasons why an RCT wouldn’t work well – the hospitals/facilities in which the technology will be implemented are not similar enough to truly serve as controls for one another. We have big hospitals, little hospitals, rural/remote facilities, and specialty hospitals. As well, there’s no way to blind a site to whether or not it has the new electronic health record!
- Compare our implementation to a comparator site. There really aren’t any truly comparable sites. There are many, many ways to implement an electronic health record, and finding another project comparable to ours (e.g., standardization across multiple health organizations in dozens of facilities of varying size/acuity/geography; implementing the specific applications we are implementing (and not the ones we are not); implementing “big bang” (all applications introduced at the same time in a given site) vs. incrementally (one application at a time); with patient populations similar to ours and staff populations similar to ours; in a Canadian setting; at a similar point in history) would be impossible.
- Pre-post. Using a given site’s own data as its baseline – this is getting closer to something that we can do. Of course, with this type of evaluation, we will need to collect data on the context of each go live, including what systems are in place at baseline and what other changes are occurring at the same time as our implementation that may affect the outcomes of interest. So it will be difficult, if not impossible, to ascribe attribution (i.e., to say that our project definitely caused outcome X to occur). A sketch of what this kind of analysis might look like follows below.
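To make the pre-post option a little more concrete, here is a minimal sketch (in Python, with entirely made-up numbers) of the kind of segmented-regression / interrupted time series model one might fit to a single site’s own monthly data, with terms for the underlying trend plus a level change and slope change at go-live. The outcome, the go-live month, and the simulated effect sizes are all hypothetical.

```python
# Illustrative sketch only: a segmented-regression ("interrupted time series")
# view of a pre-post evaluation for a single site. All numbers are made up.
import numpy as np

# Monthly values of a hypothetical outcome (e.g., some documentation error rate)
# for 12 months before and 12 months after a made-up go-live date.
months = np.arange(24)
go_live = 12                                        # hypothetical go-live month
post = (months >= go_live).astype(float)            # 0 before go-live, 1 after
time_since = np.where(post == 1, months - go_live, 0.0)

rng = np.random.default_rng(42)
secular_trend = 10 - 0.1 * months                   # change that would happen anyway
pretend_effect = -1.5 * post - 0.2 * time_since     # simulated level + slope change at go-live
y = secular_trend + pretend_effect + rng.normal(0, 0.5, size=24)

# Design matrix: intercept, underlying time trend, level change at go-live,
# and change in slope after go-live.
X = np.column_stack([np.ones(24), months, post, time_since])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(dict(zip(["intercept", "trend", "level_change", "slope_change"], coef.round(2))))
```

Even with a model like this, though, the attribution problem doesn’t go away: the “level change” and “slope change” terms will absorb anything else that changed around go-live, which is exactly why the contextual data matters so much.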
In the interest of trying to figure out how best to plan the evaluation, I have taken a look at the literature. After all, we are not the first place to implement an electronic health record, so why not learn from those who have gone before us? As I’ve been reading, I’ve been reassured by all of the authors who are saying things similar to what I’ve been thinking: this is an extremely challenging type of project to evaluate! Below are some of the key points from the articles I’ve read so far.
Building A House on Shifting Sands
- article by the evaluation team tasked with evaluating the UK’s National Health Service’s project to implement a national electronic health record
- “most EHR evaluations draw upon a broadly positivist ontology and pursue causality in term[s] of an intervention’s impact and by making objective judgements concerning the outcomes and hence degree of success or failure of such initiatives” (p. 106)
- designs like RCT and pre-post “if pursued in isolation runs the risk of over-simplifying the dynamic complexity of large-scale technology-led projects […] The issues of context found in large-scale projects cannot be ‘controlled’ by traditional research design alone, but have to be embraced and actively incorporated into evaluation” (p. 106)
- “we see national EHR endeavours not as programs composed of essentially discrete ICT [Information & Communication Technology] ‘projects’, dissociated from policy, technology, service delivery and clinical work. Rather we see the need to incorporate these elements in evaluation as inextricable parts of EHR programs, including the constantly changing parallel policies and strategies, complex and evolving software ecologies and diverse health care working practices, all of which interact across their porous boundaries.” (p. 106)
- two principles they emphasize:
- “the malleable character of any EHR program as it is shaped by contextual forces and is reinterpreted by various interest groups and people” (p. 106)
- “the need for evaluators to draw upon alternative perspectives and understandings of technology and the possible role of information and data in changing work practices and organizational structures and hence potential to affect specific outcomes” (p. 106)
- “these two fundamental ideas are both ontological (i.e., concerned with the assumptions made as to the nature of the reality we study) and epistemological (i.e., concerned with how we obtain valid information about that reality)” (p. 106)
- they originally planned a stepped wedge design (a sketch of what a stepped wedge rollout schedule looks like appears at the end of these notes)
- some challenges:
- only had 30 months to study (not long enough to capture longer-term outcomes)
- they were implementing different software systems at different sites (e.g., different mixes of functionalities), so it was difficult to compare sites
- designed the evaluation based on the belief that there was clear “before”, “during”, and “after” implementation periods, which, though different at different sites, would be “broadly comparable in other respects (e.g., approach to training, changeover strategy, resources available, objectives set)” (p. 108)
- they discovered that some of their assumptions were questionable
- found that “implementation was highly context-bound” and so “direct comparisons or summations across the various Trusts’ experiences and their implementation stages would risk losing much of the valuable local detail if standard measures for comparison were abstracted away from the rich and complex causal environments found in each site” (p. 5)
- there wasn’t a clear distinction between absence of the system (before) and implementation of the system (during and after) – there was lots of work that had to be done locally/interim software solutions/etc. that made the boundaries fuzzy.
- because of geographical and institutional distribution, the evaluators couldn’t always be where the action was, so some things would have been missed
- difficult to account for the Politics (capital P) and the politics (small p) involved
- “principal insights gained into evaluation of large-scale EHR programs” (p. 111):
- EHR programs are inherently political
- EHR programs exist in a dynamic environment
- “sociotechnical intervention that is given meaning through the activities of its implementation and adoption” (p. 111)
- need evaluation by “studying changing as it occurs, rather than just by measuring achieved change (desired outcomes)” (p. 6)
- pre-post design alone “may miss vital information about the change process, and assume that there is a clear definition of “after”” (p. 6)
- study “what people do, and in particular, how, and to what extent, they ‘work to make it work'” (p. 6)
- evaluations “focused on changing narrate the system change and tell the story of EHR implementation and adoption through multiple voices” (quite different from just evaluating the outcome)
- case studies can “probe deeper and address EHR systems within dynamic or distinct socio-cultural environments”
- “multiple co-ordinated case studies allow an insightful cross-site dialogue that can reveal common themes and distinct experiences” (p. 6)
- “the evaluators’ role is to be part insider and in part an outsider; understanding but also questioning” (p. 6)
- evaluators needed to be adaptive to the shifting sands
- “our approach […] became more and more one that focused on the activity ‘in between’; the period during which things (and people, and teams) were changing, rather than some end state of achieved and stabilised change.” (p. 111)
- focus became “understanding and narrating the stories of a network of NHS CRS in-the-making” … “we saw that greater insights could be gained from approaches that sought to ‘tell the whole story’ not just the ending.” (p. 111)
- they found that “the direct functionalities that [conventional driving forces for EHR: error, safety and quality of care] depended on were mostly unimplemented in the sites we studied within our timeframe” (p. 112)
- “our sociotechnical lens was in particular focused on the specific question of how things were ‘made-to-work’ rather than on how well or not the EHR systems functioned.” (p. 113)
- “from this perspective, non- or partial adoption, mis-use, non-use and workarounds are not simply negative effects, pathologies or signs of failure, but are different enactments of the ‘technology-in-use’. Over a period of time, they may chart the necessary path to a successful national EHR service” (p. 113)
- “there is no single, or standard way of best implementing national and large-scale EHR systems, and so too there is no predefined and prescriptive strategy to evaluate them” (p. 113)
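As an aside on the stepped wedge design this team originally planned: here is a minimal sketch (Python, purely illustrative; the site names and number of periods are hypothetical) of what a stepped wedge rollout schedule looks like. Every site eventually crosses over to the intervention, but at staggered times, so sites still in their “control” phase act as contemporaneous comparators for sites that have already gone live.

```python
# Illustrative only: what a stepped wedge rollout schedule looks like.
# Site names and the number of measurement periods are hypothetical.
sites = ["Site A", "Site B", "Site C", "Site D"]
n_periods = 5  # e.g., five measurement periods (quarters, say)

# One "step" of sites crosses over to the intervention at each period;
# by the final period, every site has the intervention.
schedule = {
    site: ["control" if period < step else "intervention"
           for period in range(n_periods)]
    for step, site in enumerate(sites, start=1)
}

for site, periods in schedule.items():
    print(f"{site}: {periods}")
# Site A: ['control', 'intervention', 'intervention', 'intervention', 'intervention']
# Site B: ['control', 'control', 'intervention', 'intervention', 'intervention']
# ...and so on, until Site D crosses over in the final period.
```

You can see why the design appealed to them, and also why it broke down here: per the points above, the sites were not implementing comparable systems, and there was no clean boundary between the “before” and “after” states that the schedule assumes.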
Evaluating eHealth Interventions: The Need for Continuous Systematic Evaluation
- this essay “argues for continuous systematic multifaceted evaluations – throughout the life cycle of eHealth interventions – on the grounds that such an evaluative approach is likely to provide timely and relevant insights that can help to assess the short-, medium-, and long-term safety, effectiveness and cost-effectiveness of eHealth interventions” (p. 1)
- many different phrases are used to describe the sharing of health data using technology – e.g., “ICT” (information and communications technology), “health IT”, “EHR”, or “EMR” (also health portals and telemedicine interventions)
- the phrase “eHealth should encompass the full spectrum of ICTs, whilst appreciating the context of use and the value they can bring to society” (p. 2)
- Pagliari defined eHealth as “an emerging field of medical informatics, referring to the organization and delivery of health services and information using the Internet and related technologies. In a broader sense, the term characterises not only a technical development, but also a new way of working, an attitude, and a commitment for networked, global thinking, to improve healthcare locally, regionally and worldwide by using information and communication technology.” (quoted on p. 2)
- anticipated benefits of eHealth:
- reduce costs/improve efficiency
- reduce medical errors
- however, these haven’t been “empirically demonstrated”
- we also have to consider the risk of potential harms that could be caused by eHealth (e.g., poorly designed or hard-to-use systems could cause errors; security/privacy risks; money put into eHealth systems may be diverted from other needs in the system)
- need to map out a “chain of reasoning” that leads from the problem/need to the solution [this is essentially what we are doing with our logic model on my project; a sketch of this structure follows these notes]
- “studies adopting an experimental design approach fail to take sufficient account of the contextual considerations, which play a major role in the success or failure of the intervention being studied” (p. 3)
- the model proposed in this paper focuses a lot on the development of the solution/application and how to evaluate during that phase. It does mention formative and summative evaluation during the “implement and deploy” phase
- my scope is to evaluate the implementation of the system once it is built (we have processes in place to do things like build/test/iterate/end-user test within the project to create our system, but that’s not within my scope)
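Since I mentioned our logic model above, here is a minimal sketch (Python, with hypothetical placeholder entries, not my project’s actual model) of the “chain of reasoning” structure: problem, inputs, activities, outputs, and short- and long-term outcomes, where each link in the chain is an assumption the evaluation can then probe.

```python
# Illustrative only: a logic model represented as a simple data structure.
# All entries are hypothetical placeholders, not the actual project model.
from dataclasses import dataclass, field


@dataclass
class LogicModel:
    problem: str
    inputs: list[str] = field(default_factory=list)
    activities: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)
    short_term_outcomes: list[str] = field(default_factory=list)
    long_term_outcomes: list[str] = field(default_factory=list)


ehr_logic_model = LogicModel(
    problem="Fragmented paper/legacy records across organizations",
    inputs=["project funding", "clinical informatics staff", "vendor software"],
    activities=["configure the system", "train clinicians", "provide go-live support"],
    outputs=["# of clinicians trained", "# of sites live on the system"],
    short_term_outcomes=["clinicians can access a shared record"],
    long_term_outcomes=["safer, better-coordinated care"],
)

# Walking the chain makes the assumed causal links explicit, which is what the
# evaluation then probes: do the links actually hold in each local context?
for step in ("inputs", "activities", "outputs", "short_term_outcomes", "long_term_outcomes"):
    print(step, "->", getattr(ehr_logic_model, step))
```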
Evaluating eHealth: How to Make Evaluation More Methodologically Robust
- 4 tricky issues:
- which research methods are suitable to evaluate highly complex interventions with diffuse effects?
- is it necessary to make observations at both the patient and system level?
- formative or summative evaluations?
- internal or external evaluators?
- this paper suggests:
- “methodological pluralism” – both quantitative and qualitative methods
- quant – info on how the IT system is performing, helps build theories needed to understand how interventions work (not just if they work in this one instance), which allows for generalizability
- qual – can provide information on why things work/don’t work; can “contribute to parameter estimation, particularly under a Bayesian framework” (p. 2)
- “primary unit of analysis in evaluation of IT systems is likely to be at the organisational/workgroup level (e.g., wards, hospitals, practices)” (p. 2)
- IT can affect many levels in the organization, can have many effects (both good and bad) – you need to study all the levels!
- you also need to collect data along all the points along the “causal chain” so you know not just what happened, but why it happened
- this can help you to “generate theories about possible explanation and remedies” (p. 2)
- ultimately, we are interested in positively affecting patients
- service level improvements “may be necessary, but not necessarily sufficient, conditions for a positive impact at the patient level” (p. 3)
- baseline observations are needed to put things into context
- you need multiple measurements to model cost-effectiveness/cost-benefit (a toy illustration of this follows these notes)
- it may not be possible to show effects on morbidity/mortality (these outcomes aren’t specific enough)
- important to collect error rates and clinical process data when possible
- one challenge with IT is that “intervention and measuring system are not necessarily independent” (p. 3)
- formative evaluations can be fed back into implementation – that will affect the summative results – important to keep this in mind if you are generalizing (e.g., if future implementations don’t include formative evaluation, they might not get as good of results as your project that did include formative evaluation)
- so it’s important to document when formative results are fed back into implementation and what effect they have
- they suggest that external evaluators can “add value” to work of internal evaluators (e.g., they “can provide expertise in the measurement of endpoints” (p. 3), they may have more “credibility” because they are independent of the implementation)
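On the cost-effectiveness point a few bullets up, here is a toy illustration (mine, not the paper’s) of an incremental cost-effectiveness ratio calculation with made-up numbers. Each of the four inputs is itself something you can only estimate from repeated measurements over the implementation period, which is why a single measurement point isn’t enough.

```python
# Illustrative only: an incremental cost-effectiveness ratio (ICER), with
# entirely made-up numbers.

def icer(cost_new: float, cost_old: float, effect_new: float, effect_old: float) -> float:
    """Incremental cost per unit of incremental effect (e.g., per QALY gained)."""
    return (cost_new - cost_old) / (effect_new - effect_old)

# Hypothetical: EHR scenario vs. status quo, with effects in quality-adjusted life years.
print(icer(cost_new=1_200_000, cost_old=900_000, effect_new=520.0, effect_old=500.0))
# -> 15000.0, i.e., $15,000 of additional cost per additional QALY in this made-up example
```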
Evaluating eHealth: Undertaking Robust International Cross-Cultural eHealth Research
- eHealth applications are generally local or regional (with a few national projects) – we are missing out on the potential to share learnings
- challenges to collaborating on evaluations cross-culturally – lack of standardization, experiences not being shared, languages, literacy (especially in developing countries), cultural/societal differences, differences in clinical systems/workflows and how health systems are organized
- suggestions to facilitate international eHealth evaluation:
- promote the importance of evaluation of eHealth
- standards/coherence in describing the intervention (there are so many ways that eHealth can be done, and reports often don’t describe exactly what was implemented)
- agreement on common outcome measures
- improve reporting, indexing, and systematic review of eHealth literature
Why Do Evaluations of eHealth Programs Fail? An Alternative Set of Guiding Principles
- this paper responds to the previous three papers, which approach evaluation from a “positivist” set of assumptions:
- “there is an external reality that can be objectively measured;
- phenomena such as “project goals”, “outcomes”, and “formative feedback” can be precisely and unambiguously defined;
- that facts and values are clearly distinguishable;
- that generalizable statements about the relationship between input and output variables are possible” (p. 1)
- other approaches based on different philosophical assumptions:
- “”interpretivist” approaches assume a socially constructed reality (i.e., people perceive issues in different ways and assign different values and significance to facts) – hence reality is never objectively or unproblematically knowable – and that the identity and values of the researcher are inevitably implicated in the research process
- “critical” approaches assume that critical questioning can generate insights about power relationships and interests and that one purpose of evaluation is to ask such questions on behalf of less powerful and potentially vulnerable groups (such as patients)” (p. 1)
- these alternative philosophical approaches “reject the assumption that a rigorous evaluation can be exclusively scientific” (p. 1)
- in addition to the “scientific agenda of factors, variables, and causal relationships, the evaluation must also embrace the emotions, values, and conflicts associated with the program. eHealth “interventions” may lie in the technical and scientific world, but eHealth dreams, visions, policies, and programs have personal, social, political, and ideological components, and therefore typically prove fuzzy, slippery, and unstable when we seek to define and control them” (p. 1)
- problems with using an exclusively scientific approach to evaluating eHealth programs:
- there are multiple goals
- not everyone agrees on the goals
- thus it is difficult to measure the project’s “success” in achieving its goals
- outcomes aren’t stable – they change over time, differ across contexts
- so many intervening variables between process → outcome that it is impossible to attribute causation to the process (e.g., in one of the authors’ projects, they identified 56 intervening variables!!)
- “key characteristics of program success may not be articulated in the vocabulary of outcomes and may not yield to measurement”
- if a program adapts to what it learns as it goes and that takes it away from the original objectives, it will be called a “failure” on those original objectives (need to be able to step back and ask if that’s really a failure or if the project succeeded in other ways instead)
- reducing things to “abstracted variables” (e.g., IT response time, morbidity, mortality) “may remove essential contextual features that are key to explaining the phenomenon under study. Controlled, feature-at-a-time comparisons are vulnerable to repeated decomposition: there are features within features, contingencies within contingencies, and tasks within tasks” (p. 2)
- “When we enter the world of variables, we leave behind the ingredients that are needed to produce a story with the kind of substance and verisimilitude that can give a convincing basis for practical action” “Substance” (conveying something that feels real) and “verisimilitude” (something that rings true) are linked to the narrative process, which Karl Weick called “sensemaking”, which is essential in a multifaceted program whose goals are contested and whose baseline is continually shifting.” (p. 2)
- “Collection and analysis of qualitative and quantitative data help illuminate these complexities rather than produce a single “truth”” (p. 2)
- narrative “allows tensions and ambiguities to be included as key findings, which may be preferable to expressing the “main” findings as statistical relationships between variables and mentioning inconsistencies as a footnote or not at all” (p. 2)
- in contrast to Lilford et al.’s 4 “tricky” issues (mentioned above), these authors argue that “the tricky questions are more philosophical and political than methodological and procedural” (p. 3)
- they offer an “alternative and […] provisional set of principles” which are intentionally abstract/general so they can be applied to a variety of contexts/settings
- role of the evaluator: strike a balance between critical distance and immersion/engagement (“Ask questions such as What am I investigating—and on whose behalf? How do I balance my obligations to the various institutions and individuals involved? Who owns the data I collect?”) “ The dispassionate scientist pursuing universal truths may add less value to such a situation than the engaged scholar interpreting practice in context”
- governance process: broad-based advisory, independent chair, “formally recognises that there are multiple stakeholders and that power is unevenly distributed between them”
- “provide the interpersonal and analytic space for effective dialogue” (“Conversation and debate is not simply a means to an end, it can be an end in itself. Learning happens more through the processes of evaluation than from the final product of an evaluation report”)
- emergent approach: don’t set the evaluation plan and then follow it religiously regardless of what happens – the plan needs to grow/adapt in response to findings and practical issues (“build theory from emerging data”)
- “consider the dynamic macro-level context (economic, political, demographic, technological)”
- “consider different meso-level contexts (e.g., organisations, professional groups, networks), how action plays out in these settings (e.g., in terms of culture, strategic decisions, expectations of staff, incentives, rewards) and how this changes over time. Include reflections on the research process (e.g., gaining access) in this dataset”
- “consider the individuals […]through whom the eHealth innovation(s) will be adopted, deployed, and used”
- “consider the eHealth technologies, the expectations and constraints inscribed in them (e.g., access controls, decision models) and how they “work” or not in particular conditions of use. Expose conflicts and ambiguities (e.g., between professional codes of practice and the behaviours expected by technologies)”
- “narrative as an analytic tool and to synthesise findings. Analyse a sample of small-scale incidents in detail to unpack the complex ways in which macro- and meso-level influences impact on technology use at the front line. When writing up the case study, the story form will allow you to engage with the messiness and unpredictability of the program; make sense of complex interlocking events; treat conflicting findings (e.g., between the accounts of top management and staff) as higher-order data; and open up space for further interpretation and deliberation.”
- critical events in the evaluation itself: “Document systematically stakeholders’ efforts to re-draw the boundaries of the evaluation, influence the methods, contest the findings, amend the language, modify the conclusions, and delay or suppress publication.”
There’s a tonne more literature on this topic (and I haven’t really even gotten to bringing together my own thoughts on all of this – so far it’s just notes from the papers themselves), but this blog posting is already crazy long, so I think I’ll end this one here. But you can expect more on this topic soon!
References:
Bates DW, Wright A. (2009). Evaluating eHealth: Undertaking Robust International Cross-Cultural eHealth Research. PLoS Med 6(9): e1000105 (full text)
Catwell L, Sheikh A. (2009). Evaluating eHealth Interventions: The Need for Continuous Systematic Evaluation. PLoS Med 6(8): e1000126 (full text)
Greenhalgh T, Russell J. (2010). Why Do Evaluations of eHealth Programs Fail? An Alternative Set of Guiding Principles. PLoS Med 7(11): e1000360 (full text)
Lilford RJ, Foster J, Pringle M. (2009). Evaluating eHealth: How to Make Evaluation More Methodologically Robust. PLoS Med 6(11): e1000186 (full text)
Takian A, Petrakaki D, Cornford T, Sheikh A, Barber N. (2012). Building a house on shifting sand: methodological considerations when evaluating the implementation and adoption of national electronic health record systems. BMC Health Serv Res 12:105 (full text)
Image Credits:
“It’s not that simple” sign – Flickr with a Creative Commons license by futureatlas.com.