A Comparison of Frameworks Evaluating Evidence for Global Health Interventions

Published in the journal: PLoS Med 10(7): e1001469. doi:10.1371/journal.pmed.1001469
Category: Policy Forum
doi: https://doi.org/10.1371/journal.pmed.1001469

Summary Points

Evidence-based decision-making is critical to informing policy in global health interventions and programs.
Existing frameworks for evaluating evidence that were developed or recommended for community or public health decision-making vary in their criteria and application.
We compared how different community or public health evidence frameworks assessed the same body of evidence for three advocated global health interventions and found that there can be substantial differences in the rating of evidence, which could contribute to differences in policy recommendations.
All current frameworks emphasize effectiveness and fall short on other factors important to policy decision-making, such as costs, implementation issues, context, and sustainability.
As global health policymakers move towards evidence-based approaches, we find a gap between what is currently available and the needs for an evidence framework appropriate for application to a global health setting in a low- and middle-income country context. More work is needed to either adapt one or more existing frameworks, or to develop an entirely new framework to meet the needs of policymakers and others responsible for implementing global health interventions.

Introduction

A major movement in global health and development over the past 10 years has been the enthusiastic adoption of randomized controlled trials (RCTs), borrowed from the field of medicine, as the most rigorous method for evaluating a program's causal impact [1]–[4]. More recently, this movement has prompted a conceptual debate in global health and development about the proper role of RCTs in informing policy, with increasing efforts to “mind the gap” [5] between the evidence generated by RCTs (which focus on internal consistency) and the larger policy questions at the level of communities or populations (which require, among other things, generalizability) [4],[6]–[10]. The field of medicine that developed the RCT also developed the concept of “evidence-based” medicine, which aims to improve health policy decision making by encouraging policymakers to base their policies on the best available evidence. Large international policy-making bodies appear set on applying a similar concept to global health and health systems research [4],[11].

To be evidence-based, decisions about global health interventions must consider the available evidence in terms of its quantity, quality, and relevance. In evidence-based medicine, it is now both advocated and common practice to use a formal framework, rather than implicit judgment or other ad hoc methods, for considering the evidence as part of a systematic review; the advantages include increased transparency and better decision-making. Formal frameworks for evaluating evidence about community-level public health interventions have been proposed and advocated for similar reasons [12]–[17]. These frameworks differ in the degree to which they weight the importance of data from RCTs as compared to data from other study designs, the magnitude of potential benefits and harms, the role of context and implementation, and other factors.

At present, there are no commonly accepted guidelines within global public health for how to evaluate evidence, and there is scant evidence to guide policymakers when selecting a framework to use for assessing a body of evidence about a global health intervention. We sought to assess how summary conclusions about the evidence for interventions or programs currently in use or proposed for wide adoption could be influenced by the choice of framework. Consistent results across frameworks would increase policymakers' confidence in using and applying evidence frameworks, and may thereby help to narrow the gap between the questions asked by global health researchers and policymakers. Inconsistent results would call for a re-examination of current frameworks in terms of the domains they assess and the ways in which they are applied.

Identifying and Applying Evidence Frameworks to Support Policy Decision Making

We define a global health evidence framework as one that uses multiple domains to arrive at a summary judgment of the evidence for community or population health interventions or programs, and that could be applied to the kinds of interventions or programs commonly being considered in low- and middle-income countries. This includes frameworks explicitly developed for global health interventions, frameworks that were presented with a global health intervention as an illustrative example of their application, and general community or population health frameworks that could be applied to global health interventions. Details of our search methodology are summarized in Box S1; the search identified six frameworks [12]–[17]. Table 1 lists some key characteristics of each framework. Although our search methods were extensive, it is possible there are additional frameworks that we did not identify. However, the frameworks we did identify are a sufficient sample to explore whether potential policy recommendations derived from use of a framework could be sensitive to the choice of framework. All six frameworks indicated that their goal was “grading” (or “evaluating”) “evidence” on “interventions.”

**Table 1. Summary of existing public health frameworks considered.**

We next identified a diverse set of global health interventions as potential candidates with which to apply these existing frameworks by considering the major causes of morbidity and mortality in developing countries or the major diseases of focus among international global health financing bodies. We developed a draft set of key dimensions for classifying global health interventions in order to map out these potential exemplars to select a diverse set of interventions along these dimensions (e.g., population affected, whether the intervention addresses a communicable or non-communicable disease, etc.). We were advised on this project by a multidisciplinary panel of experts (listed in Acknowledgments) composed of global health experts in academia, donor agencies, policymakers, and practitioners who provided input on the dimensions and on their preferred exemplars. From this exercise, we selected three interventions as exemplars for assessing the frameworks that represented a diversity of interventions: household water chlorination, prevention of mother-to-child transmission of HIV (PMTCT), and lay or community health workers to reduce childhood morbidity and mortality. Table S1 demonstrates the diversity of these exemplars across our identified dimensions, and Box S2 presents the full list of potential exemplars from which these three were chosen.

For each of the three chosen global health exemplar interventions we located published systematic reviews of their effectiveness by conducting a Medline search. For each of these reviews, we retrieved the original research studies cited and used both the original studies and the systematic reviews as sources of evidence when applying the frameworks. As is customary and recommended in most evidence-based medicine processes, we used two members of the research team to independently apply the six frameworks to this evidence base for each of the three exemplar interventions. Disagreements were settled by a group consensus process. The results of the applications were compared both quantitatively (i.e., in how many cases was there congruence among frameworks) and qualitatively. Table S2 summarizes the evidence base for the three chosen global health exemplars, their primary outcomes of interest, and their associated systematic reviews and original research studies.

Different Evidence Frameworks May Support Different Policy Decisions

Table 2 summarizes our findings from the application of the six evidence frameworks to the three global health exemplars. We focus our attention on a comparison of the summary conclusions for each outcome/exemplar using the different frameworks. More details for how we assigned grades to a particular outcome are available in an Agency for Healthcare Research and Quality report [18].

**Table 2. Results on three exemplars applied to six evidence frameworks.**

For studies of household water chlorination, we chose the primary clinical outcome of (self-reported) diarrheal incidence, rather than measured water quality, because of its clinical importance. The evidence frameworks generally conclude that the evidence for diarrheal outcomes is weak or moderate. Only the U.S. Community Preventive Services Task Force (USCPSTF) framework assigns household water chlorination its highest grade (“strong”). All of the remaining frameworks assign grades below their highest possible rating, so the classifications range from “strong” under the USCPSTF framework to the next-to-lowest grade of “C – satisfactory” under the Australian NHMRC framework.

For PMTCT studies, all of the frameworks assign their highest possible grade to the body of evidence with the exception of the framework by Tang and colleagues, which assigns a “Grade 2B, Level 1 Possible.” However, this grade is the result of our strict interpretation of the rule that only interventions with a relative risk (RR) of greater than two qualify as “strong.” If there is some flexibility with this strict cutoff, the rating would change to the highest grade of “Grade 1 level 1 strong.”
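
The sensitivity of a rating to a hard cutoff can be illustrated with a minimal sketch. It assumes, purely for illustration, that the magnitude criterion reduces to comparing a single point estimate against RR = 2; the function name, the `tolerance` parameter, and the example value of 1.9 are hypothetical and are not taken from the Tang and colleagues framework or from Table 2.

```python
def meets_strong_magnitude(rr: float, cutoff: float = 2.0, tolerance: float = 0.0) -> bool:
    """Return True if a relative risk clears the magnitude cutoff.

    tolerance=0.0 models a strict reading (RR must exceed the cutoff);
    a positive tolerance models a reviewer who accepts near-misses.
    Both the cutoff handling and the tolerance are illustrative assumptions.
    """
    if rr <= 0:
        raise ValueError("relative risk must be positive")
    return rr > cutoff - tolerance


# Hypothetical pooled estimate that falls just under the cutoff.
example_rr = 1.9
strict_reading = meets_strong_magnitude(example_rr)
flexible_reading = meets_strong_magnitude(example_rr, tolerance=0.2)
print(strict_reading, flexible_reading)
```

Under these assumptions, the same estimate fails the strict reading and passes the flexible one, which is the kind of one-step change in grade described above.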

For interventions involving community or lay health workers, we chose the outcome “reduce morbidity in children under 5 years old compared to usual care” as it seemed both to be an outcome very important to communities and to have enough studies to make a meta-analysis meaningful. With this intervention the various frameworks again generally rate the evidence as being of low or moderate quality with the exception of USCPSTF, which assigns the highest grade of “strong.” HASTE, on the other hand, would rate this same body of evidence as grade three “insufficient,” and GRADE also assigns it a “low quality of evidence.”

Overall, Table 2 shows that for two of the three exemplars assessed, at least one framework resulted in an overall assessment that varied by at least two categories from one or more of the other frameworks when applied to the same evidence base (i.e., from “A” to “C,” or from “strong” to “insufficient,” etc.).
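
One way to make a comparison of this kind concrete is to count how many steps each assigned grade sits below the top of its own framework's scale and then take the spread across frameworks. The sketch below is only an illustration: the framework names, the grade scales, and the ratings are hypothetical placeholders rather than the contents of Table 2, and treating "steps below the top grade" as comparable across scales of different lengths is itself a simplifying assumption.

```python
from typing import Dict, List

# Ordered grade scales, best grade first. These are illustrative
# placeholders, not the official scales of any framework in Table 1.
SCALES: Dict[str, List[str]] = {
    "framework_a": ["strong", "sufficient", "insufficient"],
    "framework_b": ["high", "moderate", "low", "very low"],
    "framework_c": ["A", "B", "C", "D"],
}


def steps_below_top(framework: str, grade: str) -> int:
    """Number of categories between the assigned grade and the top grade."""
    return SCALES[framework].index(grade)


def category_spread(ratings: Dict[str, str]) -> int:
    """Largest difference, in categories, between any two frameworks."""
    steps = [steps_below_top(fw, grade) for fw, grade in ratings.items()]
    return max(steps) - min(steps)


# Hypothetical ratings of one body of evidence under three frameworks.
hypothetical = {"framework_a": "strong", "framework_b": "low", "framework_c": "B"}
spread = category_spread(hypothetical)
print(spread, spread >= 2)
```

A spread of two or more corresponds to the "at least two categories" threshold used in the text; other ways of aligning the scales would be equally defensible and could give different counts.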

Discussion

We find that assessing the same body of evidence using existing public health frameworks yields somewhat to markedly different conclusions depending on the framework applied. Thus, in practice, if the current push towards evidence-based global health policy making includes adoption of an evidence framework (one key method for ensuring an “evidence-based” approach), the choice of framework for evaluating the evidence could potentially lead to different policy decisions, a potentially unintended consequence of the choice of framework. For example, had policymakers used the USCPSTF framework, they would have reached the conclusion that all three interventions were equally strong and supported. Conversely, had policymakers used the GRADE or HASTE framework, they would have concluded that the three interventions varied from “insufficient” or “low quality” to “strong” and “high quality.” Had six different policymakers been considering the same evidence on household water chlorination to reduce diarrheal outcomes and each used a different framework, they could have reached differing conclusions about the strength of support that ranged from grade “C” to grade “B” to “possible” to “moderate quality” to “strong.” Actual policy decisions will include other factors, such as feasibility, financial resources, and health systems capacity, but the current push for “evidence-based” decision-making makes the adoption of an evidence framework likely, and, therefore, the rating of evidence would likely be one important factor in decision-making.

Why should these frameworks differ in their conclusions? One possible reason is that they differ in whether and to what degree they address the following domains: (1) how strict or explicit the rules are for classifying the strength of evidence; (2) the magnitude of potential benefits versus harms; (3) what role, if any, context plays in evaluating the evidence; (4) how much is reported about the details of implementation; (5) whether the ease of implementing the intervention or program is taken into consideration; (6) total costs for the program or intervention; and (7) sustainability of the program or intervention, both cost-wise and programmatically. The USCPSTF, Australian NHMRC, the UK National Health Service (NHS) Health Development Agency, and GRADE have stricter rules for classifying the strength of evidence than the HASTE framework and the framework from Tang and colleagues, which allow for more individual interpretation. The Tang and colleagues framework, GRADE, the USCPSTF, and Australian NHMRC all make explicit a consideration of the magnitude of the benefits, while HASTE and the NHS Health Development Agency do not. Only the Australian NHMRC framework explicitly considers context, and only the HASTE framework includes a detailed assessment of implementation data, although context could be considered part of “widely demonstrated” in the Tang and colleagues framework and could be considered in the “corroboration” criterion in the NHS Health Development Agency framework. The USCPSTF considers barriers to implementation in its evidence review but not as part of the overall assessment of the body of evidence. Costs and sustainability are not included routinely in any of the frameworks, although GRADE does have guidance on including cost as an outcome and on incorporating cost into the strength of the evidence [13],[19], and the USCPSTF searches for cost information on recommended interventions.

Although not all of these frameworks were necessarily designed to assess information on costs, context, or implementation, it is important to note the absence of these domains because experts consider them crucial to assessing evidence about global health interventions for policy decision-making. Their absence from the frameworks could be due to their original absence from the evidence base, that is, the published systematic reviews on the exemplars and the original articles included in those reviews, which also may not have had as their primary objective identifying evidence about implementation, cost, sustainability, etc. However, the absence of this kind of evidence from the reviews and the original articles included in them means that the evidence is also not generally available to policymakers who need to make decisions. This gap between the needs of health care policymakers and the research products of global health researchers is one that would likely need to be closed if global health policies are to be improved.

An additional cause of variability in the conclusions among frameworks assessing the same global health evidence may be variability in how the individual frameworks themselves are applied. When individual team members initially applied the frameworks to the evidence, they sometimes reached different conclusions, largely because the criteria used in the frameworks require individual interpretation. These differences were resolved in a consensus process, as is standard practice in most evidence-based medicine processes. Nevertheless, this situation raises the possibility of poor inter-rater reliability within frameworks, which has also been observed with frameworks used to assess the risk of bias or strength of evidence for conventional medical therapies [20]–[22]. With our study design, it is not possible to estimate the relative contributions of these two factors (differences between frameworks in the domains considered and how they are scored, versus poor inter-rater reliability) to our conclusions. However, we found that across raters, no initial grades differed by more than one category, whereas across frameworks we did find differences of two or more grades.
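
For readers who want to quantify rater disagreement within a single framework, the sketch below shows two simple summaries: the largest gap, in categories, between two raters' initial grades, and Cohen's kappa as a chance-corrected agreement statistic. This is an assumption-laden illustration rather than the analysis performed in this study, which did not report kappa; the scale and the two lists of grades are hypothetical.

```python
from collections import Counter
from typing import List, Sequence

# Hypothetical ordered scale (best first) and hypothetical initial grades
# from two independent raters; none of these values come from the study.
SCALE: List[str] = ["high", "moderate", "low"]
rater_1 = ["high", "moderate", "low", "moderate"]
rater_2 = ["high", "low", "low", "moderate"]


def max_category_gap(r1: Sequence[str], r2: Sequence[str], scale: Sequence[str]) -> int:
    """Largest difference, in categories, between paired grades."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("raters must grade the same non-empty set of items")
    order = {grade: i for i, grade in enumerate(scale)}
    return max(abs(order[a] - order[b]) for a, b in zip(r1, r2))


def cohens_kappa(r1: Sequence[str], r2: Sequence[str]) -> float:
    """Cohen's kappa: (observed - expected agreement) / (1 - expected)."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("raters must grade the same non-empty set of items")
    n = len(r1)
    observed = sum(a == b for a, b in zip(r1, r2)) / n
    counts_1, counts_2 = Counter(r1), Counter(r2)
    expected = sum(counts_1[k] * counts_2[k] for k in counts_1) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical category for every item.
        return 1.0
    return (observed - expected) / (1 - expected)


print(max_category_gap(rater_1, rater_2, SCALE))
print(round(cohens_kappa(rater_1, rater_2), 3))
```

With only a handful of graded outcomes, as in an exercise with three exemplars, any such statistic would be very imprecise, so it is better read as a description than as a reliability estimate.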

Although a similar exercise could have been undertaken with more than three exemplars, our initial choice of three proved sufficient to identify variability both within and across frameworks in how evidence is assessed. Moreover, additional exemplars will not change the identification of context, costs, and implementation data as important missing domains of these frameworks. We also recognize that our results may be sensitive to the composition of participants on our technical expert panel who provided input at each stage of this process, and further evaluation of these results with a wider group of stakeholders is warranted. However, these stakeholders' identification of a need for more data about implementation is consistent with the increasing recognition of the importance of implementation reporting in other health-related fields [6],[23],[24].

As global health policymakers move towards evidence-based approaches, our study reveals a gap between what is currently available and the needs for an evidence framework appropriate for application to a global health setting in a developing country context. More work is needed to either adapt one or more existing frameworks, or to develop an entirely new framework to meet stakeholders' needs. For example, Lewin and colleagues on the Task Force on Developing Health Systems Guidance of the World Health Organization recently described the beginnings of an adaptation of the GRADE framework [25].

Current frameworks for evaluating evidence on public health interventions have evolved from the clinical model where decision making is determined by rigorous systematic review of efficacy trials, usually based on data derived from RCTs that emphasize efficacy for the individual patient. Yet the evidence requirements for scaling up global health programs include three key elements: efficacy at the individual level, effectiveness at the population level, and sustainability at the host-country level. These evidence streams can often result from disparate research approaches, implying an additional set of needs when evaluating the evidence. A global health evidence evaluation framework must be systematic while being able to incorporate relevant information from studies on context or other details that are not traditionally reported in published findings from RCTs. We recommend that the global health community work to develop a framework or frameworks that can take into account evidence relevant to all three key elements needed for policy decision making, and that can be applied with a reliability sufficient to give policymakers confidence that differences in ratings reflect differences in the underlying evidence.

Such a framework could help to improve the flow of information between researchers and policymakers, as well as narrow the gap between them in terms of the questions they ask and the tools they utilize to answer them.

Supporting Information

References

1. Kremer M, Glennerster R (2011) Improving Health in Developing Countries: Evidence from Randomized Evaluations, Chapter 4. Handbook of Health Economics 201–315.

2. Banerjee AV, Duflo E (2009) The Experimental Approach to Development Economics. Annu Rev Econom 1: 151–178.

3. Banerjee A, Duflo E (2011) Poor Economics: A Radical Rethinking of the Way to Fight Global Poverty: Public Affairs. 320 pp.

4. Yamey G, Feachem R (2011) Evidence-based policymaking in global health - the payoffs and pitfalls. Evid Based Med 16: 97–99.

5. (2011) Mind the Gap: From Evidence to Policy Impact. International Initiative for Impact Evaluation. Conference held in Cuernavaca, Mexico. Available: http://www.3ieimpact.org/events/3ie-conferences-and-workshops/mexico-impact-evaluation-conference/.

6. Cartwright N (2011) Knowing what we are talking about: Why evidence doesn't always travel. Conference presentation 31 May. Available: http://www.3ieimpact.org/media/filer/2012/05/03/knowing_what_talking_31emay2011.pdf.

7. Deaton A (January 2009) The Keynes Lecture - Randomization in the tropics, and the search for the elusive keys to economic development. Available: http://www.econ.uiuc.edu/~roger/courses/574/readings/Deaton_Instruments%20of%20development.pdf.

8. Rosenzweig M (2012) Thinking Small: Poor Economics: A Radical Rethinking of the Way to Fight Global Poverty. J Econ Lit 50: 115–127.

9. Rodrik D (2008) The New Development Economics: We Shall Experiment, But How Shall We Learn? Available: http://www.hks.harvard.edu/fs/drodrik/Research%20papers/The%20New%20Development%20Economics.pdf.

10. Ravallion M (2012) Fighting Poverty One Experiment at a Time: Poor Economics: A Radical Rethinking of the Way to Fight Global Poverty: Review Essay. J Econ Lit 50: 103–114.

11. Murray CJ, Lopez AD (1996) Evidence-based health policy–lessons from the Global Burden of Disease Study. Science 274: 740–743.

12. Tang KC, Choi BC, Beaglehole R (2008) Grading of evidence of the effectiveness of health promotion interventions. J Epidemiol Community Health 62: 832–834.

13. Guyatt GH, Oxman AD, Vist GE, Kunz R, Falck-Ytter Y, et al. (2008) GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. BMJ 336: 924–926.

14. Baral S, Wirtz A, Sifakis F, Johns B, Walker D, et al. (2012) The Highest Attainable Standard of Evidence (HASTE) for HIV/AIDS Interventions: Towards A Public Health Approach to Defining Evidence. Public Health Reports 127: 572–584.

15. Briss PA, Zaza S, Pappaioanou M, Fielding J, Wright-De Aguero L, et al. (2000) Developing an evidence-based Guide to Community Preventive Services–methods. The Task Force on Community Preventive Services. Am J Prev Med 18: 35–43.

16. National Health and Medical Research Council (December 2009) NHMRC levels of evidence and grades for recommendations for developers of guidelines. Australia. Available: http://www.nhmrc.gov.au/guidelines/resources-guideline-developers. Accessed June 2013.

17. Wightman A, Ellis S, Cullum A, Sander L, Turley R (2005) Grading evidence and recommendations for public health interventions: developing and piloting a framework. Health Development Agency. Available: http://www.nice.org.uk/nicemedia/docs/grading_evidence.pdf. Accessed December 2012.

18. Shekelle PG, Maglione MA, Luoto J, Johnsen B, Perry TR. Global Health Evidence Evaluation Framework. Research White Paper (Prepared by the Southern California Evidence-based Practice Center under Contract No. 290-2007-10062-I). AHRQ Publication No. 13-EHC008-EF. Rockville, MD: Agency for Healthcare Research and Quality. January 2013. Available: http://www.effectivehealthcare.ahrq.gov/ehc/products/363/1384/White-Paper_Global-Health-Frameworks-1-22-13.pdf

19. Brunetti M, Shemilt I, Pregno S, Vale L, Oxman AD, et al. (February 2013) Grade guidelines: 10. Considering resource use and rating the quality of economic evidence. J Clin Epidemiol 66: 140–150.

20. Atkins D, Best D, Briss PA, Eccles M, Falck-Ytter Y, et al. (2004) Grading quality of evidence and strength of recommendations. BMJ 328: 1490.

21. Berkman ND, Lohr KN, Morgan LC, Richmond E, Kuo TM, et al. (May 2012) Reliability Testing of the AHRQ EPC Approach to Grading the Strength of Evidence in Comparative Effectiveness Reviews. Methods Research Report. (Prepared by RTI International–University of North Carolina Evidence-based Practice Center under Contract No. 290-2007-10056-I.) Rockville, MD: Agency for Healthcare Research and Quality. AHRQ Publication No. 12-EHC067-EF. Available: http://effectivehealthcare.ahrq.gov/ehc/products/339/1099/Methods_ReliabilityTesting_FinalReport_20120522.pdf.

22. Hartling L, Ospina M, Liang Y, Dryden DM, Hooton N, et al. (2009) Risk of bias versus quality assessment of randomised controlled trials: cross sectional study. BMJ 339: b4012.

23. Mohler R, Bartoszek G, Kopke S, Meyer G (2012) Proposed criteria for reporting the development and evaluation of complex interventions in healthcare (CReDECI): guideline development. Int J Nurs Stud 49: 40–46.

24. Shekelle PG, Pronovost PJ, Wachter RM, Taylor SL, Dy SM, et al. (2011) Advancing the science of patient safety. Ann Intern Med 154: 693–696.

25. Lewin S, Bosch-Capblanch X, Oliver S, Akl EA, Vist GE, et al. (2012) Guidance for Evidence-Informed Policies about Health Systems: Assessing How Much Confidence to Place in the Research Evidence. PLoS Med 9: e1001187. doi:10.1371/journal.pmed.1001187

26. Arnold BF, Colford JM Jr (2007) Treating water with chlorine at point-of-use to improve water quality and reduce child diarrhea in developing countries: a systematic review and meta-analysis. Am J Trop Med Hyg 76: 354–364.

27. Clasen T, Schmidt WP, Rabie T, Roberts I, Cairncross S (2007) Interventions to improve water quality for preventing diarrhoea: systematic review and meta-analysis. BMJ 334: 782.

28. Siegfried N, van der Merwe L, Brocklehurst P, Sint TT (2011) Antiretrovirals for reducing the risk of mother-to-child transmission of HIV infection. Cochrane Database Syst Rev CD003510.

29. Johri M, Ako-Arrey D (2011) The cost-effectiveness of preventing mother-to-child transmission of HIV in low- and middle-income countries: systematic review. Cost Eff Resour Alloc 9: 3.

30. Chigwedere P, Seage GR, Lee TH, Essex M (2008) Efficacy of antiretroviral drugs in reducing mother-to-child transmission of HIV in Africa: a meta-analysis of published clinical trials. AIDS Res Hum Retroviruses 24: 827–837.

31. Lewin S, Munabi-Babigumira S, Glenton C, Daniels K, Bosch-Capblanch X, et al. (2010) Lay health workers in primary and community health care for maternal and child health and the management of infectious diseases. Cochrane Database Syst Rev CD004015.