The effectiveness of the quality improvement collaborative strategy in low- and middle-income countries: A systematic review and meta-analysis

Authors: Ezequiel Garcia-Elorrio aff001;  Samantha Y. Rowe aff002;  Maria E. Teijeiro aff004;  Agustín Ciapponi aff005;  Alexander K. Rowe aff002
Authors place of work: Healthcare quality and safety department, Instituto de Efectividad Clínica y Sanitaria (IECS-CONICET), Buenos Aires, Argentina aff001;  Malaria Branch, Division of Parasitic Diseases and Malaria, Center for Global Health, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America aff002;  CDC Foundation, Atlanta, Georgia, United States of America aff003;  Quality Department, Fundación para la Lucha contra las Enfermedades Neurológicas de la Infancia (FLENI), Escobar, Buenos Aires Province, Argentina aff004;  Argentine Cochrane Centre, Instituto de Efectividad Clínica y Sanitaria (IECS-CONICET), Buenos Aires, Argentina aff005
Published in the journal: PLoS ONE 14(10)
Category: Research Article
doi: https://doi.org/10.1371/journal.pone.0221919



Quality improvement collaboratives (QICs) have been used to improve health care for decades. Evidence on QIC effectiveness has been reported, but systematic reviews to date have little information from low- and middle-income countries (LMICs).


To assess the effectiveness of QICs in LMICs.


We conducted a systematic review following Cochrane methods, the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) approach for quality of evidence grading, and the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) statement for reporting. We searched published and unpublished studies between 1969 and March 2019 from LMICs. We included papers that compared usual practice with QICs alone or combined with other interventions. Pairs of reviewers independently selected and assessed the risk of bias and extracted data of included studies. To estimate strategy effectiveness from a single study comparison, we used the median effect size (MES) in the comparison for outcomes in the same outcome group. The primary analysis evaluated each strategy group with a weighted median and interquartile range (IQR) of MES values. In secondary analyses, standard random-effects meta-analysis was used to estimate the weighted mean MES and 95% confidence interval (CI) of the mean MES of each strategy group. This review is registered with PROSPERO (International Prospective Register of Systematic Reviews): CRD42017078108.


Twenty-nine studies were included; most (21/29, 72.4%) were interrupted time series studies. Evidence quality was generally low to very low. Among studies involving health facility-based health care providers (HCPs), for “QIC only”, effectiveness varied widely across outcome groups and tended to have little effect for patient health outcomes (median MES less than 2 percentage points for percentage and continuous outcomes). For “QIC plus training”, effectiveness might be very high for patient health outcomes (for continuous outcomes, median MES 111.6 percentage points, range: 96.0 to 127.1) and HCP practice outcomes (median MES 52.4 to 63.4 percentage points for continuous and percentage outcomes, respectively). The only study of lay HCPs, which used “QIC plus training”, showed no effect on patient care-seeking behaviors (MES -0.9 percentage points), moderate effects on non-care-seeking patient behaviors (MES 18.7 percentage points), and very large effects on HCP practice outcomes (MES 50.4 percentage points).


The effectiveness of QICs varied considerably in LMICs. QICs combined with other intervention components, such as training, tended to be more effective than QICs alone. The low evidence quality and large effect sizes for QIC plus training justify additional high-quality studies assessing this approach in LMICs.


Labor and delivery – HIV diagnosis and management – Systematic reviews – Behavioral and social aspects of health – HIV prevention – Hypertensive disorders in pregnancy


Major failures in health care have been reported elsewhere but are most evident in low- and middle-income countries (LMICs). An evaluation of the health-related Millennium Development Goals (MDGs) found that, in 2015, when they were to be achieved, major health care quality gaps still were present in LMICs, which ignited a strong demand for quality improvement [1]. The MDGs have now been replaced by the Sustainable Development Goals (SDGs), instituted by the United Nations with the aim to contribute to the achievement of universal health coverage with quality care for all [2]. Concurrently in 2017, The Lancet Global Health Commission on High-Quality Health Systems in the SDG Era was established to review current knowledge, conduct new focused research, and propose policies for measuring and improving health care quality to reach new levels of performance in LMICs. This Commission advocated for a revision of methods that could contribute to the advance of the field of quality of care worldwide [3].

Among the several quality improvement strategies available, quality improvement collaboratives (QICs) (also known as collaborative improvement and learning collaboratives) have been used to improve health care for several decades [4]. However, reporting on specific components of QICs has been imprecise [5].

Formal QICs involve the use of healthcare teams from different sites to improve performance on a specific topic by collecting data and testing ideas with improvement cycles (usually plan-do-study-act cycles, involving planning a change, trying it, observing the results, and acting upon what is learned) supported by coaching and learning sessions [6]. QICs are supported by the concept that district managers and networks of facilities can be harnessed into learning systems that accelerate improvement in health care performance, with the potential to achieve results at scale. The district level of the health system is well positioned to facilitate systematic group learning among facilities of similar types and across tiers of the health system. District-led area-based learning and planning bring together providers and administrators responsible for a catchment area to solve clinical and system problems, harmonize approaches, maximize often limited resources and create better communication and referral between facilities [7].

The use of QICs has increased rapidly despite the absence of strong evidence for effectiveness, cost-effectiveness or long-term impact. Published systematic reviews on QICs, which predominantly include studies from high-income countries, show modest improvements, particularly when addressing straightforward aspects of care where there is a clear gap between recommended and actual practice. There is still limited information from LMICs, unpublished studies, or non-English studies [8–10].

Recently, an extensive systematic review has been published characterizing the effectiveness of a wide array of strategies to improve health care provider (HCP) performance in LMICs (the Health Care Provider Performance Review, or HCPPR) [11]. Although this review includes QICs, thus far, these strategies have been analyzed under the broader strategy category of “group problem solving,” which includes other, non-QIC, strategies. Additionally, the most recent literature search for the HCPPR was conducted in May 2016.

The objective of this work was to estimate specifically the effectiveness of QICs in LMICs using data from the HCPPR and results of studies from an updated literature search. We aimed to inform decisions about whether to use QICs and how best to implement them, to identify knowledge gaps on QICs in LMICs, and to provide direction for future evaluations of this strategy.

Materials and methods

We conducted a systematic review following Cochrane Collaboration methods and the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) statement for reporting [12, 13]. The study protocol was registered in PROSPERO International prospective register of systematic reviews (registration number CRD42017078108).

Study eligibility criteria

Type of study designs

Studies meeting the Cochrane Effective Practice and Organisation of Care (EPOC) Review Group criteria for inclusion in a systematic review of interventions [14]:

  • Randomized controlled trials (RCTs)

  • Controlled before-and-after trials (CBA)

  • Interrupted time series (ITS) designs with at least 3 data points before and after the intervention, with or without comparison groups

Types of participants

HCPs (and patients that they care for) from LMICs (defined as countries with a low or middle-income economy, according to the World Bank at the time of the literature search) [15]. HCPs included hospital-, clinic-, and community-based health workers, pharmacists, and medicine vendors.

Type of intervention

Studies were included if they had an intervention arm exposed to QIC with or without other strategy components (e.g., training) compared to a non-exposed control group (or historical controls, for ITS studies) that could be defined as usual practice. QIC was defined as a strategy with the following core elements: a) a team of experts (in clinical care and quality improvement) involved in bringing together the scientific evidence, practical contextual knowledge and quality improvement methods, usually within a “change package” or toolkit; b) multiple teams from multiple sites that chose to participate; c) a model or framework for improvement that included measurable aims, data collection, implementation and evaluation of small tests of change; and d) a set of structured activities that promoted a collaborative process to learn and share ideas, innovations, and experiences (e.g., face-to-face or virtual meetings; visits to other sites; visits by experts or facilitators; web-based activities to report changes, results and comparisons with other teams; and coaching and feedback by improvement experts). The comparator was non-exposed control groups that represent usual practice.

Type of outcomes

There was no restriction on outcome type. Outcomes were grouped into the following categories.

  • Facilitators (i.e., elements that facilitate HCP performance, such as supplies and HCP knowledge)

  • Health worker practices (i.e., processes of care, such as correct treatment)

  • Patient health outcomes

  • Patient behaviors related to care-seeking or use of health services

  • Other patient behaviors (i.e., those not related to care-seeking, such as adherence to treatment regimen)

Effect sizes were based on primary outcomes, with the following exclusions.

  • For outcomes expressed as a percentage, effect sizes based on <20 observations per study group and time point, for a given comparison

  • Effect sizes based on a simulation study and not actually observed data

  • Effect sizes for which baseline and follow-up measures in the intervention group were both 100%, as this indicates that HCP performance in the intervention group had no room for improvement and did not worsen over time. Similarly, for HCP practice outcomes expressed as a percentage, we excluded effect sizes based on a baseline value of 95% or greater, as there was little room for improvement.

  • Effect sizes based on outcome measures that were not taken at comparable times between study groups. For example, if the outcome for a control group was measured at –1 month, 3 months, and 9 months since the intervention began, and the outcome for an intervention group was measured at –1 month, 3 months, and 21 months since the intervention began, the effect size based on the 9-month and 21-month outcome measures would be ineligible.

  • Outcomes from ITS studies for which the time series was highly unstable and thus could not be reliably modeled, and outlier outcome measures that probably did not represent the true trend in HCP performance.
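The rule-based exclusions above can be sketched as a simple filter. This is an illustration only: the `EffectSize` record and its field names are hypothetical and do not come from the review's database, and the judgment-based exclusion of unstable ITS series and outliers is deliberately not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EffectSize:
    """One candidate effect size (hypothetical fields, for illustration)."""
    is_percentage: bool             # outcome expressed as a percentage
    is_hcp_practice: bool           # outcome is in the health worker practice group
    min_observations: int           # smallest n across study groups and time points
    baseline_pct: Optional[float]   # intervention-group baseline, if a percentage
    followup_pct: Optional[float]   # intervention-group follow-up, if a percentage
    simulated: bool = False         # based on a simulation, not observed data
    comparable_timing: bool = True  # measured at comparable times in all groups


def is_eligible(es: EffectSize) -> bool:
    """Apply the rule-based exclusions; return True if the effect size is kept."""
    if es.simulated or not es.comparable_timing:
        return False
    if es.is_percentage:
        # Fewer than 20 observations per study group and time point.
        if es.min_observations < 20:
            return False
        # Baseline and follow-up both 100%: no room for improvement.
        if es.baseline_pct == 100 and es.followup_pct == 100:
            return False
        # HCP practice outcomes starting at 95% or higher.
        if (es.is_hcp_practice and es.baseline_pct is not None
                and es.baseline_pct >= 95):
            return False
    return True
```

For example, under these assumptions an HCP practice outcome with a baseline of 96% would be dropped, whereas a patient health outcome with the same baseline would be retained.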

Search strategy

The literature search was conducted in two phases (see S1 File for details). In summary, we first searched results of the HCPPR, which is a comprehensive systematic review of the effectiveness of strategies to improve health worker performance in LMICs. The HCPPR study team searched 52 electronic databases for published studies and 58 document inventories for unpublished studies from the 1960s to 2016, screened personal libraries, asked colleagues for unpublished studies, and performed hand searches of 854 bibliographies from previous reviews. Second, we updated the HCPPR literature search with a focus on studies of QICs (search date was March 15, 2019). This update involved the search of electronic databases (S1 File, page 14), screening bibliographies of included study reports (referred to as “reports from additional sources” in Fig 1), and seeking reports from colleagues. There were no language restrictions.

Fig. 1. Flow diagram.

Data collection

In the first phase of the review, a team of researchers assessed study eligibility, and each researcher screened studies independently. Before the screening began, concordance testing was conducted against a “gold standard” list of reports until at least 80% was identified by each researcher. In the second phase of the review, a pair of investigators (MET, EGE) independently assessed study eligibility, and discrepancies were reconciled in consultation with a third team member (AC). The study eligibility process was conducted using Covidence© from the Cochrane collaboration. Also, two investigators (AKR, SYR) assessed the eligibility of study reports that we received from colleagues. Data were extracted from the included studies independently by a pair of investigators (SYR, AKR) or researchers using a standardized form, and discrepancies were resolved through discussion. Before beginning data extraction, concordance testing of all data abstractors was conducted until the percent agreement between individual abstractors and a gold standard set of abstracted data (based on consensus by investigators SYR, AKR) was at least 80%. Data from each study were entered into a Microsoft Access database (Microsoft Inc., Redmond, Washington). Data elements included: study setting (where, when, HCP types, other contextual factors), study design, health conditions addressed, strategy description, outcome description, outcome measurements, the timing of outcome measurements in relation to the implementation of the strategy, effect sizes, sample sizes, sampling details, and data elements needed to assess risk of bias (RoB). If details regarding study characteristics or the QIC intervention were not available in study reports, we contacted study authors. Except for the purpose of meta-analysis, missing data were not imputed. For meta-analysis, we used estimates of standard errors of effect sizes that were available from the HCPPR database. 
A small proportion of the standard error estimates for percentage outcomes from the HCPPR database were based on imputed data (usually because sample size data were missing). Effect sizes with missing standard errors were excluded from meta-analysis.

Risk of bias (quality) assessment

We categorized RoB with methods based on guidance from the Cochrane EPOC Group [16]. RoB at the study level was categorized as low, moderate, high, or very high. We assessed the following RoB domains: number of clusters per study arm, completeness of dataset, balance in baseline outcome measurements, balance in baseline characteristics, reliability of outcomes, adequacy of concealment of allocation (where relevant), intervention unlikely to affect data collection, intervention plausibly independent of other changes, and number of data points before and after the intervention.

We used the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) approach to assess the quality of evidence related to each of the key outcomes [17]. For assessments of the overall quality of evidence for each outcome, randomized studies, ITS studies, and other non-randomized studies started at “high quality”, “moderate quality” and “low quality” of evidence, respectively. Although the traditional approach is to start non-randomized studies as “low quality” [18], ITS studies with multiple periods and measurements during each period with no other limitations may constitute “moderate quality” of evidence [19, 20]. We downgraded the quality of evidence by one or two levels depending on the extent of violation across the following criteria: study limitations (RoB); indirectness of evidence; inconsistency; imprecision of effect estimates; or publication bias. If we did not find study limitations, we upgraded the evaluation of the quality of the evidence when the pooled estimates revealed negligible concerns about confounders, a strong dose-response gradient, or a large magnitude of effect. Considering a mean baseline health worker performance level at 40% for a process-of-care outcome expressed as a percentage, an absolute increase of 40 percentage points or more, representing a relative risk >2, allowed us to upgrade the quality of evidence by one level.
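The arithmetic behind the large-effect threshold, and the bookkeeping of starting levels, can be illustrated as follows. This is a minimal sketch under stated assumptions: the function names and design labels are hypothetical, and the level arithmetic is a simplification that does not capture the judgment involved in an actual GRADE assessment (for instance, it does not enforce that upgrading applies only when no study limitations are found).

```python
# GRADE uses four levels of evidence quality; the ordering below is standard,
# but the additive bookkeeping is an assumed simplification for illustration.
LEVELS = ["very low", "low", "moderate", "high"]

# Starting levels as described in the text (design labels are hypothetical).
STARTING_LEVEL = {
    "randomized": "high",
    "its": "moderate",
    "other_non_randomized": "low",
}


def relative_risk(baseline_pct, absolute_increase_pts):
    """Relative risk implied by an absolute increase from a given baseline."""
    return (baseline_pct + absolute_increase_pts) / baseline_pct


def evidence_level(design, downgrades=0, upgrades=0):
    """Move down or up from the starting level, bounded by the GRADE scale."""
    index = LEVELS.index(STARTING_LEVEL[design]) - downgrades + upgrades
    return LEVELS[max(0, min(index, len(LEVELS) - 1))]
```

With a baseline of 40%, `relative_risk(40, 40)` is 2.0, so an increase of exactly 40 percentage points corresponds to a doubling of performance; larger increases give a relative risk above 2.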

Data synthesis

Effect sizes were defined as absolute percentage-point differences; positive values meant improvement.

In non-ITS studies with pre- and post-intervention outcome measures, for outcomes that were dichotomous or expressed as a percentage, the effect size was calculated with Eq 1.

In non-ITS studies with pre- and post-intervention outcome measures, for outcomes that were continuous but not obviously bounded (e.g., a mortality rate), the effect size was calculated with Eq 2.
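Eq 1 and Eq 2 are referenced but not shown here. The sketch below assumes that both take a difference-in-differences form: an absolute change for percentage outcomes, and a relative change from baseline (expressed in percentage points) for unbounded continuous outcomes. This form is consistent with the definition of effect sizes as absolute percentage-point differences, but it is an assumption rather than a transcription of the equations. The function names and the `higher_is_better` flag (used so that positive values always mean improvement) are hypothetical.

```python
def effect_size_percentage(base_int, follow_int, base_ctl, follow_ctl,
                           higher_is_better=True):
    """Assumed form of Eq 1: change in the intervention group minus change
    in the control group, in percentage points."""
    diff = (follow_int - base_int) - (follow_ctl - base_ctl)
    return diff if higher_is_better else -diff


def effect_size_continuous(base_int, follow_int, base_ctl, follow_ctl,
                           higher_is_better=True):
    """Assumed form of Eq 2: relative change from baseline in the intervention
    group minus relative change in the control group, times 100."""
    rel_int = 100.0 * (follow_int - base_int) / base_int
    rel_ctl = 100.0 * (follow_ctl - base_ctl) / base_ctl
    diff = rel_int - rel_ctl
    # For outcomes such as mortality, a decrease is an improvement,
    # so the sign is flipped to keep "positive = improvement".
    return diff if higher_is_better else -diff
```

As a usage example with made-up numbers, correct treatment rising from 40% to 70% in the intervention group and from 42% to 50% in the control group gives an effect size of 22 percentage points under this assumed form.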

For ITS studies, segmented linear regression modeling was performed to estimate a summary effect size that incorporated both the level and trend effects. The summary effect size was the outcome level at the mid-point of the follow-up period as predicted by the regression model minus a predicted counterfactual value that equals the outcome level based on the pre-intervention trend extended to the mid-point of the follow-up period. This summary effect size was used because it allows the results of ITS studies to be combined with those of non-ITS studies.
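The summary effect size described above can be sketched with ordinary least squares. This illustration makes several assumptions that are not stated in the text: no adjustment for autocorrelation, a follow-up mid-point taken as halfway between the first and last post-intervention time points, and hypothetical function names. Because a segmented model with both a level and a trend change is equivalent to fitting one straight line per segment, the sketch fits the pre- and post-intervention lines separately.

```python
def fit_line(ts, ys):
    """Ordinary least-squares line; returns (intercept, slope)."""
    n = len(ts)
    mean_t = sum(ts) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in ts)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_t, slope


def its_summary_effect(times, values, intervention_time):
    """Predicted outcome at the follow-up mid-point minus the counterfactual
    obtained by extending the pre-intervention trend to the same time."""
    pre = [(t, y) for t, y in zip(times, values) if t < intervention_time]
    post = [(t, y) for t, y in zip(times, values) if t >= intervention_time]
    if len(pre) < 3 or len(post) < 3:
        raise ValueError("need at least 3 data points before and after")

    pre_intercept, pre_slope = fit_line(*zip(*pre))
    post_intercept, post_slope = fit_line(*zip(*post))

    post_times = [t for t, _ in post]
    t_mid = (min(post_times) + max(post_times)) / 2.0

    predicted = post_intercept + post_slope * t_mid
    counterfactual = pre_intercept + pre_slope * t_mid
    return predicted - counterfactual
```

Evaluating the difference at the mid-point folds the level change and the trend change into a single number on the same percentage-point scale as the non-ITS effect sizes.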

To estimate strategy effectiveness from a single study comparison, the effect size was defined as the median of all effect sizes (MES) in the comparison for outcomes in the same outcome category. Results were stratified by HCP type (health facility-based vs. lay or community HCP).

For the primary analysis, we reported median, interquartile range, minimum, and maximum MES. The median effect size has been used in other systematic reviews of strategies to improve HCP performance [21, 22]. Median MES for strategy groups that were based on fewer than five study comparisons were not weighted, as weighting with small samples might cause the median to be a poor measure of central tendency when outliers are present. Median MES for strategy groups with five or more study comparisons were weighted, where the weight = 1 + the natural logarithm of the number of HCPs or (if the number of HCPs in a study was not reported) the number of service provision sites (e.g., health facilities) or (if the number of service provision sites was not reported) the number of administrative areas (e.g., districts) in the study. Strategy groups tested by at least three study comparisons were considered to have enough evidence to form generalizations—although caution is increasingly warranted as the minimum of three comparisons is approached. Strategy groups tested by only one or two study comparisons were interpreted separately.
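The MES and the weighting rule can be sketched as follows. The function names are hypothetical, and the weighted median uses one common definition (the smallest value at which the cumulative weight reaches half of the total); other tie-handling conventions exist, and the text does not say which was used.

```python
import math
import statistics


def comparison_mes(effect_sizes):
    """Median effect size (MES) for one study comparison and outcome group."""
    return statistics.median(effect_sizes)


def weighting_units(n_hcps=None, n_sites=None, n_areas=None):
    """Number of HCPs if reported, else sites, else administrative areas."""
    for count in (n_hcps, n_sites, n_areas):
        if count is not None:
            return count
    raise ValueError("no unit count reported")


def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half the total weight."""
    half = sum(weights) / 2.0
    cumulative = 0.0
    for value, weight in sorted(zip(values, weights)):
        cumulative += weight
        if cumulative >= half:
            return value
    raise ValueError("empty input")


def strategy_group_median(mes_values, unit_counts):
    """Unweighted median for fewer than five comparisons, otherwise a median
    weighted by 1 + ln(number of units) per comparison."""
    if len(mes_values) < 5:
        return statistics.median(mes_values)
    weights = [1.0 + math.log(n) for n in unit_counts]
    return weighted_median(mes_values, weights)
```

The logarithm dampens the influence of very large studies: a comparison with 1000 HCPs receives a weight of about 7.9, compared with 1.0 for a comparison with a single unit.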

In a secondary analysis, standard random-effects meta-analysis was used to estimate the weighted mean MES and 95% confidence interval (CI) of the mean MES of each strategy group. We used I2 as a measure of consistency for each meta-analysis, considering low heterogeneity <30%, moderate heterogeneity 30–60%, and high heterogeneity >60% [23]. We conducted a meta-analysis on one median effect size per study comparison for each outcome group, and we performed a sensitivity analysis considering all effect sizes individually to test consistency of the results.
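A random-effects pooled mean with its 95% CI and I2 can be computed as sketched below. The text does not name the estimator; this illustration assumes the DerSimonian-Laird method, a widely used default, and a normal-approximation confidence interval. The function names are hypothetical, and the heterogeneity labels follow the thresholds given above.

```python
import math


def random_effects_meta(effects, std_errors):
    """DerSimonian-Laird random-effects pooling (assumed estimator).

    effects: one MES per study comparison; std_errors: their standard errors.
    """
    variances = [se ** 2 for se in std_errors]
    w = [1.0 / v for v in variances]
    sum_w = sum(w)

    # Fixed-effect mean and Cochran's Q statistic.
    fixed_mean = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed_mean) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1

    # Between-study variance (tau squared), truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights, pooled mean, and 95% confidence interval.
    w_re = [1.0 / (v + tau2) for v in variances]
    mean = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    ci = (mean - 1.96 * se, mean + 1.96 * se)

    # I2: percentage of total variability attributed to heterogeneity.
    i2 = max(0.0, 100.0 * (q - df) / q) if q > 0 else 0.0
    return {"mean": mean, "ci": ci, "tau2": tau2, "i2": i2}


def heterogeneity_label(i2):
    """Thresholds as stated in the text: <30% low, 30-60% moderate, >60% high."""
    if i2 < 30:
        return "low"
    if i2 <= 60:
        return "moderate"
    return "high"
```

Because the input is one MES per study comparison, each comparison contributes a single data point to the pooled estimate regardless of how many individual outcomes it reported.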

Publication bias was assessed by visual inspection of funnel plots for asymmetry, for strategy-outcome groups with at least 10 studies.


During the first phase of the literature search, 216,477 citations were identified (S1 File). After screening and assessing eligibility, 46 reports from 25 studies were included (left side of Fig 1). In the second phase, which updated the search through 15 March 2019, 3207 articles were identified, and seven more reports from four studies were included after removing duplicates. Altogether, 53 reports from 29 studies with 30 study comparisons were included for this systematic review (Fig 1).

Description of included studies

The included studies were published between 2008 and 2019, from 12 LMICs in four continents. Most studies (24/29, 82.8%) were from Africa, three were from the Russian Federation, and one each was from Georgia and Mexico (Table 1). Most studies were ITS studies without controls (21/29, 72.4%), two were CBAs with randomized controls, three were CBAs with non-randomized controls, two were post-only CRTs, and one was an ITS study with controls.

Tab. 1. Characteristics of included studies.

Fig 2 presents the RoB of included studies individually by specific domains. Most studies (25/29, 86.2%) had a high or very high RoB. Two studies had a moderate RoB and two had a low RoB. The 30 study comparisons from 29 studies tested six different strategies that included QICs (Table 2). The most commonly tested QIC intervention had no additional strategy components (21 study comparisons). Other QIC interventions that were tested usually combined QIC with training, with or without additional components. The median study follow-up time was about one year.

Fig. 2. Risk of bias of included studies: Summary and by domain item.
√ Yes/done; Unclear; X No/not done; NA Not Applicable. CBA (NRC): Controlled Before-After study with non-randomized controls; CBA (RC): Pre-post study with randomized controls; CITS: Controlled interrupted time series (with non-randomized controls); HCPFI: Health Care Professional-directed financial incentives; ITS: Interrupted time series; OMT: Other management techniques; POS-CRT: Post-only study-Cluster randomized trial; QIC: Quality Improvement Collaborative; R&G: Regulation and governance; S: Supervision; SI: Strengthening infrastructure; TR: Training.
Tab. 2. Number of comparisons and risk of bias by quality improvement collaborative strategy.

In our assessment of publication bias, no strategy-outcome group had the minimum of 10 studies. However, for the one strategy-outcome group with the most studies (QIC intervention, health worker practice outcomes expressed as a percentage, n = 9 studies), the funnel plot revealed no evidence of asymmetry (S2 File).

Effect of interventions

The findings are summarized in Table 3, which presents QIC intervention effectiveness in terms of median MES (left column) and mean MES (right column and S2 File) from the random effects meta-analysis. Individual effect sizes are presented in Table 1. We had five main findings. First, for the “QIC only” strategy, effectiveness varied widely across outcome groups. For patient behaviors not related to care-seeking, the effect was moderate (median MES: 17.6 percentage points) (Table 3, row 3). For patient health outcomes, there was essentially no effect (0.3 and 1.4 percentage points for percentage and continuous outcomes, respectively). The results ranged from modestly to highly effective for health worker practice outcomes (30.2 to 44.2 percentage points) and patient care-seeking outcomes (7.7 to 62.2 percentage points).

Tab. 3. Summary of findings.

Second, for the “QIC + training” strategy for health facility-based HCPs, although there were only 4 studies, effectiveness was very high: MES 52.4 to 63.4 percentage points for health worker practice outcomes, 111.6 percentage points for patient health outcomes, and 87.7 percentage points for non-care-seeking patient behaviors (Table 3, rows 6–8). An additional study on a similar strategy (QIC + training + other management techniques) also found very high effectiveness (101.1 percentage points) for its one outcome on care-seeking patient behaviors.

Third, for the “QIC + training + strengthening infrastructure (bicycles for facilitators) + supervision + other management techniques (group process between HCP and community)” strategy, the one study found essentially no effect (MES 0.1 percentage points, for patient health outcomes) (Table 3, row 10). Fourth, for the “QIC + strengthening infrastructure (report cards) + regulation and governance (community scorecards)” strategy, the effectiveness from two studies ranged from essentially no effect (-2.8 percentage points, for non-care-seeking patient behaviors) to modest effect (9.5 percentage points, for care-seeking patient behaviors) (Table 3, rows 11–12).

Finally, the one study of lay health workers found highly variable results, ranging from essentially no effect (-0.9 percentage points, for care-seeking patient behaviors) to moderately large effects (18.7 percentage points, for non-care-seeking patient behaviors) to very large effects (50.4 percentage points, for health worker practice outcomes) (Table 3, rows 13–15).

Both the random effects meta-analysis considering one median effect size per study comparison for each outcome (Table 3), and the sensitivity analysis considering all effect sizes individually (S3 File) were consistent with the primary analysis. The certainty of the evidence according to GRADE criteria was low or very low for all strategy-outcome combinations, except for the effect of QIC + training on health worker practice outcomes for lay health workers (moderate certainty). However, as the result for this last group is based on only a single study, the generalizability is extremely limited.


This systematic review and meta-analysis on QICs in LMICs showed variable effectiveness across different outcomes and strategies. The quality of the evidence was mainly low or very low [17]. We found consistent results using different statistical approaches.

In summary, among studies of health facility-based HCPs, for the “QIC only” strategy, effectiveness varied widely across outcome groups, with no effect for patient health outcomes. For the “QIC + training” strategy, effectiveness might be very high for patient health outcomes, HCP practice outcomes, and care-seeking. Adding other management techniques to this strategy might also be highly effective for patient care-seeking behaviors. The effect of “QIC + training + strengthening infrastructure + supervision + other management techniques” or “QIC + strengthening infrastructure + regulation and governance” strategies seemed small to modest.

The only study assessing lay health workers showed effects that varied from essentially no effect on care-seeking patient behaviors to a large effect on non-care-seeking patient behaviors and HCP practice outcomes.

The main limitations of our systematic review were low quality of the evidence, scarce data on long-term effects, and heterogeneous outcomes. Also, some included studies came from unpublished gray literature, and several were conducted by the same group of authors. We attempted to address any potential imbalance in the quality of these studies by applying the same risk-of-bias assessment to all included studies. Furthermore, the random effects meta-analysis in this review was limited by the low quality of studies and wide diversity of outcomes. However, we believe meta-analysis as a secondary analysis tool provided useful complementary information about the direction, magnitude, and precision of intervention effects. Strengths of our review were that it was based on an extensive literature review from multiple sources, it used a single analytic framework with comparable effect sizes (as opposed to reporting different effect sizes, such as odds ratios and risk differences, from different studies), and it focused on LMIC settings. Its results can inform decision-making for health programs and intervention implementers with regard to which QIC-based interventions are most effective for improving which aspects of health systems in LMICs. Considering the small number of studies for each main comparison and the low quality of evidence, this review also highlights substantial evidence gaps and important opportunities for improvement in the conduct of future QIC studies.

Previous systematic reviews have approached the topic of QIC effectiveness in different ways and did not include several studies captured by our work [8–10]; nevertheless, they found similar effects and evidence gaps. Numerous potential determinants of QIC success were evaluated in a systematic review that did not include any of the primary studies included in our review, and only a few related to empirical effectiveness [24]. For example, some aspects of teamwork and participation in specific collaborative activities seem to improve short-term success, while sustainability of teams and continued data gathering enhanced the chances of long-term success. In a study currently underway, the impact of district-led learning on clinical practice and patient outcomes, communication, HCP motivation, and team dynamics is being explored [25, 26]. It would be desirable for future studies to examine what core components of QICs are related to patient- and provider-level outcomes.

Our findings clearly show that there is still not a solid evidence base on the effect of QICs in LMICs, although our results suggest that there are situations in which QICs could be considered. QICs are not static structures; rather, they have been implemented and adapted in a number of ways to achieve their stated aims. Some common adaptations include their use for generating new ideas and for empowering HCPs. Although based on relatively few studies, our review’s results suggest that combining QICs with training might be the most effective approach for implementing QICs.

Finally, on the recommendation for additional studies on QICs, we think that the ideal study design would be an interrupted time series with a randomized control group. The justification is that such a design would allow for an overall evaluation of intervention effectiveness as well as an evaluation of heterogeneity of effectiveness among sites. The design would also allow for a characterization of the effect over time. Other attributes include a follow-up time of at least 12 months, an objective data source for the evaluation (i.e., not only data collected by the QI teams unless the data quality is reasonably good and data quality does not change over time), a sample size that reflects real-world QICs (i.e., at least 20 facilities per study arm), qualitative and process evaluation components to describe how the intervention worked, a costing and economic evaluation, and an assessment of whether the intervention had any negative effects (e.g., drawing health workers’ attention to one aspect of care that decreases quality for other aspects of care).

In conclusion, the overall quality of the evidence on the effectiveness of QICs in LMICs was low. Based on the large and variable effect sizes seen in some outcome groups, additional research with high-quality studies is warranted to provide a more reliable and precise estimation of the effect of this promising intervention.

Supporting information

S1 Checklist [pdf]
PRISMA checklist.

S1 File [pdf]
Details of the search strategy.

S2 File [pdf]
Meta-analysis results, forest plots, and funnel plots.

S3 File [pdf]
Sensitivity analysis and list of excluded studies.


