Error rates of human reviewers during abstract screening in systematic reviews
Autoři:
Zhen Wang aff001; Tarek Nayfeh aff002; Jennifer Tetzlaff aff003; Peter O’Blenis aff003; Mohammad Hassan Murad aff001
Působiště autorů:
Evidence-based Practice Center, Mayo Clinic, Rochester, Minnesota, United States of America
aff001; Robert D. and Patricia E. Kern Center for the Science of Health Care Delivery Mayo Clinic, Rochester, Minnesota, United States of America
aff002; Evidence Partners, Ottawa, Ontario, Canada
aff003
Vyšlo v časopise:
PLoS ONE 15(1)
Kategorie:
Research Article
doi:
https://doi.org/10.1371/journal.pone.0227742
Souhrn
Background
Automated approaches to improve the efficiency of systematic reviews are greatly needed. When testing any of these approaches, the criterion standard of comparison (gold standard) is usually human reviewers. Yet, human reviewers make errors in inclusion and exclusion of references.
Objectives
To determine citation false inclusion and false exclusion rates during abstract screening by pairs of independent reviewers. These rates can help in designing, testing and implementing automated approaches.
Methods
We identified all systematic reviews conducted between 2010 and 2017 by an evidence-based practice center in the United States. Eligible reviews had to follow standard systematic review procedures with dual independent screening of abstracts and full texts, in which citation inclusion by one reviewer prompted automatic inclusion through the next level of screening. Disagreements between reviewers during full text screening were reconciled via consensus or arbitration by a third reviewer. A false inclusion or exclusion was defined as a decision made by a single reviewer that was inconsistent with the final included list of studies.
Results
We analyzed a total of 139,467 citations that underwent 329,332 inclusion and exclusion decisions from 86 unique reviewers. The final systematic reviews included 5.48% of the potential references identified through bibliographic database search (95% confidence interval (CI): 2.38% to 8.58%). After abstract screening, the total error rate (false inclusion and false exclusion) was 10.76% (95% CI: 7.43% to 14.09%).
Conclusions
This study suggests important false inclusion and exclusion rates by human reviewers. When deciding the validity of a future automated study selection algorithm, it is important to keep in mind that the gold standard is not perfect and that achieving error rates similar to humans may be adequate and can save resources and time.
Klíčová slova:
Automation – Cardiovascular medicine – Citation analysis – Database searching – Distillation – Health screening – Mental health and psychiatry – Systematic reviews
Zdroje
1. Cochrane AL. 1931–1971: a critical review, with particular reference to the medical profession. Medicines for the year. 2000;1979:1.
2. Ioannidis JP. The Mass Production of Redundant, Misleading, and Conflicted Systematic Reviews and Meta-analyses. Milbank Q. 2016;94(3):485–514. Epub 2016/09/14. doi: 10.1111/1468-0009.12210 27620683
3. Page MJ, Altman DG, McKenzie JE, Shamseer L, Ahmadzai N, Wolfe D, et al. Flaws in the application and interpretation of statistical analyses in systematic reviews of therapeutic interventions were common: a cross-sectional analysis. J Clin Epidemiol. 2018;95:7–18. Epub 2017/12/06. doi: 10.1016/j.jclinepi.2017.11.022 29203419.
4. Baudard M, Yavchitz A, Ravaud P, Perrodeau E, Boutron I. Impact of searching clinical trial registries in systematic reviews of pharmaceutical treatments: methodological systematic review and reanalysis of meta-analyses. BMJ. 2017;356:j448. Epub 2017/02/19. doi: 10.1136/bmj.j448 28213479
5. Higgins JP, Green S. Cochrane handbook for systematic reviews of interventions: John Wiley & Sons; 2011.
6. Murad MH, Montori VM, Ioannidis JP, Jaeschke R, Devereaux P, Prasad K, et al. How to read a systematic review and meta-analysis and apply the results to patient care: users’ guides to the medical literature. Jama. 2014;312(2):171–9. doi: 10.1001/jama.2014.5559 25005654
7. Wang Z, Asi N, Elraiyah TA, Abu Dabrh AM, Undavalli C, Glasziou P, et al. Dual computer monitors to increase efficiency of conducting systematic reviews. J Clin Epidemiol. 2014;67(12):1353–7. Epub 2014/08/03. doi: 10.1016/j.jclinepi.2014.06.011 25085736.
8. Wang Z, Noor A, Elraiyah T, Murad M, editors. Dual monitors to increase efficiency of conducting systematic reviews. 21st Cochrane Colloquium; 2013.
9. Allen IE, Olkin I. Estimating time to conduct a meta-analysis from number of citations retrieved. JAMA. 1999;282(7):634–5. Epub 1999/10/12. doi: 10.1001/jama.282.7.634 10517715.
10. Khangura S, Konnyu K, Cushman R, Grimshaw J, Moher D. Evidence summaries: the evolution of a rapid review approach. Syst Rev. 2012;1:10. Epub 2012/05/17. doi: 10.1186/2046-4053-1-10 22587960
11. Hailey D, Corabian P, Harstall C, Schneider W. The use and impact of rapid health technology assessments. International journal of technology assessment in health care. 2000;16(2):651–6. doi: 10.1017/s0266462300101205 10932429
12. Patnode CD, Eder ML, Walsh ES, Viswanathan M, Lin JS. The use of rapid review methods for the US Preventive Services Task Force. American journal of preventive medicine. 2018;54(1):S19–S25.
13. Ganann R, Ciliska D, Thomas H. Expediting systematic reviews: methods and implications of rapid reviews. Implement Sci. 2010;5:56. Epub 2010/07/21. doi: 10.1186/1748-5908-5-56 20642853
14. O’Mara-Eves A, Thomas J, McNaught J, Miwa M, Ananiadou S. Using text mining for study identification in systematic reviews: a systematic review of current approaches. Systematic reviews. 2015;4(1):5.
15. Li D, Wang Z, Wang L, Sohn S, Shen F, Murad MH, et al. A Text-Mining Framework for Supporting Systematic Reviews. Am J Inf Manag. 2016;1(1):1–9. Epub 2017/10/27. 29071308
16. Li D, Wang Z, Shen F, Murad MH, Liu H, editors. Towards a multi-level framework for supporting systematic review—A pilot study. 2014 IEEE International Conference on Bioinformatics and Biomedicine (BIBM); 2014: IEEE.
17. Alsawas M, Alahdab F, Asi N, Li DC, Wang Z, Murad MH. Natural language processing: use in EBM and a guide for appraisal. BMJ Evidence-Based Medicine. 2016;21(4):136–8.
18. Cohen AM, Hersh WR, Peterson K, Yen P-Y. Reducing workload in systematic review preparation using automated citation classification. Journal of the American Medical Informatics Association. 2006;13(2):206–19. doi: 10.1197/jamia.M1929 16357352
19. Beller E, Clark J, Tsafnat G, Adams C, Diehl H, Lund H, et al. Making progress with the automation of systematic reviews: principles of the International Collaboration for the Automation of Systematic Reviews (ICASR). Syst Rev. 2018;7(1):77. Epub 2018/05/21. doi: 10.1186/s13643-018-0740-7 29778096
20. O’Connor AM, Tsafnat G, Gilbert SB, Thayer KA, Shemilt I, Thomas J, et al. Still moving toward automation of the systematic review process: a summary of discussions at the third meeting of the International Collaboration for Automation of Systematic Reviews (ICASR). Syst Rev. 2019;8(1):57. Epub 2019/02/23. doi: 10.1186/s13643-019-0975-y 30786933
21. Bannach-Brown A, Przybyla P, Thomas J, Rice ASC, Ananiadou S, Liao J, et al. Machine learning algorithms for systematic review: reducing workload in a preclinical review of animal studies and reducing human screening error. Syst Rev. 2019;8(1):23. Epub 2019/01/17. doi: 10.1186/s13643-019-0942-7 30646959
22. SR Toolbox 2019 [cited 2019 August 6]. http://systematicreviewtools.com/index.php.
23. Obermeyer Z, Powers B, Vogeli C, Mullainathan S. Dissecting racial bias in an algorithm used to manage the health of populations. Science. 2019;366(6464):447–53. doi: 10.1126/science.aax2342 31649194
24. O’Connor AM, Tsafnat G, Thomas J, Glasziou P, Gilbert SB, Hutton B. A question of trust: can we build an evidence base to gain trust in systematic review automation technologies? Systematic Reviews. 2019;8(1):143. doi: 10.1186/s13643-019-1062-0 31215463
25. Smith V, Devane D, Begley CM, Clarke M. Methodology in conducting a systematic review of systematic reviews of healthcare interventions. BMC Med Res Methodol. 2011;11(1):15. Epub 2011/02/05. doi: 10.1186/1471-2288-11-15 21291558
Článek vyšel v časopise
PLOS One
2020 Číslo 1
- S diagnostikou Parkinsonovy nemoci může nově pomoci AI nástroj pro hodnocení mrkacího reflexu
- Proč při poslechu některé muziky prostě musíme tančit?
- Je libo čepici místo mozkového implantátu?
- Chůze do schodů pomáhá prodloužit život a vyhnout se srdečním chorobám
- Pomůže v budoucnu s triáží na pohotovostech umělá inteligence?
Nejčtenější v tomto čísle
- Severity of misophonia symptoms is associated with worse cognitive control when exposed to misophonia trigger sounds
- Chemical analysis of snus products from the United States and northern Europe
- Calcium dobesilate reduces VEGF signaling by interfering with heparan sulfate binding site and protects from vascular complications in diabetic mice
- Effect of Lactobacillus acidophilus D2/CSL (CECT 4529) supplementation in drinking water on chicken crop and caeca microbiome
Zvyšte si kvalifikaci online z pohodlí domova
Všechny kurzy