Predicting the replicability of social science lab experiments
Authors:
Adam Altmejd aff001; Anna Dreber aff001; Eskil Forsell aff001; Juergen Huber aff003; Taisuke Imai aff004; Magnus Johannesson aff001; Michael Kirchler aff003; Gideon Nave aff005; Colin Camerer aff006
Author affiliations:
aff001: Department of Economics, Stockholm School of Economics, Stockholm, Sweden
aff002: SOFI, Stockholm University, Stockholm, Sweden
aff003: Universität Innsbruck, Innsbruck, Austria
aff004: LMU Munich, Munich, Germany
aff005: The Wharton School, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
aff006: California Institute of Technology, Pasadena, California, United States of America
Published in:
PLoS ONE 14(12)
Category:
Research Article
doi:
https://doi.org/10.1371/journal.pone.0225826
Abstract
We measure how accurately replication of experimental results can be predicted by black-box statistical models. With data from four large-scale replication projects in experimental psychology and economics, and techniques from machine learning, we train predictive models and study which variables drive predictable replication. The models predict binary replication with a cross-validated accuracy rate of 70% (AUC of 0.77) and estimates of relative effect sizes with a Spearman ρ of 0.38. The accuracy level is similar to market-aggregated beliefs of peer scientists [1, 2]. The predictive power is validated in a pre-registered out-of-sample test of the outcome of [3], where 71% (AUC of 0.73) of replications are predicted correctly and effect size correlations amount to ρ = 0.25. Basic features such as the sample and effect sizes in original papers, and whether reported effects are single-variable main effects or two-variable interactions, are predictive of successful replication. The models presented in this paper are simple tools to produce cheap, prognostic replicability metrics. These models could be useful in institutionalizing the process of evaluation of new findings and guiding resources to those direct replications that are likely to be most informative.
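The three metrics quoted in the abstract (accuracy, AUC, and Spearman ρ) can be illustrated with a minimal sketch. This is not the authors' code: the function names and the toy numbers below are hypothetical and chosen only to show how each metric is computed from predictions and observed outcomes.

```python
from itertools import product

def accuracy(labels, probs, threshold=0.5):
    """Share of studies whose thresholded prediction matches the observed outcome."""
    hits = sum((p >= threshold) == bool(y) for y, p in zip(labels, probs))
    return hits / len(labels)

def auc(labels, probs):
    """AUC as the chance that a replicated study is scored above a
    non-replicated one (ties count as one half)."""
    pos = [p for y, p in zip(labels, probs) if y == 1]
    neg = [p for y, p in zip(labels, probs) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a, b in product(pos, neg))
    return wins / (len(pos) * len(neg))

def average_ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical toy data, NOT taken from the paper.
replicated = [1, 0, 1, 1, 0, 0]                 # observed binary outcome
predicted_prob = [0.9, 0.4, 0.6, 0.3, 0.2, 0.7]  # model's replication probability
observed_effect = [0.8, 0.1, 0.5, 0.6, -0.1, 0.2]   # relative effect size
predicted_effect = [0.7, 0.3, 0.4, 0.5, 0.1, 0.6]   # model's estimate

print("accuracy:", accuracy(replicated, predicted_prob))
print("AUC:", auc(replicated, predicted_prob))
print("Spearman rho:", spearman_rho(observed_effect, predicted_effect))
```

Accuracy depends on the chosen probability threshold, whereas AUC and Spearman ρ depend only on how the predictions are ordered, which is one reason a study might report them side by side.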
Keywords:
Algorithms – Experimental economics – Machine learning – Machine learning algorithms – Replication studies – Scientists – Experimental psychology
References
1. Dreber A, Pfeiffer T, Almenberg J, Isaksson S, Wilson B, Chen Y, et al. Using Prediction Markets to Estimate the Reproducibility of Scientific Research. Proceedings of the National Academy of Sciences. 2015;112(50):15343–15347. doi: 10.1073/pnas.1516179112
2. Camerer CF, Dreber A, Forsell E, Ho TH, Huber J, Johannesson M, et al. Evaluating Replicability of Laboratory Experiments in Economics. Science. 2016;351(6280):1433–1436. doi: 10.1126/science.aaf0918 26940865
3. Camerer CF, Dreber A, Holzmeister F, Ho TH, Huber J, Johannesson M, et al. Evaluating the Replicability of Social Science Experiments in Nature and Science between 2010 and 2015. Nature Human Behaviour. 2018;2(9):637–644. doi: 10.1038/s41562-018-0399-z 31346273
4. Simonsohn U, Nelson LD, Simmons JP. P-Curve: A Key to the File-Drawer. Journal of Experimental Psychology: General. 2014;143(2):534–547. doi: 10.1037/a0033242
5. Simmons JP, Nelson LD, Simonsohn U. False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant. Psychological Science. 2011;22(11):1359–1366. doi: 10.1177/0956797611417632 22006061
6. Koch C, Jones A. Big Science, Team Science, and Open Science for Neuroscience. Neuron. 2016;92(3):612–616. doi: 10.1016/j.neuron.2016.10.019 27810003
7. Open Science Collaboration. Estimating the Reproducibility of Psychological Science. Science. 2015;349(6251).
8. Bavel JJV, Mende-Siedlecki P, Brady WJ, Reinero DA. Contextual Sensitivity in Scientific Reproducibility. Proceedings of the National Academy of Sciences. 2016;113(23):6454–6459. doi: 10.1073/pnas.1521897113
9. Ioannidis JPA. Why Most Published Research Findings Are False. PLOS Medicine. 2005;2(8):e124. doi: 10.1371/journal.pmed.0020124 16060722
10. Lindsay DS. Replication in Psychological Science. Psychological Science. 2015;26(12):1827–1832. doi: 10.1177/0956797615616374 26553013
11. Ioannidis JPA, Munafò MR, Fusar-Poli P, Nosek BA, David SP. Publication and Other Reporting Biases in Cognitive Sciences: Detection, Prevalence, and Prevention. Trends in Cognitive Sciences. 2014;18(5):235–241. doi: 10.1016/j.tics.2014.02.010 24656991
12. Nosek BA, Alter G, Banks GC, Borsboom D, Bowman SD, Breckler SJ, et al. Promoting an Open Research Culture. Science. 2015;348(6242):1422–1425. doi: 10.1126/science.aab2374 26113702
13. Ioannidis JP, Ntzani EE, Trikalinos TA, Contopoulos-Ioannidis DG. Replication Validity of Genetic Association Studies. Nature Genetics. 2001;29(3):306–309. doi: 10.1038/ng749 11600885
14. Martinson BC, Anderson MS, de Vries R. Scientists Behaving Badly. Nature. 2005;435:737–738. doi: 10.1038/435737a 15944677
15. Silberzahn R, Uhlmann EL, Martin DP, Anselmi P, Aust F, Awtrey E, et al. Many Analysts, One Data Set: Making Transparent How Variations in Analytic Choices Affect Results. Advances in Methods and Practices in Psychological Science. 2018;1(3):337–356. doi: 10.1177/2515245917747646
16. De Vries R, Anderson MS, Martinson BC. Normal Misbehavior: Scientists Talk about the Ethics of Research. Journal of Empirical Research on Human Research Ethics. 2006;1(1):43–50. doi: 10.1525/jer.2006.1.1.43 16810336
17. Munafò MR, Nosek BA, Bishop DVM, Button KS, Chambers CD, Percie du Sert N, et al. A Manifesto for Reproducible Science. Nature Human Behaviour. 2017;1(1):0021. doi: 10.1038/s41562-016-0021
18. O’Boyle EH, Banks GC, Gonzalez-Mulé E. The Chrysalis Effect: How Ugly Initial Results Metamorphosize Into Beautiful Articles. Journal of Management. 2017;43(2):376–399.
19. Begley CG, Ioannidis JPA. Reproducibility in Science. Circulation Research. 2015;116(1):116–126.
20. Ioannidis JPA, Tarone R, McLaughlin JK. The False-Positive to False-Negative Ratio in Epidemiologic Studies. Epidemiology. 2011;22(4):450–456. doi: 10.1097/EDE.0b013e31821b506e 21490505
21. Simons DJ. The Value of Direct Replication. Perspectives on Psychological Science. 2014;9(1):76–80. doi: 10.1177/1745691613514755 26173243
22. Rand DG, Greene JD, Nowak MA. Spontaneous Giving and Calculated Greed. Nature. 2012;489(7416):427–430. doi: 10.1038/nature11467 22996558
23. Tinghög G, Andersson D, Bonn C, Böttiger H, Josephson C, Lundgren G, et al. Intuition and Cooperation Reconsidered. Nature. 2013;498(7452):E1–E2. doi: 10.1038/nature12194 23739429
24. Bouwmeester S, Verkoeijen PPJL, Aczel B, Barbosa F, Bègue L, Brañas-Garza P, et al. Registered Replication Report: Rand, Greene, and Nowak (2012). Perspectives on Psychological Science. 2017;12(3):527–542. doi: 10.1177/1745691617693624 28475467
25. Rand DG, Greene JD, Nowak MA. Rand et al. Reply. Nature. 2013;498(7452):E2–E3. doi: 10.1038/nature12195
26. Rand DG. Reflections on the Time-Pressure Cooperation Registered Replication Report. Perspectives on Psychological Science. 2017;12(3):543–547. doi: 10.1177/1745691617693625 28544864
27. Nuijten MB, Hartgerink CHJ, van Assen MALM, Epskamp S, Wicherts JM. The Prevalence of Statistical Reporting Errors in Psychology (1985–2013). Behavior Research Methods. 2016;48(4):1205–1226. doi: 10.3758/s13428-015-0664-2 26497820
28. Klein RA, Ratliff KA, Vianello M, Adams RB, Bahník Š, Bernstein MJ, et al. Investigating Variation in Replicability: A “Many Labs” Replication Project. Social Psychology. 2014;45(3):142–152. doi: 10.1027/1864-9335/a000178
29. Ebersole CR, Atherton OE, Belanger AL, Skulborstad HM, Allen JM, Banks JB, et al. Many Labs 3: Evaluating Participant Pool Quality across the Academic Semester via Replication. Journal of Experimental Social Psychology. 2016;67:68–82. doi: 10.1016/j.jesp.2015.10.012
30. Yarkoni T, Westfall J. Choosing Prediction over Explanation in Psychology: Lessons from Machine Learning. Perspectives on Psychological Science. 2017;12(6):1100–1122. doi: 10.1177/1745691617693393
31. Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. Springer Series in Statistics. Springer; 2009.
32. Nave G, Minxha J, Greenberg DM, Kosinski M, Stillwell D, Rentfrow J. Musical Preferences Predict Personality: Evidence From Active Listening and Facebook Likes. Psychological Science. 2018;29(7):1145–1158. doi: 10.1177/0956797618761659 29587129
33. Camerer CF, Nave G, Smith A. Dynamic Unstructured Bargaining with Private Information: Theory, Experiment, and Outcome Prediction via Machine Learning. Management Science. 2018;65(4):1867–1890. doi: 10.1287/mnsc.2017.2965
34. Wolfers J, Zitzewitz E. Interpreting Prediction Market Prices as Probabilities. National Bureau of Economic Research; 2006. 12200.
35. Simonsohn U. Small Telescopes: Detectability and the Evaluation of Replication Results. Psychological Science. 2015;26(5):559–569. doi: 10.1177/0956797614567341 25800521
36. Kasy M, Andrews I. Identification of and Correction for Publication Bias. American Economic Review. 2019;109(8):2766–2794. doi: 10.1257/aer.20180310
37. Bradley AP. The Use of the Area under the ROC Curve in the Evaluation of Machine Learning Algorithms. Pattern Recognition. 1997;30(7):1145–1159. doi: 10.1016/S0031-3203(96)00142-2
38. Breiman L. Random Forests. Machine Learning. 2001;45(1):5–32. doi: 10.1023/A:1010933404324
39. Forsell E, Viganola D, Pfeiffer T, Almenberg J, Wilson B, Chen Y, et al. Predicting Replication Outcomes in the Many Labs 2 Study. Journal of Economic Psychology. 2018. doi: 10.1016/j.joep.2018.10.009
40. Inbar Y. Association between Contextual Dependence and Replicability in Psychology May Be Spurious. Proceedings of the National Academy of Sciences. 2016;113(34):E4933–E4934. doi: 10.1073/pnas.1608676113
41. Altmejd A. Registration of Predictions; 2017. https://osf.io/w2y96.
42. Gelman A, Carlin J. Beyond Power Calculations: Assessing Type S (Sign) and Type M (Magnitude) Errors. Perspectives on Psychological Science. 2014;9(6):641–651. doi: 10.1177/1745691614551642 26186114
43. Meehl PE. Clinical Versus Statistical Prediction: A Theoretical Analysis and a Review of the Evidence. Minneapolis, MN, US: University of Minnesota Press; 1954.
44. Dawes RM. The Robust Beauty of Improper Linear Models in Decision Making. American Psychologist. 1979;34(7):571–582. doi: 10.1037/0003-066X.34.7.571
45. Bishop MA, Trout JD. Epistemology and the Psychology of Human Judgment. Oxford University Press; 2004.
46. Youyou W, Kosinski M, Stillwell D. Computer-Based Personality Judgments Are More Accurate than Those Made by Humans. Proceedings of the National Academy of Sciences. 2015;112(4):1036–1040. doi: 10.1073/pnas.1418680112
47. Kleinberg J, Lakkaraju H, Leskovec J, Ludwig J, Mullainathan S. Human Decisions and Machine Predictions. The Quarterly Journal of Economics. 2017;133(1):237–293. doi: 10.1093/qje/qjx032 29755141
48. Masnadi-Shirazi H, Vasconcelos N. Asymmetric Boosting. In: Proceedings of the 24th International Conference on Machine Learning. ICML’07. New York, NY, USA: ACM; 2007. p. 609–619.
49. Campbell DT. Assessing the Impact of Planned Social Change. Evaluation and Program Planning. 1979;2(1):67–90. doi: 10.1016/0149-7189(79)90048-X
50. Kleinberg J, Mullainathan S, Raghavan M. Inherent Trade-Offs in the Fair Determination of Risk Scores. arXiv:1609.05807. 2016.
51. Meng XL. Statistical Paradises and Paradoxes in Big Data (I): Law of Large Populations, Big Data Paradox, and the 2016 US Presidential Election. The Annals of Applied Statistics. 2018;12(2):685–726. doi: 10.1214/18-AOAS1161SF
52. Simons DJ, Holcombe AO, Spellman BA. An Introduction to Registered Replication Reports at Perspectives on Psychological Science. Perspectives on Psychological Science. 2014;9(5):552–555. doi: 10.1177/1745691614543974 26186757
Article published in the journal
PLOS One
2019, Issue 12