Estimating the deep replicability of scientific findings using human and artificial intelligence
- Published: 4 May 2020
- Research article
- Published in Proceedings of the National Academy of Sciences of the United States of America
- Vol. 117 (20), pp. 10762–10768
- https://doi.org/10.1073/pnas.1909046117
Abstract
Replicability tests of scientific papers show that a majority of papers fail replication. Moreover, failed papers circulate through the literature as quickly as replicating papers. This dynamic weakens the literature, raises research costs, and demonstrates the need for new approaches to estimating a study's replicability. Here, we trained an artificial intelligence model to estimate a paper's replicability using ground-truth data on studies that had passed or failed manual replication tests, and then tested the model's generalizability on an extensive set of out-of-sample studies. The model predicts replicability better than the base rate of reviewers and about as well as prediction markets, the best present-day method for predicting replicability. In out-of-sample tests on manually replicated papers from diverse disciplines and methods, the model achieved strong accuracies of 0.65 to 0.78. Exploring the reasons behind the model's predictions, we found no evidence of bias based on topics, journals, disciplines, base rates of failure, persuasion words, or novelty words such as "remarkable" or "unexpected." We did find that the model's accuracy is higher when it is trained on a paper's text rather than on its reported statistics, and that n-grams, higher-order word combinations that humans have difficulty processing, correlate with replication. We discuss how combining human and machine intelligence can raise confidence in research, provide research self-assessment techniques, and create review methods scalable and efficient enough to keep pace with the ever-growing number of publications, a task that would require extensive human resources to accomplish with prediction markets and manual replication alone.
Funding Information
- DOD | United States Army | RDECOM | Army Research Office (W911NF15-1-0577)
- DOD | USAF | AFMC | Air Force Office of Scientific Research (FA9550-19-1-0354)
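The abstract describes a text-based classifier that predicts replicability from a paper's n-grams. The sketch below shows one plausible minimal instantiation of that idea using scikit-learn: TF-IDF features over unigrams through trigrams feeding a logistic regression, scored with cross-validation. This is an illustrative assumption, not the authors' actual pipeline; the texts and labels are synthetic placeholders standing in for ground-truth replication outcomes.

```python
# Minimal sketch of an n-gram replicability classifier (NOT the paper's model).
# Synthetic texts/labels stand in for ground-truth replication outcomes.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Hypothetical stand-ins: paper texts and whether each replicated (1) or not (0).
texts = [
    "we observed a remarkable and unexpected effect of priming on behavior",
    "the effect was robust across three preregistered replications",
    "a large sample confirmed the association with tight confidence intervals",
    "a small exploratory study suggested a surprising interaction effect",
] * 25  # repeated so cross-validation has enough rows to split
labels = [0, 1, 1, 0] * 25

# Unigrams through trigrams, mirroring the abstract's point that higher-order
# word combinations (n-grams) carry signal about replicability.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 3), min_df=2),
    LogisticRegression(max_iter=1000),
)

# Out-of-sample accuracy via 5-fold cross-validation; for comparison, the
# paper reports accuracies of 0.65 to 0.78 on out-of-sample replicated studies.
scores = cross_val_score(model, texts, labels, cv=5, scoring="accuracy")
print(f"mean CV accuracy: {scores.mean():.2f}")
```

On this toy data the score is trivially high; the point of the sketch is only the shape of the approach, in which raw text, rather than reported statistics, is the input representation.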