In many fields of experimental science, papers that failed to replicate continue to be cited because replication studies are hard to discover. As a first step toward a system that automatically finds replication studies for a given paper, 334 replication studies and 344 replicated studies were collected. Based on text content, replication studies could be identified in the dataset better than chance (AUROC = 0.886). Successful replication studies could also be distinguished from failed replication studies better than chance (AUROC = 0.664).
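For readers unfamiliar with the metric, an AUROC of 0.5 corresponds to chance and 1.0 to perfect separation. The following is a minimal sketch of how an AUROC can be computed from classifier scores, using the pairwise-ranking definition. The function name `auroc` and the toy labels and scores are hypothetical and are not taken from the study's data or code.

```python
def auroc(labels, scores):
    """Probability that a randomly chosen positive example receives a higher
    score than a randomly chosen negative example (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = replication study, 0 = other paper.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
toy_auroc = auroc(labels, scores)
print(toy_auroc)
```

The pairwise loop is quadratic in the number of examples, which is acceptable for a dataset of a few hundred papers; a rank-based formulation would be preferable at larger scale.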