Computational reproducibility is central to scientific credibility, yet verifying published results at scale remains costly. We develop an AI-assisted workflow for automated full-paper replication. The workflow retrieves replication materials, reconstructs computing environments, executes code, and matches outputs to the point estimates reported in regression tables. We define a universe comprising all empirical and quantitative papers published in the three leading political science journals from 2010 to 2025, and we measure their stated data availability using automated extraction. For a stratified sample of 384 studies, we apply the workflow to conduct full-paper replication, covering 3,523 empirical models in total. We find that journal verification requirements, combined with data archiving mandates, drive reproducibility. The share of fully or largely reproducible papers rises from 20.8% before adoption of the Data Access and Research Transparency (DA-RT) standards to 82.5% after. Conditional on an accessible replication package, 92.1% of papers (234/254) are fully or largely reproducible. As a secondary application, we apply standardized instrumental-variable (IV) diagnostics to 84 studies (597 IV specifications among 1,910 replicated models), illustrating how automated execution enables systematic reanalysis across heterogeneous empirical settings.