Journals and conferences worry that peer reviews assisted by artificial intelligence (AI), in particular large language models (LLMs), may negatively influence the validity and fairness of the peer-review system, a cornerstone of modern science. In this work, we address this concern with a quasi-experimental study of the prevalence and impact of AI-assisted peer reviews in the context of the 2024 International Conference on Learning Representations (ICLR), a large and prestigious machine-learning conference. Our contributions are threefold. First, we obtain a lower bound on the prevalence of AI-assisted reviews at ICLR 2024 using the GPTZero LLM detector, estimating that at least $15.8\%$ of reviews were written with AI assistance. Second, we estimate the impact of AI-assisted reviews on submission scores. Considering pairs of reviews with different scores assigned to the same paper, we find that in $53.4\%$ of pairs the AI-assisted review scores higher than the human review ($p = 0.002$; relative difference in probability of scoring higher: $+14.4\%$ in favor of AI-assisted reviews). Third, we assess the impact of receiving an AI-assisted peer review on submission acceptance. In a matched study, submissions near the acceptance threshold that received an AI-assisted peer review were $4.9$ percentage points ($p = 0.024$) more likely to be accepted than submissions that did not. Overall, we show that AI-assisted reviews are consequential to the peer-review process, and we discuss the future implications of current trends.
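The paired comparison above amounts to an exact sign test: under the null hypothesis that the AI-assisted and human review are equally likely to score higher, the number of pairs favoring the AI-assisted review follows a $\mathrm{Binomial}(n, 0.5)$ distribution. A minimal sketch of this computation, using only the Python standard library and hypothetical pair counts (the abstract does not report $n$, so the resulting $p$-value will not match the paper's $p = 0.002$):

```python
from math import comb

def two_sided_sign_test(k: int, n: int) -> float:
    """Exact two-sided binomial (sign) test against p0 = 0.5.

    Sums the probabilities of all outcomes at most as likely as the
    observed count k (the standard "min-likelihood" two-sided rule).
    """
    pmf = [comb(n, i) * 0.5 ** n for i in range(n + 1)]
    observed = pmf[k]
    return min(1.0, sum(p for p in pmf if p <= observed))

# Hypothetical counts for illustration only: suppose 534 of 1000
# discordant pairs had the AI-assisted review scoring higher.
k, n = 534, 1000
share_ai_higher = k / n                 # share of pairs favoring AI-assisted reviews
rel_diff = (k - (n - k)) / (n - k)      # relative difference in probability of scoring higher
p_value = two_sided_sign_test(k, n)
```

With these hypothetical counts, the relative difference works out to roughly $+14.6\%$, close to (but not exactly) the paper's reported $+14.4\%$, which suggests the paper's underlying proportions round to $53.4\%$ but differ slightly from these illustrative numbers.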