Gaming AI-Assisted Peer Reviews Poses New Risks to the Scientific Community

AI is increasingly used to support scientific peer review, from manuscript screening and reviewer assistance to editorial triage. Although such systems promise to reduce reviewer burden and accelerate publication, their robustness to strategic manipulation remains poorly understood. Here we show that AI-mediated peer review is vulnerable to a simple, low-cost manipulation: superficial rephrasing of the manuscript abstract. Without changing the underlying scientific content and communication, and even without knowledge of the reviewing model, adversarially rewritten abstracts substantially improve AI review outcomes. The effect holds across disciplines and publication venues, for both human-written and AI-generated papers. Our strongest attack achieves an attack success rate of about 38%, increasing acceptance ratings by +1.31 for Gemini 3 Flash reviewers and by +0.88 for GPT 5.4 Mini reviewers on a 10-point scale. When the original AI review suggests 'reject', the success rate rises to more than 50%. The effect extends beyond overall score inflation: it also increases review confidence and scores on core scientific criteria such as soundness, significance and perceived contribution. The attack is practical, requiring only about 5 minutes and $1 for a 10-page AI conference submission, and is hard to distinguish from ordinary scientific editing. Inflated AI reviews could bias downstream human decision-making, shifting editorial recommendations from rejection towards acceptance. These findings reveal a general vulnerability in AI-assisted scientific evaluation: when AI-generated reviews influence editorial decisions, authors may be incentivized to optimize manuscripts for AI judgment rather than scientific merit. Our results suggest that AI tools should not be treated as neutral evaluators in high-stakes peer review without systematic robustness testing, transparent safeguards and careful human oversight.
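
The abstract reports two kinds of numbers: a mean shift in acceptance rating and an attack success rate, overall and for papers whose original AI review suggested 'reject'. The Python sketch below shows one way such metrics could be computed from paired scores. It is illustrative only. The abstract does not define attack success, so the `margin` criterion, the `reject_below` cutoff, the function name `attack_metrics` and the toy scores are all assumptions, not the authors' method or data.

```python
from statistics import mean


def attack_metrics(original, attacked, margin=1.0, reject_below=5.0):
    """Summarize paired AI review scores on a 10-point scale.

    ASSUMPTION: an attack "succeeds" when the score rises by at least
    `margin`; the paper's actual criterion is not given in the abstract.
    ASSUMPTION: an original score below `reject_below` stands in for an
    original 'reject' recommendation.
    """
    if len(original) != len(attacked) or not original:
        raise ValueError("need two non-empty score lists of equal length")

    # Per-paper score change caused by rewriting the abstract.
    shifts = [a - o for o, a in zip(original, attacked)]
    successes = [s >= margin for s in shifts]

    # Restrict to papers the reviewer originally leaned towards rejecting.
    rejected = [ok for o, ok in zip(original, successes) if o < reject_below]

    return {
        "mean_shift": mean(shifts),
        "success_rate": sum(successes) / len(successes),
        "success_rate_rejected": (
            sum(rejected) / len(rejected) if rejected else None
        ),
    }


# Toy scores, invented purely for illustration.
original = [3.0, 4.0, 6.0, 7.0]
attacked = [5.0, 4.0, 6.0, 8.0]
metrics = attack_metrics(original, attacked)
print(metrics)
```

Reporting the rejected subset separately mirrors the abstract's distinction between the overall success rate and the rate for papers first rated 'reject'.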

