The convergence of LLM-powered research assistants and AI-based peer review systems creates a critical vulnerability: fully automated publication loops where AI-generated research is evaluated by AI reviewers without human oversight. We investigate this through \textbf{BadScientist}, a framework that evaluates whether fabrication-oriented paper generation agents can deceive multi-model LLM review systems. Our generator employs presentation-manipulation strategies requiring no real experiments. We develop a rigorous evaluation framework with formal error guarantees (concentration bounds and calibration analysis), calibrated on real data. Our results reveal systematic vulnerabilities: fabricated papers achieve acceptance rates up to . Critically, we identify \textit{concern-acceptance conflict} -- reviewers frequently flag integrity issues yet assign acceptance-level scores. Our mitigation strategies show only marginal improvements, with detection accuracy barely exceeding random chance. Despite provably sound aggregation mathematics, integrity checking systematically fails, exposing fundamental limitations in current AI-driven review systems and underscoring the urgent need for defense-in-depth safeguards in scientific publishing.
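As an illustrative sketch only (the symbols $n$, $\hat{p}$, $p$, $\epsilon$, and $\delta$ are assumed here and are not necessarily the paper's own notation), the concentration bounds referenced above could take a standard Hoeffding-style form, bounding how far the empirical acceptance rate $\hat{p}$ over $n$ independent review trials can deviate from the true rate $p$:
\begin{equation*}
  \Pr\bigl(\lvert \hat{p} - p \rvert \ge \epsilon \bigr) \le 2\exp\!\left(-2 n \epsilon^{2}\right),
  \qquad\text{equivalently}\qquad
  \lvert \hat{p} - p \rvert \le \sqrt{\tfrac{1}{2n}\,\ln\tfrac{2}{\delta}}
  \quad\text{with probability at least } 1 - \delta .
\end{equation*}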