The convergence of LLM-powered research assistants and AI-based peer review systems creates a critical vulnerability: fully automated publication loops where AI-generated research is evaluated by AI reviewers without human oversight. We investigate this through \textbf{BadScientist}, a framework that evaluates whether fabrication-oriented paper generation agents can deceive multi-model LLM review systems. Our generator employs presentation-manipulation strategies requiring no real experiments. We develop a rigorous evaluation framework with formal error guarantees (concentration bounds and calibration analysis), calibrated on real data. Our results reveal systematic vulnerabilities: fabricated papers achieve acceptance rates up to . Critically, we identify \textit{concern-acceptance conflict} -- reviewers frequently flag integrity issues yet assign acceptance-level scores. Our mitigation strategies show only marginal improvements, with detection accuracy barely exceeding random chance. Despite provably sound aggregation mathematics, integrity checking systematically fails, exposing fundamental limitations in current AI-driven review systems and underscoring the urgent need for defense-in-depth safeguards in scientific publishing.
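As a minimal illustration of what a concentration bound on an acceptance rate can look like, consider the following sketch. It assumes Hoeffding's inequality and independent accept/reject decisions; the abstract does not state which bound the framework actually uses, and the symbols $n$, $p$, $\hat{p}$, $\epsilon$, and $\delta$ as well as the numeric values are illustrative assumptions, not quantities taken from the paper. Let $\hat{p}$ be the fraction of $n$ fabricated papers that the review system accepts, and let $p$ be the true acceptance probability. Then
\[
  \Pr\bigl(|\hat{p} - p| \ge \epsilon\bigr) \le 2\exp\bigl(-2 n \epsilon^{2}\bigr),
\]
so with probability at least $1 - \delta$,
\[
  |\hat{p} - p| \le \sqrt{\frac{\ln(2/\delta)}{2n}} .
\]
For example, with the hypothetical values $n = 100$ and $\delta = 0.05$, the half-width is $\sqrt{\ln(40)/200} \approx 0.136$, meaning an observed acceptance rate would be within roughly $\pm 13.6$ percentage points of the true rate at the 95\% level. Quadrupling $n$ halves this width, which shows how sample size controls the tightness of such a guarantee.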