Large language models (LLMs) are increasingly used in academic writing workflows, yet they frequently hallucinate by generating citations to sources that do not exist. This study analyzes 100 AI-generated hallucinated citations that appeared in papers accepted by the 2025 Conference on Neural Information Processing Systems (NeurIPS), one of the world's most prestigious AI conferences. Despite review by 3-5 expert researchers per paper, these fabricated citations evaded detection, appearing in 53 published papers (approximately 1% of all accepted papers). We develop a five-category taxonomy that classifies hallucinations by their failure mode: Total Fabrication (66%), Partial Attribute Corruption (27%), Identifier Hijacking (4%), Placeholder Hallucination (2%), and Semantic Hallucination (1%). Our analysis reveals a critical finding: every hallucination (100%) exhibited compound failure modes. The distribution of secondary characteristics was dominated by Semantic Hallucination (63%) and Identifier Hijacking (29%), which often appeared alongside Total Fabrication to create a veneer of plausibility and false verifiability. These compound structures exploit multiple verification heuristics simultaneously, explaining why peer review fails to detect them. The per-paper distribution of hallucinations is bimodal: 92% of contaminated papers contain 1-2 hallucinations (minimal AI use) while 8% contain 4-13 hallucinations (heavy reliance). These findings demonstrate that current peer review processes do not include effective citation verification and that the problem extends beyond NeurIPS to other major conferences, government reports, and professional consulting. We propose mandatory automated citation verification at submission as an implementable solution to prevent fabricated citations from becoming normalized in scientific literature.
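The proposed submission-time verification could be prototyped along the lines of the sketch below. This is illustrative, not the authors' tool: it assumes references have already been parsed into metadata dictionaries and looked up against a bibliographic resolver such as the Crossref REST API; the `classify_citation` helper, its field names, and the 0.5 title-similarity threshold are all assumptions made here for demonstration.

```python
# Illustrative sketch of automated citation triage (not the authors' method).
# Assumptions: `cited` is the reference as parsed from the manuscript, and
# `resolved` is whatever record its identifier resolves to via a bibliographic
# service such as the Crossref REST API (None if nothing resolves). The field
# names and the 0.5 similarity threshold are arbitrary choices for this sketch.
from difflib import SequenceMatcher
from typing import Optional

# Template text that sometimes survives from an LLM draft into a reference.
PLACEHOLDER_MARKERS = ("todo", "xxx", "anonymous", "insert citation")

def title_similarity(a: str, b: str) -> float:
    """Fuzzy string similarity between two titles, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def classify_citation(cited: dict, resolved: Optional[dict]) -> str:
    """Map a suspect reference onto the paper's five-category taxonomy."""
    title = cited.get("title", "")
    # Placeholder Hallucination: template text left in the reference.
    if any(marker in title.lower() for marker in PLACEHOLDER_MARKERS):
        return "Placeholder Hallucination"
    # Total Fabrication: no record resolves for the identifier or title.
    if resolved is None:
        return "Total Fabrication"
    # Identifier Hijacking: the DOI exists but points at a different work.
    if title_similarity(title, resolved.get("title", "")) < 0.5:
        return "Identifier Hijacking"
    # Partial Attribute Corruption: right work, wrong year or author list.
    if (cited.get("year") != resolved.get("year")
            or cited.get("authors") != resolved.get("authors")):
        return "Partial Attribute Corruption"
    # Semantic Hallucination (a real paper cited for a claim it does not
    # support) cannot be detected from metadata alone; defer to a reader.
    return "Verified (semantic fit requires human review)"
```

A submission-time checker would run every parsed reference through such a classifier and return anything other than the final outcome to the authors before review; only the semantic category would still require human judgment, which is consistent with the abstract's observation that metadata-level fabrication is the dominant failure mode.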