When Reject Turns into Accept: Quantifying the Vulnerability of LLM-Based Scientific Reviewers to Indirect Prompt Injection

Surging submission volumes have driven two parallel trends in scientific peer review: individual reviewers' over-reliance on LLMs and institutional adoption of AI-powered assessment systems. This study investigates the robustness of "LLM-as-a-Judge" reviewing systems to adversarial PDF manipulation via invisible text injection and layout-aware encoding attacks. We specifically target the distinct incentive of flipping "Reject" decisions to "Accept," a vulnerability that fundamentally compromises scientific integrity. To measure it, we introduce the Weighted Adversarial Vulnerability Score (WAVS), a novel metric that quantifies susceptibility by weighting score inflation by the severity of the resulting decision shift relative to the ground truth. We adapt 15 domain-specific attack strategies, ranging from semantic persuasion to cognitive obfuscation, and evaluate them on 13 diverse language models (including GPT-5 and DeepSeek) using a curated dataset of 200 official, real-world accepted and rejected submissions (e.g., from ICLR OpenReview). Our results show that obfuscation techniques such as "Maximum Mark Magyk" and "Symbolic Masking & Context Redirection" successfully manipulate scores, achieving decision flip rates of up to 86.26% on open-source models, while exposing distinct "reasoning traps" in proprietary systems. We release our complete dataset and injection framework to facilitate further research on this topic (https://anonymous.4open.sciencer/llm-jailbreak-FC9E/).
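The abstract reports decision flip rates without spelling out how they are computed. As a rough illustration only, the Python sketch below assumes that a flip is a paper judged "Reject" by a model on the clean PDF and "Accept" on the injected PDF, and that the rate is normalized by the number of originally rejected papers. The function name `decision_flip_rate` and the label strings are hypothetical and are not taken from the released framework.

```python
from typing import Sequence


def decision_flip_rate(baseline: Sequence[str], attacked: Sequence[str]) -> float:
    """Hypothetical Reject -> Accept flip rate (an assumed definition, not the paper's).

    Assumes decisions are paired by index (same paper, same model) and use the
    labels "Accept" / "Reject". The denominator is the number of papers the model
    rejected on the clean PDF, since only those can flip Reject -> Accept.
    """
    if len(baseline) != len(attacked):
        raise ValueError("baseline and attacked must have the same length")
    # Indices of papers rejected without any injection.
    rejected = [i for i, decision in enumerate(baseline) if decision == "Reject"]
    if not rejected:
        return 0.0
    # Count rejected papers that the model accepts after injection.
    flips = sum(1 for i in rejected if attacked[i] == "Accept")
    return flips / len(rejected)


if __name__ == "__main__":
    clean = ["Reject", "Reject", "Accept", "Reject"]
    injected = ["Accept", "Reject", "Accept", "Accept"]
    print(decision_flip_rate(clean, injected))
```

In this toy example, three papers are rejected on the clean PDFs and two of them flip to "Accept," so the assumed rate is 2/3. Normalizing by all papers instead would give a different number, which is why the definition matters when comparing reported figures.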
