Measuring and Exploiting Contextual Bias in LLM-Assisted Security Code Review

Automated Code Review (ACR) systems integrating Large Language Models (LLMs) are increasingly adopted in software development workflows, ranging from interactive assistants to autonomous agents in CI/CD pipelines. In this paper, we study how LLM-based vulnerability detection in ACR is affected by the framing effect: the tendency to let the presentation of information override its semantic content when forming judgments. We examine whether adversaries can exploit this effect through contextual-bias injection: crafting pull request (PR) metadata to bias ACR security judgments, as a supply-chain attack vector against real-world ACR pipelines. To this end, we first conduct a large-scale exploratory study across six LLMs under five framing conditions, establishing the framing effect as a systematic and widespread phenomenon in LLM-based vulnerability detection, with bug-free framing producing the strongest effect. We then design a realistic and controlled experimental environment, evaluating 17 CVEs across 10 real-world projects, to assess the susceptibility of real-world ACR pipelines to vulnerability reintroduction attacks. We employ two attack strategies: a template-based attack inspired by prior related work, and a novel LLM-assisted iterative refinement attack. We find that template-based attacks are ineffective and may even backfire, as direct biasing attempts raise suspicion. Our iterative refinement attack, on the other hand, achieves a 100% success rate by exploiting a fundamental asymmetry: attackers can iteratively refine attacks against a local clone of the review pipeline, while defenders have only one chance to detect them. Debiasing via metadata redaction and explicit instructions restores detection in all affected cases. Overall, our findings highlight the dangers of over-relying on ACR and stress the importance of human oversight and contributor trust in the development process.
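To make the debiasing idea concrete, the following is a minimal sketch of metadata redaction combined with an explicit instruction, not the pipeline evaluated in the paper. The `PullRequest` container, the `build_review_prompt` helper, and the wording of `DEBIAS_INSTRUCTION` are all hypothetical names and values introduced here for illustration; the abstract does not specify them.

```python
from dataclasses import dataclass


@dataclass
class PullRequest:
    """Hypothetical container for the parts of a PR a reviewer model might see."""
    title: str
    description: str
    commit_messages: list[str]
    diff: str


# Assumed wording; the source does not give the exact instruction text.
DEBIAS_INSTRUCTION = (
    "Judge the security of this change from the code alone. "
    "Do not rely on any claims about its safety, intent, or prior review."
)


def build_review_prompt(pr: PullRequest, redact_metadata: bool = True) -> str:
    """Assemble the text sent to the reviewer model.

    With redact_metadata=True, contributor-written metadata (title,
    description, commit messages) is left out, and the explicit
    instruction is placed before the diff. With redact_metadata=False,
    the metadata is included and no instruction is added.
    """
    parts: list[str] = []
    if redact_metadata:
        parts.append(DEBIAS_INSTRUCTION)
    else:
        parts.append(f"Title: {pr.title}")
        parts.append(f"Description: {pr.description}")
        parts.extend(f"Commit: {message}" for message in pr.commit_messages)
    parts.append("Diff:\n" + pr.diff)
    return "\n\n".join(parts)
```

In this sketch only contributor-written metadata is dropped; comments inside the diff itself are passed through unchanged, so any framing carried by code comments would need separate handling.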
