Large Language Model (LLM)-based Automated Program Repair (APR) systems are increasingly integrated into modern software development workflows, offering automated patches in response to natural language bug reports. However, this reliance on untrusted user input introduces a novel and underexplored attack surface. In this paper, we investigate the security risks posed by adversarial bug reports: realistic-looking issue submissions crafted to mislead APR systems into producing insecure or harmful code changes. We develop a comprehensive threat model and conduct an empirical study to evaluate the vulnerability of APR systems to such attacks. Our study comprises 51 adversarial bug reports generated across a spectrum of strategies, ranging from manual curation to fully automated pipelines. We test these against a leading LLM-based APR system and assess both pre-repair defenses (e.g., LlamaGuard variants, PromptGuard variants, Granite-Guardian, and custom LLM filters) and post-repair detectors (GitHub Copilot, CodeQL). Our findings show that current defenses are insufficient: 90% of the crafted bug reports triggered attacker-aligned patches. The best pre-repair filter blocked only 47%, while post-repair analysis, which often requires human oversight, was effective in just 58% of cases. To support scalable security testing, we introduce a prototype framework for automating the generation of adversarial bug reports. Our analysis exposes a structural asymmetry: generating adversarial inputs is inexpensive, while detecting or mitigating them remains costly and error-prone. We conclude with recommendations for improving the robustness of APR systems against adversarial misuse and highlight directions for future work on secure APR.