Despite recent advances, Large Language Models (LLMs) still generate vulnerable code. Retrieval-Augmented Generation (RAG) can strengthen LLMs for secure code generation by incorporating external security knowledge. However, conventional RAG designs struggle with the noise in raw security-related documents, and existing retrieval methods overlook the important security semantics implicitly embedded in task descriptions. To address these issues, we propose \textsc{Rescue}, a new RAG framework for secure code generation with two key innovations. First, we propose a hybrid knowledge base construction method that combines LLM-assisted cluster-then-summarize distillation with program slicing, producing both high-level security guidelines and concise, security-focused code examples. Second, we design a hierarchical multi-faceted retrieval method that traverses the constructed knowledge base from top to bottom and integrates multiple security-critical facts at each level, ensuring comprehensive and accurate retrieval. We evaluated \textsc{Rescue} on four benchmarks and compared it with five state-of-the-art secure code generation methods across six LLMs. The results show that \textsc{Rescue} improves SecurePass@1 by an average of 4.8 points, establishing new state-of-the-art security performance. Furthermore, we performed in-depth analyses and ablation studies to rigorously validate the effectiveness of the individual components of \textsc{Rescue}. Our code is available at https://github.com/steven1518/RESCUE.