QRS: A Rule-Synthesizing Neuro-Symbolic Triad for Autonomous Vulnerability Discovery

Static Application Security Testing (SAST) tools are integral to modern DevSecOps pipelines, yet tools such as CodeQL, Semgrep, and SonarQube remain fundamentally constrained: they require expert-crafted queries, generate excessive false positives, and detect only predefined vulnerability patterns. Recent work has explored augmenting SAST with Large Language Models (LLMs), but these approaches typically use LLMs to triage the output of existing tools rather than to reason about vulnerability semantics directly. We introduce QRS (Query, Review, Sanitize), a neuro-symbolic framework that inverts this paradigm. Rather than filtering the results of static rules, QRS employs three autonomous agents that generate CodeQL queries from a structured schema definition and few-shot examples, then validate the resulting findings through semantic reasoning and automated exploit synthesis. This architecture enables QRS to discover vulnerability classes beyond predefined patterns while substantially reducing false positives. We evaluate QRS on full Python packages rather than isolated snippets. On 20 historical CVEs in popular PyPI libraries, QRS achieves 90.6% detection accuracy. Applied to the 100 most-downloaded PyPI packages, QRS identified 39 medium-to-high-severity vulnerabilities: 5 were assigned new CVEs, 5 led to documentation updates, and the remaining 29 were independently discovered by concurrent researchers, which corroborates both their severity and their discoverability. QRS achieves this with low time overhead and manageable token costs, demonstrating that LLM-driven query synthesis and code review can complement manually curated rule sets and uncover vulnerability patterns that evade existing industry tools.
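The key inversion in the abstract is that queries are synthesized first and findings are then filtered by two independent validators, instead of an LLM merely triaging a fixed rule set's output. The following is a minimal sketch of that control flow under stated assumptions, not the QRS implementation: the names `run_triad`, `Finding`, `synthesize`, `run_query`, `semantic_review`, and `confirm_exploit` are hypothetical, the agents are plain callables standing in for LLM or CodeQL calls, and mapping "semantic reasoning" and "exploit synthesis" onto the Review and Sanitize agents respectively is an assumption the abstract does not confirm.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Finding:
    """One candidate vulnerability reported by a synthesized query (hypothetical type)."""
    query: str
    location: str


# Hypothetical agent interfaces. In a real system each would wrap an LLM call
# or a CodeQL run; here they are plain callables so the sketch is executable.
QueryAgent = Callable[[str, Sequence[str]], List[str]]  # (schema, few-shot examples) -> queries
QueryRunner = Callable[[str], List[Finding]]             # query -> raw findings
Validator = Callable[[Finding], bool]                    # finding -> keep it?


def run_triad(
    schema: str,
    examples: Sequence[str],
    synthesize: QueryAgent,
    run_query: QueryRunner,
    semantic_review: Validator,
    confirm_exploit: Validator,
) -> List[Finding]:
    """Synthesize queries, execute them, and keep only findings that pass both validators."""
    confirmed: List[Finding] = []
    for query in synthesize(schema, examples):
        for finding in run_query(query):
            # `and` short-circuits: the (presumably costlier) exploit step
            # only runs on findings that survive the semantic review.
            if semantic_review(finding) and confirm_exploit(finding):
                confirmed.append(finding)
    return confirmed


# Toy demo with stub agents (all behavior is invented for illustration).
def demo_synthesize(schema: str, examples: Sequence[str]) -> List[str]:
    return ["q_path_traversal", "q_ssrf"]


def demo_run_query(query: str) -> List[Finding]:
    return [Finding(query, "a.py:10"), Finding(query, "b.py:42")]


demo_result = run_triad(
    "schema",
    [],
    demo_synthesize,
    demo_run_query,
    semantic_review=lambda f: f.location != "b.py:42",      # treat b.py:42 as a false positive
    confirm_exploit=lambda f: f.query == "q_path_traversal",  # only one proof of concept succeeds
)
# demo_result == [Finding(query='q_path_traversal', location='a.py:10')]
```

Ordering the validators so that the inexpensive check runs first is one plausible way to keep token and time costs manageable; whether QRS orders its agents this way is not stated in the abstract.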
