Neuro-Symbolic Software Verification: Hyper-charging Local Language Models with Symbolic Reasoning at Scale

Loop invariant synthesis remains a central and pivotal bottleneck in formal software verification. Recent LLM-based Neuro-Symbolic tools have achieved impressive solve rates. However, these tools rely on proprietary, often expensive cloud APIs, which constitute a hurdle for privacy-sensitive industrial deployments where the source code cannot leave the organisation or where cost is a factor. We present VerIbmc, a neuro-symbolic pipeline that pairs symbolic invariant generation with locally deployable open-weight language models with the ESBMC verification tool. Our pipeline combines a deterministic symbolic invariant synthesis phase with an iterative LLM refinement loop driven by structured verifier feedback. In addition, we provide two types of pipelines that differ in their prompting strategy: Chain-of-Thought vs. Tree-of-Thought. We conduct an extensive experimental evaluation with five open-weight models (ranging from 7B to 120B parameters) across five benchmark families comprising of 520 problems (499 after excluding 21 with unavoidable overflow). Overall, the best single configuration (GPT-OSS-120B) solves 431 of 499 problems (86.4%). Additionally, on the four benchmark suites shared with the strongest cloud-API tools, VerIbmc is competitive running only on a single local machine. The evaluation shows symbolic invariant synthesis solves 75 problems without any LLM call and yields up to +35 additional problems for the weakest model. Importantly, all inference runs entirely on a single local machine using open-weight models -- no cloud API or proprietary model is required. Overall, we demonstrate that a neuro-symbolic approach based on LLMs can be used effectively for invariant synthesis in a privacy-preserving and energy-efficient manner, without having to resort to expensive proprietary frontier models locked behind APIs.

翻译：循环不变量合成仍是形式化软件验证中的核心瓶颈。基于LLM的神经符号工具近期取得了惊人的解决率，但这些工具依赖专有且昂贵的云端API，对代码不能离开组织或成本敏感的工业部署构成隐私障碍。我们提出VerIbmc——一种神经符号流水线，将符号不变量生成与可本地部署的开放权重语言模型及ESBMC验证工具相结合。该流水线融合了确定性符号不变量合成阶段与受结构化验证器反馈驱动的迭代LLM精化循环。此外，我们提供两种基于不同提示策略的流水线：思维链与思维树。我们使用五个开放权重模型（参数规模从7B到120B）在五个基准测试族（共520个问题，排除21个不可避免的溢出问题后为499个）上进行了广泛实验评估。整体上，最佳单配置（GPT-OSS-120B）解决了499个问题中的431个（86.4%）。在与最强云端API工具共享的四个基准测试集上，VerIbmc仅在单台本地机器上运行即可保持竞争力。评估显示，符号不变量合成无需调用LLM即可解决75个问题，并为最弱模型额外解决最多35个问题。重要的是，所有推理过程完全在单台本地机器上使用开放权重模型完成——无需任何云端API或专有模型。综上，我们证明基于LLM的神经符号方法可有效用于不变量合成，同时实现隐私保护与高能效，无需诉诸锁定在API后的昂贵专有前沿模型。