Large Language Models (LLMs) have shown promising results in automating formal verification. However, existing approaches treat proof generation as a static, end-to-end prediction over source code, relying on limited verifier feedback and lacking access to concrete program behaviors. We present EXVERUS, a counterexample-guided framework that enables LLMs to reason about proofs using behavioral feedback via counterexamples. When a proof fails, EXVERUS automatically generates and validates counterexamples, and then guides the LLM to generalize them into inductive invariants to block these failures. Our evaluation shows that EXVERUS significantly improves proof accuracy, robustness, and token efficiency over the state-of-the-art prompting-based Verus proof generator.
翻译:大语言模型在自动化形式化验证领域已展现出前景。然而,现有方法将证明生成视为基于源代码的静态端到端预测,依赖有限的验证器反馈,且缺乏对具体程序行为的访问。我们提出EXVERUS——一种基于反例引导的框架,使大语言模型能够通过反例获得的行为反馈来推理证明。当证明失败时,EXVERUS自动生成并验证反例,随后引导大语言模型将其泛化为归纳不变量以阻断这些失败。评估表明,与最先进的基于提示的Verus证明生成器相比,EXVERUS在证明精度、鲁棒性和token效率方面均有显著提升。