Recent verification tools aim to make formal verification more accessible to software engineers by automating most of the verification process. However, annotating conventional programs with the formal specification and verification constructs (preconditions, postconditions, loop invariants, auxiliary predicates and functions, and proof helpers) required to prove their correctness still demands significant manual effort and expertise. This paper investigates how large language models (LLMs) can automatically generate such annotations for programs written in Dafny, a verification-aware programming language, starting from conventional code accompanied by natural language specifications (in comments) and test code. In experiments on 110 Dafny programs, a multi-model approach combining Claude Opus 4.5 and GPT-5.2 generated correct annotations for 98.2% of the programs within at most 8 repair iterations guided by verifier feedback. A logistic regression analysis shows that proof-helper annotations contribute disproportionately to problem difficulty for current LLMs. Assertions in the test cases served as static oracles to automatically validate the generated pre/postconditions. We also compare generated and manual solutions and present an extension for Visual Studio Code that incorporates automatic generation into the IDE, with encouraging usability feedback.
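To illustrate the kinds of annotations the paper targets, the following toy Dafny method (our own minimal sketch, not one of the 110 benchmark programs) shows a postcondition and the loop invariants a verifier needs to prove it:

```dafny
// Hypothetical example: compute 0 + 1 + ... + n.
// The `ensures` clause is the postcondition; the `invariant`
// clauses are the loop annotations required for the proof.
method SumUpTo(n: nat) returns (s: nat)
  ensures s == n * (n + 1) / 2
{
  s := 0;
  var i := 0;
  while i < n
    invariant 0 <= i <= n
    invariant s == i * (i + 1) / 2
  {
    i := i + 1;
    s := s + i;
  }
}
```

Without the two `invariant` clauses, Dafny cannot establish the postcondition at loop exit; supplying such clauses automatically is exactly the annotation-generation task studied here.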