Formal verification can provably guarantee the correctness of critical system software, but the high proof burden has long hindered its wide adoption. Recently, Large Language Models (LLMs) have shown success in code analysis and synthesis. In this paper, we present a combination of LLMs and static analysis to synthesize invariants, assertions, and other proof structures for a Rust-based formal verification framework called Verus. In a few-shot setting, LLMs demonstrate impressive logical ability in generating postconditions and loop invariants, especially when analyzing short code snippets. However, LLMs lack the ability to retain and propagate context information, a strength of traditional static analysis. Based on these observations, we developed a prototype based on OpenAI's GPT-4 model. Our prototype decomposes the verification task into multiple smaller ones, iteratively queries GPT-4, and combines its output with lightweight static analysis. We evaluated the prototype with a developer in the automation loop on 20 vector-manipulating programs. The results demonstrate that it significantly reduces human effort in writing entry-level proof code.
翻译:形式化验证可以可证明地保证关键系统软件的正确性,但长期以来高昂的证明负担阻碍了其广泛采用。近年来,大型语言模型在代码分析和合成方面展现出了成功应用。本文提出了一种结合大型语言模型和静态分析的方法,为名为Verus的基于Rust的形式化验证框架合成不变量、断言及其他证明结构。在少样本场景下,大型语言模型在生成后置条件和循环不变量方面展现出令人印象深刻的逻辑能力,尤其在分析短代码片段时表现突出。然而,大型语言模型缺乏保留和传播上下文信息的能力,而这正是传统静态分析的优势所在。基于这些观察,我们开发了一个基于OpenAI GPT-4模型的原型系统。该原型将验证任务分解为多个子任务,迭代查询GPT-4,并将其输出与轻量级静态分析相结合。我们通过一位开发者在自动化循环中对该原型进行了评估,测试对象为20个向量操作程序。结果表明,该方法显著减少了编写入门级证明代码所需的人工工作量。