Static verification is a powerful method for enhancing software quality, but it demands significant human labor and resources. This is particularly true of static verifiers that reason about heap-manipulating programs using an ownership logic. Large language models (LLMs) have shown promise in a number of software engineering activities, including code generation, test generation, proof generation for theorem provers, and specification generation for static verifiers. However, prior work has not explored how well LLMs can generate specifications based on an ownership logic, such as separation logic. To address this gap, this paper explores the effectiveness of LLMs, specifically OpenAI's GPT models, in generating fully correct separation-logic specifications for the static verification of human-written programs with VeriFast. Our first experiment employed traditional prompt engineering, and the second used Chain-of-Thought (CoT) prompting to identify and address common errors across the GPT models. The results indicate that GPT models can successfully generate specifications for verifying heap-manipulating code with VeriFast. Furthermore, while CoT prompting significantly reduces the syntax errors produced by the GPT models, it does not greatly improve verification error rates compared to prompt engineering.
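To make the task concrete, the following is a minimal sketch of the kind of separation-logic specification VeriFast consumes: a C function annotated with a `requires`/`ensures` contract and a user-defined predicate describing heap ownership. The `node` predicate and function names here are illustrative, not taken from the paper's benchmark programs; because VeriFast annotations live inside `//@` and `/*@ ... @*/` comments, the file also compiles as plain C.

```c
#include <stdlib.h>

// A linked-list cell; the predicate below bundles ownership of
// its fields and its malloc'd block, as separation logic requires.
struct node { int value; struct node *next; };

/*@
predicate node(struct node *n; int v) =
    n->value |-> v &*& n->next |-> _ &*& malloc_block_node(n);
@*/

struct node *create_node(int v)
//@ requires true;
//@ ensures node(result, v);
{
    struct node *n = malloc(sizeof(struct node));
    if (n == NULL) abort();
    n->value = v;
    n->next = NULL;
    return n;
}

void dispose_node(struct node *n)
//@ requires node(n, _);
//@ ensures true;
{
    // Consuming the node predicate grants permission to free n.
    free(n);
}
```

Generating contracts and predicates of this shape, such that VeriFast accepts the whole program, is precisely the specification-generation task posed to the GPT models.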