Static verification is a powerful method for enhancing software quality, but it demands significant human labor and resources. This is particularly true of static verifiers that reason about heap-manipulating programs using an ownership logic. Large language models (LLMs) have shown promise in a number of software engineering activities, including code generation, test generation, proof generation for theorem provers, and specification generation for static verifiers. However, prior work has not explored how well LLMs can generate specifications based on an ownership logic, such as separation logic. To address this gap, this paper explores the effectiveness of LLMs, specifically OpenAI's GPT models, in generating fully correct separation-logic specifications for the static verification of human-written programs with VeriFast. Our first experiment employed traditional prompt engineering, and the second used Chain-of-Thought (CoT) prompting to identify and address common errors across the GPT models. The results indicate that GPT models can successfully generate specifications for verifying heap-manipulating code with VeriFast. Furthermore, while CoT prompting significantly reduces the syntax errors produced by the GPT models, it does not greatly improve verification error rates compared to prompt engineering.
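To make the task concrete, the following is a minimal sketch of the kind of separation-logic specification VeriFast consumes: a C function annotated with a `requires`/`ensures` contract and a user-defined predicate describing heap ownership. The `node` predicate and function names here are illustrative, not taken from the paper's benchmark programs; because VeriFast annotations live inside `//@` and `/*@ ... @*/` comments, the file also compiles as plain C.

```c
#include <stdlib.h>

// A linked-list cell; the predicate below bundles ownership of
// its fields and its malloc'd block, as separation logic requires.
struct node { int value; struct node *next; };

/*@
predicate node(struct node *n; int v) =
    n->value |-> v &*& n->next |-> _ &*& malloc_block_node(n);
@*/

struct node *create_node(int v)
//@ requires true;
//@ ensures node(result, v);
{
    struct node *n = malloc(sizeof(struct node));
    if (n == NULL) abort();
    n->value = v;
    n->next = NULL;
    return n;
}

void dispose_node(struct node *n)
//@ requires node(n, _);
//@ ensures true;
{
    // Consuming the node predicate grants permission to free n.
    free(n);
}
```

Generating contracts and predicates of this shape, such that VeriFast accepts the whole program, is precisely the specification-generation task posed to the GPT models.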