Formal program specifications play a crucial role in various stages of software development. However, manually crafting formal program specifications is rather difficult, making the job time-consuming and labor-intensive. It is even more challenging to write specifications that correctly and comprehensively describe the semantics of complex programs. To reduce the burden on software developers, automated specification generation methods have emerged. However, existing methods usually rely on predefined templates or grammar, making them struggle to accurately describe the behavior and functionality of complex real-world programs. To tackle this challenge, we introduce SpecGen, a novel technique for formal program specification generation based on Large Language Models. Our key insight is to overcome the limitations of existing methods by leveraging the code comprehension capability of LLMs. The process of SpecGen consists of two phases. The first phase employs a conversational approach that guides the LLM to generate appropriate specifications for a given program. The second phase, designed for where the LLM fails to generate correct specifications, applies four mutation operators to the model-generated specifications and selects verifiable specifications from the mutated ones through a novel heuristic selection strategy. We evaluate SpecGen on two datasets, including the SV-COMP Java category benchmark and a manually constructed dataset. Experimental results demonstrate that SpecGen succeeds in generating verifiable specifications for 279 out of 385 programs, outperforming the existing purely LLM-based approaches and conventional specification generation tools like Houdini and Daikon. Further investigations on the quality of generated specifications indicate that SpecGen can comprehensively articulate the behaviors of the input program.
翻译:形式化程序规约在软件开发的各个阶段都起着至关重要的作用。然而,手动编写形式化程序规约相当困难,导致这项工作耗时耗力。若要编写出正确且全面描述复杂程序语义的规约,则更具挑战性。为减轻软件开发人员的负担,自动化规约生成方法应运而生。然而,现有方法通常依赖于预定义的模板或语法,难以准确描述复杂实际程序的行为和功能。为应对这一挑战,我们提出了SpecGen,一种基于大语言模型的形式化程序规约生成新技术。我们的核心思路是通过利用大语言模型的代码理解能力来克服现有方法的局限性。SpecGen的生成过程包含两个阶段:第一阶段采用对话式方法,引导大语言模型为给定程序生成合适的规约;第二阶段针对大语言模型未能生成正确规约的情况,对模型生成的规约应用四种变异算子,并通过一种新颖的启发式选择策略从变异后的规约中筛选出可验证的规约。我们在两个数据集上评估了SpecGen,包括SV-COMP Java类别基准测试集和一个人工构建的数据集。实验结果表明,SpecGen成功为385个程序中的279个生成了可验证的规约,其性能优于现有的纯大语言模型方法以及Houdini、Daikon等传统规约生成工具。对生成规约质量的进一步分析表明,SpecGen能够全面清晰地阐述输入程序的行为。