In software development, formal program specifications play a crucial role in various stages. However, manually crafting formal program specifications is rather difficult, making the job time-consuming and labor-intensive. Moreover, it is even more challenging to write specifications that correctly and comprehensively describe the semantics of complex programs. To reduce the burden on software developers, automated specification generation methods have emerged. However, existing methods usually rely on predefined templates or grammar, making them struggle to accurately describe the behavior and functionality of complex real-world programs. To tackle this challenge, we introduce SpecGen, a novel technique for formal program specification generation based on Large Language Models. Our key insight is to overcome the limitations of existing methods by leveraging the code comprehension capability of LLMs. The process of SpecGen consists of two phases. The first phase employs a conversational approach that guides the LLM to generate appropriate specifications for a given program. The second phase, designed for where the LLM fails to generate correct specifications, applies four mutation operators to the model-generated specifications and selects verifiable specifications from the mutated ones through a novel heuristic selection strategy by assigning different weights of variants in an efficient manner. To evaluate the performance of SpecGen, we manually construct a dataset containing 120 test cases. Our experimental results demonstrate that SpecGen succeeds in generating verifiable specifications for 100 out of 120 programs, outperforming the existing purely LLM-based approaches and conventional specification generation tools. Further investigations on the quality of generated specifications indicate that SpecGen can comprehensively articulate the behaviors of the input program.
翻译:摘要: 在软件开发中,正式程序规范在多个阶段发挥着关键作用。然而,手动编写正式程序规范相当困难,导致这一工作既耗时又费力。更甚者,为复杂程序编写能准确且全面描述其语义的规范更具挑战性。为减轻软件开发者的负担,自动规范生成方法应运而生。然而,现有方法通常依赖预定义模板或语法,难以准确描述复杂真实程序的行为和功能。为应对这一挑战,我们提出SpecGen——一种基于大语言模型的正式程序规范生成新技术。其核心思路在于利用大语言模型的代码理解能力突破现有方法的局限。SpecGen的流程分为两个阶段:第一阶段采用对话式方法,引导大语言模型为给定程序生成合适的规范;第二阶段针对大语言模型未能生成正确规范的情况,对模型生成的规范应用四种变异算子,并通过一种基于变异体权重高效分配的新型启发式选择策略,从变异后的规范中筛选出可验证的规范。为评估SpecGen的性能,我们人工构建了包含120个测试用例的数据集。实验结果表明,SpecGen成功为100个程序生成了可验证规范,优于现有纯大语言模型方法和传统规范生成工具。对生成规范质量的进一步探究表明,SpecGen能够全面描述输入程序的行为。