Beyond Basic Specifications? A Systematic Study of Logical Constructs in LLM-based Specification Generation

Formal specifications play a pivotal role in accurately characterizing program behavior and ensuring software correctness. In recent years, using large language models (LLMs) to generate program specifications automatically has emerged as a promising way to improve verification efficiency. However, existing research has largely been confined to specifications built from basic syntactic constructs, which fall short of the high-level abstraction that complex program verification demands. We therefore propose incorporating logical constructs into existing LLM-based specification generation frameworks. It has not yet been systematically investigated, however, whether LLMs can effectively generate such complex constructs. To this end, we conduct an empirical study of how different types of syntactic constructs affect a specification generation framework. Specifically, we define four syntactic configurations with varying levels of abstraction and evaluate them extensively on mainstream program verification datasets using a diverse set of representative LLMs. The experimental results first confirm that LLMs are capable of generating valid logical constructs. Further analysis reveals that combining logical constructs with basic syntactic constructs improves both verification capability and robustness without significantly increasing verification overhead. Additionally, we uncover the distinct advantages of two refinement paradigms. To the best of our knowledge, this is the first systematic study of the feasibility of using LLMs to generate high-level logical constructs, and it provides an empirical basis and guidance for building future automated program verification frameworks with enhanced abstraction capabilities.
