Large Language Models (LLMs) have demonstrated impressive capabilities in generating code, yet they often produce programs that contain flaws or deviate from the intended behavior, which limits their suitability for safety-critical applications. To address this limitation, this paper introduces VeCoGen, a novel tool that combines LLMs with formal verification to automate the generation of formally verified C programs. VeCoGen takes three inputs: a formal specification in the ANSI/ISO C Specification Language (ACSL), a natural language specification, and a set of test cases. From these it attempts to generate a program in two steps. First, VeCoGen generates an initial set of candidate programs. Second, the tool iteratively improves on previously generated candidates. If a candidate program is verified against the formal specification, it is guaranteed to be correct with respect to that specification. We evaluate VeCoGen on 15 problems from Codeforces competitions, of which it solves 13. This work shows the potential of combining LLMs with formal verification to automate program generation.
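To make the inputs concrete, the following is a minimal sketch of what an ACSL-specified C function and an accompanying set of test cases might look like. It is an illustration only and is not taken from the VeCoGen benchmark: the function `max_of_two`, the `test_case` struct, the `TESTS` table, and the helper `count_failed_tests` are all hypothetical names introduced here. The ACSL contract sits in the `/*@ ... */` comment, and the function body stands in for a candidate program that a tool of this kind would have to produce.

```c
#include <stddef.h>

/* Hypothetical example problem: return the larger of two integers.
 * The ACSL contract below plays the role of the formal specification;
 * the body is one candidate implementation. */

/*@ assigns \nothing;
  @ ensures \result >= a && \result >= b;
  @ ensures \result == a || \result == b;
  @*/
int max_of_two(int a, int b) {
    return (a >= b) ? a : b;
}

/* Hypothetical test cases: input pair plus expected output. */
struct test_case {
    int a;
    int b;
    int expected;
};

static const struct test_case TESTS[] = {
    {3, 5, 5},
    {5, 3, 5},
    {-2, -7, -2},
    {4, 4, 4},
};

/* Runs every test case against the candidate and returns how many fail. */
int count_failed_tests(void) {
    int failed = 0;
    size_t n = sizeof TESTS / sizeof TESTS[0];
    for (size_t i = 0; i < n; i++) {
        if (max_of_two(TESTS[i].a, TESTS[i].b) != TESTS[i].expected) {
            failed++;
        }
    }
    return failed;
}
```

Passing the test cases only shows that the candidate behaves correctly on those inputs, whereas discharging the ACSL contract with a deductive verifier establishes it for all inputs. One verifier that accepts ACSL is the WP plugin of Frama-C; naming it here is an assumption, since the abstract does not say which verifier VeCoGen uses.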