Program refinement involves correctness-preserving transformations from formal high-level specification statements into executable programs. Traditional tool support for program refinement is highly interactive and offers little automation. On the other hand, the emergence of large language models (LLMs) enables automatic code generation from informal natural language specifications. However, code generated by LLMs is often unreliable, and the process by which an LLM turns a specification into code is an opaque, uncontrolled black box. We propose LLM4PR, a tool that combines formal program refinement techniques with informal LLM-based methods to (1) transform the specification into preconditions and postconditions, (2) automatically build prompts based on the refinement calculus, (3) interact with the LLM to generate code, and finally (4) verify that the generated code satisfies the conditions of the refinement calculus, thus guaranteeing the correctness of the code. We have implemented our tool using GPT4, Coq, and Coqhammer, and evaluated it on the HumanEval and EvalPlus datasets.
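The four-step pipeline above can be sketched as follows. This is a minimal illustrative mock-up, not the tool's actual implementation: all function names are hypothetical, the LLM call is stubbed with a hand-written candidate, and the Coq/Coqhammer verification step is replaced by a finite check over inputs satisfying the precondition.

```python
# Hypothetical sketch of the LLM4PR pipeline; names and stubs are
# illustrative assumptions, not the tool's real API.

def extract_conditions(spec: str) -> tuple[str, str]:
    # Step 1: derive a (precondition, postcondition) pair from the
    # natural-language specification (the paper uses an LLM for this).
    return ("n >= 0", "result == n * (n + 1) // 2")

def build_prompt(pre: str, post: str) -> str:
    # Step 2: build a refinement-calculus prompt around the
    # specification statement {pre} S {post}.
    return f"Refine the specification statement: {{{pre}}} S {{{post}}}"

def generate_code(prompt: str) -> str:
    # Step 3: query the LLM (stubbed here with a fixed candidate).
    return "def f(n):\n    return n * (n + 1) // 2"

def verify(code: str, pre: str, post: str) -> bool:
    # Step 4: LLM4PR discharges refinement conditions in Coq with
    # Coqhammer; stubbed here by testing inputs meeting the precondition.
    env: dict = {}
    exec(code, env)
    f = env["f"]
    return all(f(n) == n * (n + 1) // 2 for n in range(100))

pre, post = extract_conditions("sum of the first n naturals")
candidate = generate_code(build_prompt(pre, post))
print(verify(candidate, pre, post))  # True when the candidate meets the spec
```

The key design point is the last step: because the generated code is checked against the refinement conditions rather than trusted directly, an incorrect LLM output is rejected instead of silently accepted.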