Assured Automatic Programming via Large Language Models

With the advent of AI-based coding engines, it is possible to convert natural language requirements to executable code in standard programming languages. However, AI-generated code can be unreliable, and the natural language requirements driving this code may be ambiguous. In other words, the intent may not be accurately captured in the code generated from AI-coding engines like Copilot. The goal of our work is to discover the programmer intent, while generating code which conforms to the intent and a proof of this conformance. Our approach to intent discovery is powered by a novel repair engine called program-proof co-evolution, where the object of repair is a tuple (code, logical specification, test) generated by an LLM from the same natural language description. The program and the specification capture the initial operational and declarative description of intent, while the test represents a concrete, albeit partial, understanding of the intent. Our objective is to achieve consistency between the program, the specification, and the test by incrementally refining our understanding of the user intent. Reaching consistency through this repair process provides us with a formal, logical description of the intent, which is then translated back into natural language for the developer's inspection. The resultant intent description is now unambiguous, though expressed in natural language. We demonstrate how the unambiguous intent discovered through our approach increases the percentage of verifiable auto-generated programs on a recently proposed dataset in the Dafny programming language.

翻译：随着基于人工智能的代码生成引擎的出现，将自然语言需求转换为标准编程语言的可执行代码已成为可能。然而，AI生成的代码可能不可靠，且驱动代码生成的自然语言需求可能存在歧义。换言之，在Copilot等AI编码引擎生成的代码中，程序员的意图可能无法被准确捕捉。本工作的目标是在生成符合意图的代码及其符合性证明的同时，发现程序员的真实意图。我们的意图发现方法由一种称为"程序-证明协同演化"的新型修复引擎驱动，其修复对象是由大型语言模型根据同一自然语言描述生成的三元组（代码、逻辑规约、测试）。程序与规约共同捕捉了意图的初始操作性和声明性描述，而测试则代表了对意图的具体（尽管是部分的）理解。我们的目标是通过逐步细化对用户意图的理解，实现程序、规约与测试之间的一致性。通过此修复过程达成一致性后，我们将获得意图的形式化逻辑描述，随后将其转换回自然语言供开发者审阅。最终得到的意图描述虽仍以自然语言表达，但已消除歧义。我们通过最近提出的Dafny编程语言数据集证明，采用本方法发现的明确意图可显著提升可验证自动生成程序的百分比。