Though LLMs are capable of generating plausible programs, it is challenging to interact with them further to revise the program, especially when the user's specific requirements differ from the initial proposal. In this paper, we introduce ANPL, an interactive programming system that ensures users can always refine the generated code towards their specific programmatic intents via structured decompositions. Borrowing the paradigm of sketching from program synthesis, an ANPL program consists of three parts. The first is a set of input-outputs that it must satisfy. The second is a ``sketch'', i.e., control/data flow expressed in precise code (e.g., Python). The third is a set of ``holes'', i.e., sub-modules that are specified in natural language and implemented by the LLM. The user revises an ANPL program in one of three ways: modifying the sketch, changing the language used to describe the holes, or providing additional input-outputs to a particular hole. The last option turns the hole into a sub-ANPL program that can be solved recursively. This workflow allows users to offload as much of the programming burden to the LLM as possible while retaining the ability to pinpoint and resolve bugs locally, without exposing the rest of the program to the LLM. We deploy ANPL on the Abstraction and Reasoning Corpus (ARC), a set of unique tasks that are challenging for state-of-the-art AI systems. ANPL outperforms two kinds of baseline programming systems: (a) systems that lack the ability to decompose tasks interactively, and (b) systems that lack the guarantee that the modules can be correctly composed together. Additional evaluations on APPS, HumanEval, and real-world programming tasks have validated that the ANPL framework is applicable to multiple programming domains. We release the ANPL solutions to the ARC tasks as a dataset, providing insights into how humans decompose novel tasks programmatically. See our code at https://iprc-dip.github.io/ANPL/.
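The following is a minimal illustrative sketch of how a sketch, holes, and input-output constraints fit together. It is not ANPL's actual syntax or API. The `Hole` class, `sketch`, `program_ok`, `failing_holes`, and the toy ARC-style recoloring task are all hypothetical. Hand-written lambdas stand in for the code an LLM would generate from each hole's natural-language description.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional, Tuple

Grid = List[List[int]]


@dataclass
class Hole:
    """A sub-module described in natural language (hypothetical class, not ANPL's API)."""
    description: str
    # Input-output examples for this hole: (argument tuple, expected output).
    examples: List[Tuple[tuple, Any]] = field(default_factory=list)
    # Stand-in for the code an LLM would generate from `description`.
    impl: Optional[Callable[..., Any]] = None

    def __call__(self, *args):
        if self.impl is None:
            raise NotImplementedError(f"unfilled hole: {self.description!r}")
        return self.impl(*args)

    def passes_examples(self) -> bool:
        """True if the current implementation matches every example of this hole."""
        return all(self(*args) == out for args, out in self.examples)


# Holes: natural-language specs; the lambdas play the role of LLM output.
most_common_color = Hole(
    "return the most frequent non-zero colour in the grid",
    examples=[(([[3, 3, 0], [5, 0, 0]],), 3)],
    impl=lambda g: Counter(c for row in g for c in row if c != 0).most_common(1)[0][0],
)
recolor = Hole(
    "replace every non-zero cell of the grid with the given colour",
    impl=lambda g, color: [[color if c != 0 else 0 for c in row] for row in g],
)


def sketch(grid: Grid) -> Grid:
    """The sketch: precise control/data flow written by the user in Python."""
    color = most_common_color(grid)
    return recolor(grid, color)


# Task-level input-output constraints the whole program must satisfy.
task_examples = [([[1, 2], [1, 0]], [[1, 1], [1, 0]])]


def program_ok() -> bool:
    """True if the composed program satisfies every task-level example."""
    return all(sketch(i) == o for i, o in task_examples)


def failing_holes(holes: List[Hole]) -> List[str]:
    """Localise a bug: descriptions of holes whose own examples fail."""
    return [h.description for h in holes if not h.passes_examples()]
```

Suppose the user adds an example to `recolor` that its current implementation fails. In that case `failing_holes` reports only that hole. Only its description or implementation would need revising, and the sketch and the other hole stay untouched. This mirrors, in simplified form, the local debugging workflow described above.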