The advent of Large Language Models (LLMs) has shown promise in augmenting programming through natural-language interaction. However, while LLMs are proficient at compiling common usage patterns into a programming language such as Python, editing and debugging an LLM-generated program remains a challenge. We introduce ANPL, a programming system that allows users to decompose user-specific tasks. In an ANPL program, a user can directly manipulate the sketch, which specifies the data flow of the generated program. The user annotates the modules, or holes, with natural language descriptions, offloading the expensive task of generating functionality to the LLM. Given an ANPL program, the ANPL compiler generates a cohesive Python program that implements the functionality of the holes while respecting the data flow specified in the sketch. We deploy ANPL on the Abstraction and Reasoning Corpus (ARC), a set of unique tasks that are challenging for state-of-the-art AI systems, and show that it outperforms baseline programming systems that lack (a) the ability to decompose tasks interactively and (b) the guarantee that modules can be correctly composed. We obtain a dataset of 300 of 400 ARC tasks that were successfully decomposed and grounded in Python, providing valuable insights into how humans decompose programmatic tasks. See the dataset at https://iprc-dip.github.io/DARC.
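The split between sketch and holes can be illustrated with a toy Python sketch. This is not ANPL's actual syntax or compiler: the `Hole` class, `stub_llm`, `compile_program`, and the two example modules are hypothetical names introduced here, and the "LLM" is replaced by a lookup table of hand-written functions so that the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Grid = List[List[int]]


@dataclass
class Hole:
    """A module specified only by a natural language description (hypothetical)."""
    name: str
    description: str


def stub_llm(hole: Hole) -> Callable[[Grid], Grid]:
    """Stand-in for the LLM: maps known descriptions to hand-written code."""
    library: Dict[str, Callable[[Grid], Grid]] = {
        "flip the grid left to right":
            lambda g: [row[::-1] for row in g],
        "replace every nonzero cell with 1":
            lambda g: [[1 if cell else 0 for cell in row] for row in g],
    }
    return library[hole.description]


def compile_program(holes: List[Hole]) -> Dict[str, Callable[[Grid], Grid]]:
    """Fill each hole with an implementation; the sketch itself is left untouched."""
    return {hole.name: stub_llm(hole) for hole in holes}


def sketch(grid: Grid, modules: Dict[str, Callable[[Grid], Grid]]) -> Grid:
    """User-written data flow: grid -> mirror -> binarize -> output."""
    flipped = modules["mirror"](grid)
    return modules["binarize"](flipped)


holes = [
    Hole("mirror", "flip the grid left to right"),
    Hole("binarize", "replace every nonzero cell with 1"),
]
modules = compile_program(holes)
result = sketch([[0, 2, 3], [4, 0, 0]], modules)
print(result)
```

In this toy version, the user edits or debugs by changing one hole's description or by rewiring `sketch`, and the other modules are left as they were. That mirrors the decomposition idea described above, under the stated assumptions.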