Large Language Models (LLMs) have shown remarkable performance on a variety of basic natural language tasks, raising hopes of achieving Artificial General Intelligence. To complete complex tasks more reliably, an LLM should first write a program for the task and then follow that program to generate a specific solution for each test sample. We propose using natural language as a new programming language to describe task procedures, making them easily understandable to both humans and LLMs. LLMs can generate natural language programs directly, but these programs may still contain factual errors or incomplete steps. We therefore propose the Learning to Program (LP) method, which asks LLMs themselves to learn natural language programs from the training dataset of a complex task and then use the learned program to guide inference. Experiments on the AMPS (high school math) and Math (competition mathematics problems) datasets demonstrate the effectiveness of our approach. When testing ChatGPT on 10 tasks from the AMPS dataset, the average performance of our LP method exceeded direct zero-shot performance by 18.3$\%$. We release our code at \url{https://github.com/microsoft/NaturalLanguageProgram}.
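The learn-then-infer idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' released implementation: the `LLM` callable, the prompt wording, the helper names (`solve`, `accuracy`, `learn_program`), and the rule of accepting a revised program only when it improves accuracy on a held-out validation set are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: any function that maps a prompt to a completion.
LLM = Callable[[str], str]
# A labelled sample: (question, reference answer).
Example = Tuple[str, str]


def solve(llm: LLM, program: str, question: str) -> str:
    """Answer one question with the natural language program prepended as guidance."""
    prompt = f"Program:\n{program}\n\nQuestion: {question}\nAnswer:"
    return llm(prompt).strip()


def accuracy(llm: LLM, program: str, data: List[Example]) -> float:
    """Fraction of samples answered exactly right when following the program."""
    if not data:
        return 0.0
    correct = sum(solve(llm, program, q) == a for q, a in data)
    return correct / len(data)


def learn_program(
    llm: LLM, train: List[Example], valid: List[Example], epochs: int = 1
) -> str:
    """Ask the LLM for an initial program, then revise it from training errors.

    Assumed acceptance rule: a revision is kept only if it raises
    accuracy on the validation set.
    """
    program = llm("Write a step-by-step natural language program for this task.").strip()
    best = accuracy(llm, program, valid)
    for _ in range(epochs):
        # Collect training samples the current program still gets wrong.
        errors = [(q, a) for q, a in train if solve(llm, program, q) != a]
        if not errors:
            break
        report = "\n".join(f"Q: {q} | expected: {a}" for q, a in errors)
        candidate = llm(
            "Revise the program so that it handles the failed examples.\n"
            f"Current program:\n{program}\nFailed examples:\n{report}"
        ).strip()
        score = accuracy(llm, candidate, valid)
        if score > best:
            program, best = candidate, score
    return program
```

At inference time, the learned program would simply be passed to `solve` together with each test question, so the same text guides every prediction.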