Large Language Models (LLMs) have shown remarkable performance on a variety of basic natural language tasks, raising hopes of achieving Artificial General Intelligence. To better handle complex tasks, we want an LLM to first write a program for the task and then follow that program to generate a specific solution for each test sample. We propose using natural language as a new programming language for describing task procedures, which makes them easy for both humans and LLMs to understand. LLMs can generate natural language programs directly, but such programs may still contain factual errors or incomplete steps. We therefore propose the Learning to Program (LP) method, in which LLMs themselves learn natural language programs from the training set of a complex task and then use the learned program to guide inference. Our experiments on the AMPS (high school math) and Math (competition mathematics problems) datasets demonstrate the effectiveness of our approach. When ChatGPT is tested on 10 tasks from the AMPS dataset, our LP method outperforms direct zero-shot testing by 18.3$\%$ on average. We release our code at \url{https://github.com/microsoft/NaturalLanguageProgram}.
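The abstract describes LP only at a high level. To make the idea of an LLM learning a natural language program from training data more concrete, here is a minimal sketch built on one plausible assumption: the model writes an initial program, applies it to training samples, and revises it whenever it gets samples wrong. The function names `learn_program` and `solve`, the prompt wording, the exact-match answer check, and the per-batch revision strategy are all hypothetical. None of them comes from the released code. The LLM is modeled as any callable that maps a prompt string to a completion string.

```python
from typing import Callable, List, Tuple

# Hypothetical abstraction: an LLM is any function from prompt text to completion text.
LLM = Callable[[str], str]


def solve(llm: LLM, program: str, question: str) -> str:
    """Inference: ask the LLM to follow the natural language program on one sample."""
    return llm(f"Follow this program:\n{program}\nQuestion: {question}\nAnswer:")


def learn_program(
    llm: LLM,
    task_description: str,
    train_set: List[Tuple[str, str]],
    epochs: int = 1,
    batch_size: int = 4,
) -> str:
    """Assumed LP-style loop: draft a program, then revise it from training errors."""
    # Step 1 (assumption): let the LLM draft an initial natural language program.
    program = llm(f"Write step-by-step instructions for solving: {task_description}")

    for _ in range(epochs):
        for start in range(0, len(train_set), batch_size):
            batch = train_set[start:start + batch_size]

            # Step 2: run the current program on each training sample and collect failures.
            errors = []
            for question, answer in batch:
                prediction = solve(llm, program, question)
                if prediction.strip() != answer.strip():  # hypothetical exact-match check
                    errors.append((question, answer, prediction))

            # Step 3 (assumption): ask the LLM to revise the program using the failures.
            if errors:
                feedback = "\n".join(
                    f"Q: {q}\nExpected: {a}\nGot: {p}" for q, a, p in errors
                )
                program = llm(
                    f"Current program:\n{program}\n"
                    f"It failed on these examples:\n{feedback}\n"
                    "Revise the program to fix these errors."
                )
    return program
```

At test time, the learned program is passed to `solve` for each test sample, matching the abstract's description of "using the learned program to guide inference". A real implementation would also need a more robust answer comparison for math problems than exact string matching.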