While large language models (LLMs) such as GPT-3 appear to be robust and general, their reasoning ability does not yet match that of the best models trained for specific natural language reasoning problems. In this study, we observe that an LLM can serve as a highly effective few-shot semantic parser: it converts natural language sentences into a logical form that serves as input to answer set programming (ASP), a logic-based declarative knowledge representation formalism. The combination yields a robust and general system that can handle multiple question-answering tasks without retraining for each new task. It needs only a few examples to guide the LLM's adaptation to a specific task, along with reusable ASP knowledge modules that can be applied across tasks. We demonstrate that this method achieves state-of-the-art performance on several NLP benchmarks, including bAbI, StepGame, CLUTRR, and gSCAN. Additionally, it successfully tackles robot planning tasks that an LLM alone fails to solve.
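The pipeline described in the abstract (few-shot prompt, LLM output as logical-form atoms, reusable ASP knowledge module) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `parent`/`ancestor` predicates, the `FAMILY_MODULE` rules, the prompt layout, and `stub_llm`, which is a regex stand-in for a real LLM call. The `ancestors` function is a plain Python transitive closure that mimics what an ASP solver would derive from the module; it is not an ASP solver, and none of this is the paper's actual prompts or modules.

```python
import re

# Hypothetical few-shot examples pairing a sentence with its logical form.
FEW_SHOT_EXAMPLES = [
    ("Mary is the mother of John.", "parent(mary, john)."),
    ("John is the father of Ann.", "parent(john, ann)."),
]

# Hypothetical reusable knowledge module, written as ASP rules.
FAMILY_MODULE = """\
ancestor(X, Y) :- parent(X, Y).
ancestor(X, Z) :- parent(X, Y), ancestor(Y, Z).
"""


def build_prompt(sentence):
    """Assemble a few-shot prompt asking for the logical form of `sentence`."""
    lines = []
    for text, atoms in FEW_SHOT_EXAMPLES:
        lines.append(f"Sentence: {text}\nLogical form: {atoms}\n")
    lines.append(f"Sentence: {sentence}\nLogical form:")
    return "\n".join(lines)


def stub_llm(prompt):
    """Stand-in for a real LLM call: pattern-matches the final sentence only."""
    last = prompt.rsplit("Sentence: ", 1)[1]
    m = re.match(r"(\w+) is the (?:mother|father) of (\w+)\.", last)
    return f"parent({m.group(1).lower()}, {m.group(2).lower()})." if m else ""


def parse_facts(text):
    """Extract (predicate, arg1, arg2) tuples from atoms like 'parent(a, b).'."""
    return set(re.findall(r"(\w+)\((\w+),\s*(\w+)\)\.", text))


def ancestors(facts):
    """Transitive closure over parent facts; an ASP solver would do this step."""
    closure = {(a, b) for p, a, b in facts if p == "parent"}
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


story = ["Sue is the mother of Tom.", "Tom is the father of Kim."]
facts = set()
for s in story:
    facts |= parse_facts(stub_llm(build_prompt(s)))

# Facts from the parser plus the reusable module form the program for a solver.
asp_program = "".join(f"{p}({a}, {b}).\n" for p, a, b in sorted(facts)) + FAMILY_MODULE
```

In a real system, `asp_program` would be handed to an ASP solver rather than to the Python closure above. The point of the split is that only the few-shot examples change per task, while a module such as `FAMILY_MODULE` can be reused unchanged.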