Do Large Language Models Have Compositional Ability? An Investigation into Limitations and Scalability

Large language models (LLMs) have emerged as powerful tools for many AI problems and exhibit remarkable in-context learning (ICL) capabilities. Compositional ability, the ability to solve unseen complex tasks that combine two or more simple tasks, is an essential reasoning ability for Artificial General Intelligence. Despite LLMs' tremendous success, how they approach composite tasks, especially those not encountered during the pretraining phase, remains an open and poorly understood question. In this study, we delve into the ICL capabilities of LLMs on composite tasks, with only simple tasks as in-context examples. We develop a test suite of composite tasks that includes linguistic and logical challenges and perform empirical studies across different LLM families. We observe that models exhibit divergent behaviors: (1) for simpler composite tasks that apply distinct mapping mechanisms to different input segments, the models demonstrate decent compositional ability, and scaling up the model enhances this ability; (2) for more complex composite tasks that involve multi-step reasoning, where each step represents one task, models typically underperform, and scaling up generally provides no improvement. We offer theoretical analysis in a simplified setting, explaining that models exhibit compositional capability when the task handles different input parts separately. We believe our work sheds new light on the capabilities of LLMs in solving composite tasks with respect to the nature of the tasks and model scale. Our dataset and code are available at https://github.com/OliverXUZY/LLM_Compose.
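The two kinds of composite task described in the abstract can be made concrete with a small sketch. The simple tasks used here (`to_upper`, `reverse`), the `upper:` / `reverse:` labels, and the prompt layout are hypothetical choices made for illustration; they are not taken from the paper's test suite or its repository.

```python
# Illustrative sketch only: the task definitions and prompt format below are
# assumptions for demonstration, not the paper's actual benchmark.

def to_upper(word: str) -> str:
    """Hypothetical simple task A: upper-case a word."""
    return word.upper()


def reverse(word: str) -> str:
    """Hypothetical simple task B: reverse the characters of a word."""
    return word[::-1]


def separable_composite(left: str, right: str) -> str:
    """Type (1): each input segment gets its own mapping, independently."""
    return f"{to_upper(left)} {reverse(right)}"


def sequential_composite(word: str) -> str:
    """Type (2): the steps are chained, so step two consumes step one's output."""
    return reverse(to_upper(word))


def build_prompt(simple_examples, query: str) -> str:
    """Build an ICL prompt whose demonstrations cover only the simple tasks."""
    blocks = [f"Input: {x}\nOutput: {y}" for x, y in simple_examples]
    # The composite query is appended without an answer; the model must fill it in.
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)


# In-context examples show each simple task in isolation.
simple_examples = [
    ("upper: cat", to_upper("cat")),
    ("reverse: dog", reverse("dog")),
]

# The test query combines both tasks, which never co-occur in the examples.
prompt = build_prompt(simple_examples, "upper: sun | reverse: moon")
expected = separable_composite("sun", "moon")
```

In this sketch, `separable_composite` corresponds to case (1), where the abstract reports decent compositional ability that improves with scale, while `sequential_composite` corresponds to case (2), where it reports that models typically underperform.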
