A common practice when using large language models (LLMs) for complex analytical tasks such as code generation is to sample a solution for the entire task within the model's context window. Previous work has shown that decomposing a task into subtasks within the model's context (chain of thought) is beneficial for solving such tasks. In this work, we point out a limitation of LLMs' ability to perform several subtasks within the same context window: an in-context hardness of composition, which points to an advantage of distributing a decomposed problem across a multi-agent system of LLMs. The hardness of composition is quantified by a generation complexity metric, i.e., the number of LLM generations required to sample at least one correct solution. We find a gap between the generation complexity of solving a compositional problem within the same context and that of distributing it among multiple agents, and this gap increases exponentially with the solution's length. We prove our results theoretically and demonstrate them empirically.
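To make the generation complexity metric concrete, the following is a minimal toy sketch and not the derivation or experimental setup of this work. It assumes that each generation is an independent sample, that each subtask is solved with a fixed probability, that subtasks succeed independently of one another, and that a per-subtask correctness check is available to the multi-agent system. The function names (`expected_generations`, `in_context_complexity`, `multi_agent_complexity`) are hypothetical and chosen only for illustration. Under these assumptions, the number of samples needed to obtain one correct solution is geometric with mean 1/p.

```python
import math


def expected_generations(p_success: float) -> float:
    """Expected number of i.i.d. samples until the first success (mean of a geometric law)."""
    if not 0.0 < p_success <= 1.0:
        raise ValueError("p_success must be in (0, 1]")
    return 1.0 / p_success


def in_context_complexity(subtask_probs: list[float]) -> float:
    # Toy assumption: a single generation must solve every subtask at once,
    # and the subtasks succeed independently, so the probabilities multiply.
    return expected_generations(math.prod(subtask_probs))


def multi_agent_complexity(subtask_probs: list[float]) -> float:
    # Toy assumption: each agent resamples only its own subtask until it is
    # correct, so the expected sample counts add up.
    return sum(expected_generations(p) for p in subtask_probs)


if __name__ == "__main__":
    # Illustrative per-subtask success probability, not a measured value.
    p = 0.5
    for k in (1, 2, 4, 8):
        probs = [p] * k
        print(k, in_context_complexity(probs), multi_agent_complexity(probs))
```

Even in this idealized setting, the in-context cost grows as (1/p)^k while the distributed cost grows as k/p, so the ratio between them grows exponentially with the number of subtasks k. The toy model only illustrates why distributing a decomposed problem can reduce the number of required generations; it does not capture any additional in-context difficulty beyond independent composition.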