Large language models (LLMs) have demonstrated impressive performance in understanding language and executing complex reasoning tasks. However, LLMs with long context windows are notorious for their expensive training costs and high inference latency. Even the most advanced models, such as GPT-4 and Claude2, often make mistakes when processing inputs of over 100K tokens, a phenomenon also known as \textit{lost in the middle}. In this paper, we propose \textsc{LongAgent}, a method based on multi-agent collaboration that scales LLMs (e.g., LLaMA) to a context of 128K tokens and shows potential superiority over GPT-4 in long-text processing. In \textsc{LongAgent}, a leader is responsible for understanding user intent and directing team members to acquire information from documents. Because members can hallucinate, it is non-trivial for the leader to obtain accurate information from the responses of dozens to hundreds of members. To address this, we develop an \textit{inter-member communication} mechanism that resolves response conflicts caused by hallucinations through information sharing. Our experimental results indicate that \textsc{LongAgent} offers a promising alternative for long-text processing. The agent team instantiated with LLaMA-7B achieves significant improvements over GPT-4 in tasks such as 128K-long text retrieval and multi-hop question answering.
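To make the leader/member workflow and the role of conflict resolution concrete, the following is a toy sketch, not the \textsc{LongAgent} implementation. It assumes a passkey-retrieval task, replaces each LLM member with a hypothetical deterministic `stub_member` that "hallucinates" a fixed guess (`"0000"`) whenever its chunk lacks the answer, and models inter-member communication with one assumed simplification: one representative per conflicting answer pools its chunk, and the answer is recomputed over the shared evidence. The names `stub_member`, `split_into_chunks`, `naive_majority`, `leader_answer`, and the chunk size are all illustrative.

```python
import re
from collections import Counter
from typing import Callable, Dict, List

# A "member" maps (context chunk, question) -> answer string. Real members would be
# LLM calls; here it is a hypothetical stub so the sketch runs offline.
Member = Callable[[str, str], str]


def stub_member(context: str, question: str) -> str:
    """Hypothetical member: extracts a passkey if present, otherwise hallucinates."""
    match = re.search(r"passkey is (\d+)", context)
    # When the chunk lacks the answer, the stub returns a fixed guess,
    # mimicking a member that answers confidently without evidence.
    return match.group(1) if match else "0000"


def split_into_chunks(document: str, chunk_size: int) -> List[str]:
    """Leader splits the long input into consecutive chunks, one per member."""
    return [document[i:i + chunk_size] for i in range(0, len(document), chunk_size)]


def naive_majority(answers: List[str]) -> str:
    """Baseline: trust the most common member answer (vulnerable to hallucination)."""
    return Counter(answers).most_common(1)[0][0]


def leader_answer(document: str, question: str, member: Member, chunk_size: int) -> str:
    chunks = split_into_chunks(document, chunk_size)
    answers = [member(chunk, question) for chunk in chunks]

    # Group member indices by the answer they returned.
    groups: Dict[str, List[int]] = {}
    for idx, ans in enumerate(answers):
        groups.setdefault(ans, []).append(idx)
    if len(groups) == 1:
        return answers[0]  # all members agree: no conflict to resolve

    # Assumed communication step: one representative per conflicting answer
    # shares its chunk, and the question is re-answered over the pooled evidence.
    representatives = [members[0] for members in groups.values()]
    shared_context = "\n".join(chunks[i] for i in representatives)
    return member(shared_context, question)


filler = "The grass is green. " * 25
document = filler + "The passkey is 71432. " + filler
question = "What is the passkey?"
chunks = split_into_chunks(document, 200)
print(naive_majority([stub_member(c, question) for c in chunks]))  # 0000
print(leader_answer(document, question, stub_member, chunk_size=200))  # 71432
```

In this toy run, five of the six members hallucinate, so a plain majority vote returns the wrong passkey; letting the conflicting members share their chunks recovers the correct one. This illustrates why the leader cannot simply aggregate member responses, though the real mechanism and its prompts may differ.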