Long-context capabilities are essential for large language models (LLMs) to tackle complex and long-input tasks. Despite numerous efforts to optimize LLMs for long contexts, robustly processing long inputs remains a challenge. In this paper, we introduce GraphReader, a graph-based agent system designed to handle long texts by structuring them into a graph and employing an agent to explore this graph autonomously. Upon receiving a question, the agent first undertakes a step-by-step analysis and devises a rational plan. It then invokes a set of predefined functions to read node content and neighbors, enabling a coarse-to-fine exploration of the graph. Throughout the exploration, the agent continuously records new insights and reflects on the current situation to optimize the process until it has gathered sufficient information to generate an answer. Experimental results on the LV-Eval dataset reveal that GraphReader, using a 4k context window, consistently outperforms GPT-4-128k across context lengths from 16k to 256k by a large margin. Additionally, our approach demonstrates superior performance on four challenging single-hop and multi-hop benchmarks.
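To make the workflow concrete, the following is a minimal Python sketch of the graph structure and the coarse-to-fine exploration loop described above. The names here (Node, read_node, read_neighbors, explore) and the fixed breadth-first loop with a caller-supplied sufficiency check are illustrative assumptions; in GraphReader itself an LLM agent plans, chooses which predefined functions to invoke, and decides when to stop.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A graph node: a key element plus the atomic facts that mention it.
    (Hypothetical structure for illustration, not the paper's exact schema.)"""
    key: str
    atomic_facts: list[str]
    neighbors: set[str] = field(default_factory=set)


class GraphReaderSketch:
    """Coarse-to-fine exploration: read a node's facts, then expand to neighbors."""

    def __init__(self, graph: dict[str, Node]):
        self.graph = graph
        self.notebook: list[str] = []  # insights recorded during exploration
        self.visited: set[str] = set()

    def read_node(self, key: str) -> list[str]:
        """Coarse step: return the atomic facts attached to a node."""
        self.visited.add(key)
        return self.graph[key].atomic_facts

    def read_neighbors(self, key: str) -> list[str]:
        """Fine step: return unvisited neighbor keys for further exploration."""
        return [n for n in self.graph[key].neighbors if n not in self.visited]

    def explore(self, start_keys, is_sufficient, max_steps: int = 10):
        """Explore until the recorded notes are judged sufficient to answer.
        `is_sufficient` stands in for the agent's reflection step."""
        frontier = list(start_keys)
        for _ in range(max_steps):
            if not frontier or is_sufficient(self.notebook):
                break
            key = frontier.pop(0)
            self.notebook.extend(self.read_node(key))   # record new insights
            frontier.extend(self.read_neighbors(key))   # expand the search
        return self.notebook


# Toy usage on an invented two-node graph:
graph = {
    "Alice": Node("Alice", ["Alice founded Acme in 1999."], {"Acme"}),
    "Acme": Node("Acme", ["Acme is headquartered in Oslo."], {"Alice"}),
}
reader = GraphReaderSketch(graph)
notes = reader.explore(["Alice"], is_sufficient=lambda ns: len(ns) >= 2)
# `notes` now holds enough facts to answer
# "Where is the company Alice founded headquartered?"
```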