Long-context capabilities are essential for large language models (LLMs) to tackle complex and long-input tasks. Despite numerous efforts to optimize LLMs for long contexts, robustly processing long inputs remains challenging. In this paper, we introduce GraphReader, a graph-based agent system designed to handle long texts by structuring them into a graph and employing an agent to explore this graph autonomously. Upon receiving a question, the agent first undertakes a step-by-step analysis and devises a rational plan. It then invokes a set of predefined functions to read node content and neighbors, facilitating a coarse-to-fine exploration of the graph. Throughout the exploration, the agent continuously records new insights and reflects on the current circumstances to optimize the process, until it has gathered sufficient information to generate an answer. Experimental results on the LV-Eval dataset show that GraphReader, using a 4k context window, consistently outperforms GPT-4-128k by a large margin across context lengths from 16k to 256k. Additionally, our approach demonstrates superior performance on four challenging single-hop and multi-hop benchmarks.
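The exploration loop described above, where the agent reads node content, expands to neighbors, and records notes until it has enough information, can be sketched minimally as follows. This is an illustrative simplification, not the paper's actual implementation: `build_graph`, `explore`, and the keyword-based sufficiency check are hypothetical stand-ins for the agent's LLM-driven function calls and reflection.

```python
from collections import deque

def build_graph(chunks, links):
    """Hypothetical graph: node id -> (text chunk, list of neighbor ids)."""
    return {i: (text, links.get(i, [])) for i, text in enumerate(chunks)}

def explore(graph, start, needed_keywords):
    """Breadth-first stand-in for the agent's read-node/read-neighbors calls.

    A keyword match substitutes for the LLM judging whether a node holds a
    relevant fact; exploration stops once all needed information is found.
    """
    notebook, visited = [], set()
    frontier = deque([start])
    remaining = set(needed_keywords)
    while frontier and remaining:
        node = frontier.popleft()
        if node in visited:
            continue
        visited.add(node)
        text, neighbors = graph[node]
        hits = {kw for kw in remaining if kw in text}
        if hits:                        # record new insights in the notebook
            notebook.append((node, sorted(hits)))
            remaining -= hits
        frontier.extend(neighbors)      # coarse-to-fine: expand to neighbors
    return notebook, not remaining      # notes + whether info is sufficient

# Usage: a three-node graph where the answer spans two linked nodes.
chunks = ["Alice was born in Paris.", "Paris is in France.", "Bob likes tea."]
links = {0: [1], 1: [2]}
notes, done = explore(build_graph(chunks, links), 0, ["Paris", "France"])
```

In the real system the "notebook" and stopping decision are maintained by the LLM agent itself; the point of the sketch is only the control flow of graph-guided, incremental evidence gathering.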