In the world of Harry Potter, when Dumbledore's mind is overburdened, he extracts memories into a Pensieve to be revisited later. In the world of AI, while we possess the Pensieve itself, in the form of mature databases and retrieval systems, our models inexplicably lack the "wand" to operate it. They remain like a Dumbledore stripped of agency, passively accepting a manually engineered context as their entire memory. This work finally places the wand in the model's hand. We introduce StateLM, a new class of foundation models endowed with an internal reasoning loop for managing their own state. We equip our model with a suite of memory tools, such as context pruning, document indexing, and note-taking, and train it to actively manage these tools. By learning to dynamically engineer its own context, our model breaks free from the architectural prison of a fixed context window. Experiments across various model sizes demonstrate StateLM's effectiveness in diverse scenarios. On long-document QA tasks, StateLMs consistently outperform standard LLMs at all model scales; on the chat memory task, they achieve absolute accuracy improvements of 10 to 20 percentage points over standard LLMs. On the deep research task BrowseComp-Plus, the gap becomes even more pronounced: StateLM achieves up to 52% accuracy, whereas standard LLM counterparts hover around 5%. Ultimately, our approach shifts LLMs from passive predictors to state-aware agents, in which reasoning becomes a stateful and manageable process.
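The loop described above can be pictured as a minimal sketch. The class and function names below (`MemoryState`, `index_document`, `take_note`, `prune_context`, `reasoning_step`) and the length-based tool policy are illustrative assumptions for exposition, not the paper's actual API; in StateLM the tool-call decisions would come from the trained model itself, not a hand-written heuristic.

```python
# Hypothetical sketch of a StateLM-style state-management loop.
# All names and the toy policy are assumptions for illustration only.

class MemoryState:
    """Holds an active context window, an index of full documents, and notes."""
    def __init__(self, window_limit=5):
        self.context = []          # active context window (list of text chunks)
        self.index = {}            # doc_id -> full text, retrievable on demand
        self.notes = []            # distilled facts that survive pruning
        self.window_limit = window_limit

    # --- memory tools the model can invoke ---
    def index_document(self, doc_id, text):
        """Move a long document out of context into the retrievable index."""
        self.index[doc_id] = text
        self.context.append(f"[indexed:{doc_id}]")   # leave a lightweight stub

    def take_note(self, fact):
        """Persist a distilled fact outside the context window."""
        self.notes.append(fact)

    def prune_context(self):
        """Drop the oldest chunks once the window limit is exceeded."""
        while len(self.context) > self.window_limit:
            self.context.pop(0)

    def retrieve(self, doc_id):
        """Bring an indexed document back into the active context."""
        self.context.append(self.index[doc_id])


def reasoning_step(state, observation):
    """One loop iteration: ingest an observation, then manage memory."""
    if len(observation) > 40:                 # toy policy: long input -> index
        doc_id = f"doc{len(state.index)}"
        state.index_document(doc_id, observation)
    else:
        state.context.append(observation)
    state.prune_context()                     # keep the window bounded
    return state
```

The key point the sketch illustrates is that the context window is no longer a passive buffer: each step ends with an explicit state-management action, so the effective memory (index plus notes) can grow without bound while the active window stays fixed.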