Large language models (LLMs) excel at complex reasoning, yet their efficiency is limited by the growing cognitive overhead of long thought traces. In this paper, we propose LightThinker, a method that enables LLMs to dynamically compress intermediate thoughts into compact semantic representations. However, static compression often struggles with complex reasoning, where the irreversible loss of intermediate details can lead to logical bottlenecks. To address this, we evolve the framework into LightThinker++, introducing Explicit Adaptive Memory Management. This paradigm shifts to behavioral-level management by incorporating explicit memory primitives, supported by a specialized trajectory synthesis pipeline that trains purposeful memory scheduling. Extensive experiments demonstrate the framework's versatility across three dimensions: (1) LightThinker reduces peak token usage by 70% and inference time by 26% with minimal accuracy loss. (2) In standard reasoning, LightThinker++ cuts peak token usage by 69.9% while yielding a +2.42% accuracy gain under the same context budget for maximum performance. (3) Most notably, in long-horizon agentic tasks, it maintains a stable footprint beyond 80 rounds (a 60%-70% reduction), achieving an average performance gain of 14.8% across different complex scenarios. Overall, our work provides a scalable direction for sustaining deep LLM reasoning over extended horizons with minimal overhead.