Reasoning models enhance problem-solving by scaling test-time compute, yet they face a critical paradox: excessive thinking tokens often degrade performance rather than improve it. We attribute this to a fundamental architectural flaw: standard LLMs operate as "malloc-only" engines, continuously accumulating valid and redundant steps alike, with no mechanism to prune obsolete information. To break this cycle, we propose Free()LM, a model that introduces an intrinsic self-forgetting capability via the Free-Module, a plug-and-play LoRA adapter. By iteratively switching between reasoning and cleaning modes, Free()LM dynamically identifies and prunes useless context chunks, maintaining a compact, noise-free state. Extensive experiments show that Free()LM delivers consistent gains across all model scales (8B to 685B): it achieves a 3.3% average improvement over top-tier reasoning baselines and establishes a new SOTA on IMOanswerBench with DeepSeek V3.2-Speciale. Most notably, on long-horizon tasks where the standard Qwen3-235B-A22B model collapses entirely (0% accuracy), Free()LM restores performance to 50%. Our findings suggest that sustainable intelligence requires the freedom to forget as much as the power to think.
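The alternating reasoning/cleaning loop can be sketched as follows. This is a minimal illustration only: the function names (`reason_step`, `clean_step`, `free_lm_loop`) and the chunk labels are hypothetical stand-ins, not the paper's actual implementation, which realizes the cleaning mode as a LoRA adapter (the Free-Module) rather than a rule-based filter.

```python
# Hypothetical sketch of Free()LM's reason/clean alternation.
# In the real system the "useful vs. redundant" judgment is made by
# the Free-Module (a LoRA adapter); here a toy label stands in for it.

def reason_step(context, step_id):
    """One reasoning step: append a new thought chunk to the context."""
    # Even steps stand in for useful derivations, odd ones for filler.
    kind = "useful" if step_id % 2 == 0 else "redundant"
    context.append((step_id, kind))
    return context

def clean_step(context):
    """Cleaning mode: prune chunks judged obsolete, keeping the rest."""
    return [chunk for chunk in context if chunk[1] == "useful"]

def free_lm_loop(num_steps, clean_every=4):
    """Alternate between reasoning and periodic context cleaning."""
    context = []
    for step in range(num_steps):
        context = reason_step(context, step)
        if (step + 1) % clean_every == 0:
            context = clean_step(context)  # switch to cleaning mode
    return context

final = free_lm_loop(8)
# Only the "useful" chunks survive; the context stays compact instead
# of growing monotonically as it would in a "malloc-only" engine.
```

The point of the sketch is the control flow, not the filter: bounded context growth comes from interleaving pruning with generation rather than accumulating every step.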