Test-time Scaling (TTS) has been shown to significantly enhance the reasoning capabilities of Large Language Models (LLMs) at inference time without altering model parameters. However, existing TTS methods treat each scaling episode independently, so LLMs never learn from past episodes how to scale more effectively. With the objective of teaching LLMs ``how to scale test-time computation,'' we propose LatentEvolve, a self-evolving latent TTS framework inspired by complementary learning systems (CLS) theory. Analogous to the human brain's dual system of a fast-recalling hippocampus and a slowly consolidating neocortex, LatentEvolve comprises two evolutionary components: \textit{daytime scaling}, which rapidly retrieves historical latent representations to better guide current LLM reasoning; and \textit{nighttime scaling}, which integrates past latent optimizations in a manner akin to the brain's consolidation of experience during sleep. Alternating the daytime and nighttime processes drives both fast and slow evolution of LLM TTS, mirroring human cognitive dynamics in a fully unsupervised manner. Extensive experiments across eight benchmarks and five model backbones demonstrate that LatentEvolve surpasses state-of-the-art TTS methods such as LatentSeek and TTRL by up to $13.33\%$ and exhibits exceptional cross-domain and cross-backbone generalization.
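To make the daytime/nighttime alternation concrete, the following is a minimal, self-contained sketch of the loop the abstract describes, not the paper's implementation: every name (\texttt{LatentMemory}, \texttt{daytime\_scaling}, \texttt{nighttime\_scaling}), the nearest-neighbor retrieval rule, and the moving-average consolidation update are illustrative assumptions standing in for the actual latent-space procedures.

\begin{verbatim}
# Conceptual sketch of the daytime/nighttime alternation (all names and
# update rules are hypothetical, not the authors' implementation).
from dataclasses import dataclass, field
import numpy as np

@dataclass
class LatentMemory:
    """Hippocampus-like store of (query embedding, optimized latent) pairs."""
    keys: list = field(default_factory=list)
    latents: list = field(default_factory=list)

    def add(self, key: np.ndarray, latent: np.ndarray) -> None:
        self.keys.append(key)
        self.latents.append(latent)

    def retrieve(self, query: np.ndarray, k: int = 3) -> np.ndarray:
        """Mean of the k most similar stored latents (cosine similarity)."""
        if not self.keys:
            return np.zeros_like(query)
        sims = [key @ query / (np.linalg.norm(key) * np.linalg.norm(query) + 1e-8)
                for key in self.keys]
        top = np.argsort(sims)[-k:]
        return np.mean([self.latents[i] for i in top], axis=0)

def daytime_scaling(query_emb: np.ndarray, memory: LatentMemory) -> np.ndarray:
    """Fast recall: warm-start the latent from similar past episodes."""
    return memory.retrieve(query_emb)

def nighttime_scaling(day_latents: list, consolidated: np.ndarray,
                      lr: float = 0.1) -> np.ndarray:
    """Slow consolidation: fold the day's latents into a persistent prior."""
    for latent in day_latents:
        consolidated += lr * (latent - consolidated)
    return consolidated

# Toy driver: alternate daytime and nighttime over a stream of queries.
rng = np.random.default_rng(0)
consolidated = np.zeros(8)          # neocortex-like slow prior
memory = LatentMemory()             # hippocampus-like fast store
for day in range(3):
    day_latents = []
    for _ in range(4):              # "daytime": solve queries, store latents
        q = rng.normal(size=8)
        z0 = consolidated + daytime_scaling(q, memory)  # retrieval-guided init
        z_opt = z0 + 0.05 * rng.normal(size=8)  # stand-in for latent TTS
        memory.add(q, z_opt)
        day_latents.append(z_opt)
    consolidated = nighttime_scaling(day_latents, consolidated)  # "nighttime"
print("consolidated prior norm:", np.linalg.norm(consolidated))
\end{verbatim}

The split mirrors the CLS analogy in the abstract: the episodic store supports immediate, query-specific reuse, while the slow moving-average update plays the role of neocortical consolidation; the actual framework presumably operates on model latent representations rather than these toy vectors.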