Parallel test-time scaling (TTS) is a pivotal approach for enhancing large language models (LLMs), typically by sampling multiple token-based chains-of-thought in parallel and aggregating outcomes through voting or search. Recent advances in latent reasoning, where intermediate reasoning unfolds in continuous vector spaces, offer a more efficient alternative to explicit Chain-of-Thought, yet whether such latent models can similarly benefit from parallel TTS remains an open question, mainly due to the absence of sampling mechanisms in continuous space and the lack of probabilistic signals for advanced trajectory aggregation. This work enables parallel TTS for latent reasoning models by addressing both issues. For sampling, we introduce two uncertainty-inspired stochastic strategies: Monte Carlo Dropout and Additive Gaussian Noise. For aggregation, we design a Latent Reward Model (LatentRM) trained with a step-wise contrastive objective to score and guide latent reasoning. Extensive experiments and visualization analyses show that both sampling strategies scale effectively with compute and exhibit distinct exploration dynamics, while LatentRM enables effective trajectory selection. Together, our explorations open a new direction for scalable inference in continuous spaces. Code and checkpoints are released at https://github.com/ModalityDance/LatentTTS
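To make the two sampling strategies concrete, the following is a minimal PyTorch sketch (not the released implementation) assuming a hypothetical latent reasoning model whose `step` method maps the current latent thought vector to the next one; the noise scale `sigma` is likewise an illustrative assumption.

```python
import torch
import torch.nn as nn


def sample_with_mc_dropout(model: nn.Module, latent: torch.Tensor, num_steps: int) -> torch.Tensor:
    """Monte Carlo Dropout: keep dropout layers active at inference so that
    repeated forward passes produce distinct latent trajectories."""
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.train()  # dropout remains stochastic during inference
    with torch.no_grad():
        for _ in range(num_steps):
            latent = model.step(latent)  # hypothetical latent reasoning step
    return latent


def sample_with_gaussian_noise(model: nn.Module, latent: torch.Tensor,
                               num_steps: int, sigma: float = 0.1) -> torch.Tensor:
    """Additive Gaussian Noise: perturb each intermediate latent thought so
    parallel trajectories explore different regions of the continuous space."""
    model.eval()
    with torch.no_grad():
        for _ in range(num_steps):
            latent = model.step(latent)
            latent = latent + sigma * torch.randn_like(latent)
    return latent
```

Running either function many times in parallel yields a population of latent trajectories, which can then be scored and aggregated (e.g., by a reward model such as LatentRM) in place of token-level voting.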