Chain-of-Thought (CoT) has become a cornerstone of reasoning in large language models, yet its effectiveness is constrained by the limited expressiveness of discrete token sampling. Recent latent reasoning approaches attempt to alleviate this limitation by replacing discrete tokens with soft embeddings (probability-weighted mixtures of token embeddings) or hidden states, but they commonly suffer from two issues: (1) global activation, i.e., applying latent inputs at every step, injects perturbations even into high-confidence steps, impairing reasoning stability; and (2) soft embeddings quickly collapse toward the highest-probability token, limiting exploration of alternative trajectories. To address these challenges, we propose SeLaR (Selective Latent Reasoning), a lightweight and training-free framework. SeLaR introduces an entropy-gated mechanism that activates soft embeddings only at low-confidence (high-entropy) steps, while preserving discrete decoding at high-confidence steps. Additionally, we propose an entropy-aware contrastive regularization that pushes soft embeddings away from the direction of the dominant (highest-probability) token, encouraging sustained exploration of multiple latent reasoning paths. Experiments on five reasoning benchmarks demonstrate that SeLaR consistently outperforms standard CoT and state-of-the-art training-free methods.
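To make the two components concrete, the following is a minimal sketch of a single decoding step, written in plain Python. It is an illustration under stated assumptions, not the method's actual formulation: the abstract gives no equations, so the entropy threshold `tau`, the strength `lam`, the function name `selar_step`, and the specific push-away rule (subtracting the dominant token's embedding scaled by normalized entropy) are all hypothetical choices made here for clarity.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def selar_step(logits, embeddings, tau=0.5, lam=0.5):
    """Choose the next-step input for one decoding step.

    logits:     next-token logits, one per vocabulary entry (at least two)
    embeddings: one embedding vector (list of floats) per vocabulary entry
    tau:        ASSUMED entropy threshold for the gate (hypothetical value)
    lam:        ASSUMED strength of the push-away term (hypothetical value)
    Returns (mode, vector), where mode is "discrete" or "soft".
    """
    probs = softmax(logits)
    top = max(range(len(probs)), key=probs.__getitem__)
    h = entropy(probs)

    # Entropy gate: a confident step keeps ordinary discrete decoding.
    if h < tau:
        return "discrete", list(embeddings[top])

    # Low-confidence step: probability-weighted mixture of token embeddings.
    dim = len(embeddings[0])
    soft = [sum(p * e[d] for p, e in zip(probs, embeddings)) for d in range(dim)]

    # ASSUMED regularizer: subtract the dominant token's embedding, scaled by
    # entropy normalized to [0, 1], so more uncertain steps are pushed further.
    weight = lam * h / math.log(len(probs))
    pushed = [s - weight * t for s, t in zip(soft, embeddings[top])]
    return "soft", pushed

# Toy vocabulary of three tokens with two-dimensional embeddings.
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
confident = selar_step([10.0, 0.0, 0.0], toy_embeddings)
uncertain = selar_step([0.0, 0.0, 0.0], toy_embeddings)
```

In this toy setup the peaked distribution falls below the threshold and returns the top token's embedding unchanged, while the uniform distribution triggers the soft branch. In a real model the returned vector would be fed back as the next input embedding in place of a sampled token's embedding; how that is wired up depends on the inference stack and is not specified here.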