Complex reasoning in Large Language Models (LLMs) can be dynamically optimized with Test-Time Scaling (TTS) to mitigate overthinking. Although methods such as Coconut, SoftCoT, and its variants are effective for inference in continuous latent space, the core bottleneck remains the efficient generation and utilization of high-quality Latent Thought. Building on the insight from SoftCoT++ that a larger variance in the generated Latent Thought distribution more closely approximates the golden-truth distribution, we propose LTA-Thinker, a Latent Thought-Augmented Training Framework that increases distributional variance and improves reasoning performance from two perspectives. First, LTA-Thinker constructs a Latent Thought generation architecture based on a learnable prior; this design increases the variance of the generated Latent Thought vectors, simplifying the overall structure and raising the performance ceiling. Second, LTA-Thinker introduces a distribution-based directional optimization paradigm that jointly constrains distribution locality and distribution scale, improving information efficiency and reducing computational cost through a multi-objective co-training strategy. This strategy combines the standard Supervised Fine-Tuning (SFT) loss with two novel losses: a Semantic Alignment Loss, which uses KL divergence to ensure the Latent Thought stays highly relevant to the semantics of the question, and a Reasoning Focus Loss, which uses a contrastive learning mechanism to guide the model toward the most critical reasoning steps. Experiments show that LTA-Thinker achieves state-of-the-art (SOTA) performance against various baselines and demonstrates a higher performance ceiling and better scaling behavior.
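The multi-objective co-training strategy described above can be illustrated with a minimal numerical sketch. The abstract does not specify the exact loss formulations, so the following assumes diagonal-Gaussian latent distributions for the KL-based Semantic Alignment Loss and an InfoNCE-style contrastive objective for the Reasoning Focus Loss; all function names, embeddings, and loss weights (`alpha`, `beta`) are hypothetical placeholders, not the paper's actual implementation.

```python
import numpy as np

def kl_gaussian(mu_q, logvar_q, mu_p, logvar_p):
    """KL(q || p) between two diagonal Gaussians, summed over dimensions.
    Sketch of a Semantic Alignment Loss tying the latent-thought
    distribution to a question-conditioned target distribution."""
    var_q, var_p = np.exp(logvar_q), np.exp(logvar_p)
    return 0.5 * np.sum(logvar_p - logvar_q
                        + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)

def info_nce(anchor, positive, negatives, tau=0.1):
    """InfoNCE contrastive loss: pull the anchor toward the positive
    (critical reasoning step), push it away from distractor steps."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    logits = np.array([cos(anchor, positive)]
                      + [cos(anchor, n) for n in negatives]) / tau
    logits -= logits.max()  # numerical stability
    return -np.log(np.exp(logits[0]) / np.exp(logits).sum())

rng = np.random.default_rng(0)
d = 16
# Hypothetical latent-thought distribution (from the learnable prior)
# and a question-conditioned target distribution.
mu_z, logvar_z = rng.normal(size=d), np.zeros(d)
mu_q, logvar_q = rng.normal(size=d), np.zeros(d)
# Hypothetical embeddings: the latent thought, the critical reasoning
# step (positive), and distractor steps (negatives).
z = rng.normal(size=d)
critical_step = z + 0.1 * rng.normal(size=d)
distractors = [rng.normal(size=d) for _ in range(4)]

l_sft = 2.31  # placeholder cross-entropy value from the base SFT objective
l_align = kl_gaussian(mu_z, logvar_z, mu_q, logvar_q)
l_focus = info_nce(z, critical_step, distractors)
alpha, beta = 0.5, 0.5  # hypothetical loss weights
total = l_sft + alpha * l_align + beta * l_focus
```

Under this sketch, the three terms are simply summed with scalar weights; in practice such weights would be tuned, and the alignment and focus terms would be computed from model-produced distributions and embeddings rather than random vectors.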