Latent Chain-of-Thought (Latent CoT) models promise efficient reasoning via continuous representations, yet exhibit puzzling performance inconsistencies: they excel at exploration (ProsQA: 97.0%) but fail at computation (GSM8K: 34.1%). We show that this inconsistency reflects a trade-off governed by decisional certainty. Our contributions are threefold: (1) We theoretically characterize the fundamental Exploration-Execution Trade-off, proving that high certainty enables precise execution but inhibits exploration, while low certainty facilitates search but causes error accumulation. (2) We introduce the Symbolic Index, which quantifies decisional commitment, as the core mechanism governing this trade-off, and establish its causal relationship with both execution stability and exploration capability. (3) We prove that curriculum learning is theoretically necessary, as direct training provably fails due to distributional mismatch. Our framework shifts the design paradigm from binary architectural choices toward adaptive systems that dynamically regulate decisional certainty based on task demands.