Recent advances such as self-consistency and test-time reinforcement learning (TTRL) improve the reliability of large language models (LLMs) without additional supervision, yet their underlying mechanisms and statistical guarantees remain poorly understood. We present a unified framework for certifiable inference in LLMs, showing that majority voting provides a statistical certificate of self-consistency: under mild assumptions, the aggregated answer coincides with the mode of the model's terminal distribution with high probability. We derive finite-sample and anytime-valid concentration bounds that quantify this confidence, and introduce the Martingale Majority Certificate (MMC), a sequential stopping rule that adaptively determines when sufficient samples have been drawn. We further prove that label-free post-training methods such as TTRL implicitly sharpen the answer distribution by exponentially tilting it toward its mode, thereby reducing the number of samples required for certification. Building on this insight, we propose new post-training objectives that explicitly optimise this trade-off between sharpness and bias. Together, these results explain and connect two central test-time scaling strategies, self-consistency and TTRL, within a single statistical framework for label-free, certifiable reliability in reasoning LLMs.
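As an illustrative sketch only (not the paper's actual bounds or notation): in the simplified binary-answer case, if the model's terminal answer distribution places mass $p^\star = p(y^\star) > \tfrac12$ on its mode $y^\star$, Hoeffding's inequality already yields a finite-sample certificate for the majority vote over $n$ i.i.d. samples, and an exponential tilt with an assumed temperature parameter $\beta$ captures how label-free post-training can sharpen the distribution toward that mode:
\[
  \Pr\bigl[\hat{y}^{\mathrm{maj}}_{n} \neq y^\star\bigr]
  \;\le\; \exp\!\bigl(-2n\,(p^\star - \tfrac{1}{2})^{2}\bigr),
  \qquad
  p_\beta(y) \;\propto\; p(y)\, e^{\beta\,\mathbf{1}[y = y^\star]},\quad \beta > 0 .
\]
Under this sketch, increasing $\beta$ raises $p_\beta(y^\star)$ and therefore shrinks the number of samples $n$ needed to certify the mode at a given error level, which is the sharpness-versus-samples trade-off the abstract refers to.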