Large language models (LLMs) are often hypothesized to perform implicit Bayesian inference, yet a key coherence condition, the martingale property of predictive beliefs, has been shown to fail in controlled synthetic in-context learning settings. We revisit this question in a more typical usage regime: generic multiple-choice question answering. Exploiting the discrete answer space, we compute exact predictive distributions and study belief dynamics induced by autoregressive answer resampling. We introduce prompted predictive resampling (PPR), where an LLM generates a sequence of answers to the same question. Empirically, PPR reveals early-stage belief drift, indicating martingale violations. However, after sufficient resampling steps, the belief process self-stabilizes and converges to a coherent predictive distribution. Based on this observation, we further propose (i) a seed-answer prompting strategy to accelerate stabilization, and (ii) a self-consistency loss that amortizes early-stage drift into the model via fine-tuning. Experiments on multiple-choice QA benchmarks show that our methods substantially reduce belief drift and improve predictive coherence without sacrificing accuracy.
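To make the martingale condition concrete: under coherent Bayesian prediction, the expected next-step belief, averaged over the next answer drawn from the current belief, should equal the current belief. The sketch below is only an illustration under stated assumptions, not the procedure used in the paper. It replaces the LLM with two hypothetical toy predictors over a four-option answer space (`polya_urn`, which is coherent, and `drifting`, which is not), and the name `martingale_gap` and the max-absolute-difference drift measure are likewise assumptions. It does mirror one idea from the abstract: because the answer space is discrete, the expectation can be computed exactly by enumeration rather than by sampling.

```python
from typing import Callable, List, Sequence

# A predictor maps the history of previously generated answers (option indices)
# to a probability distribution over the answer options.
Predictor = Callable[[Sequence[int]], List[float]]


def martingale_gap(predict: Predictor, history: Sequence[int]) -> float:
    """Exact one-step drift: max_y |E[p_{n+1}(y)] - p_n(y)|.

    The expectation is over the next answer drawn from the current belief p_n,
    computed by enumerating every option (possible because the space is discrete).
    """
    current = predict(history)
    expected = [0.0] * len(current)
    for answer, prob in enumerate(current):
        next_belief = predict(list(history) + [answer])
        for y, p in enumerate(next_belief):
            expected[y] += prob * p
    return max(abs(e - c) for e, c in zip(expected, current))


def polya_urn(history: Sequence[int], k: int = 4, alpha: float = 1.0) -> List[float]:
    """Hypothetical coherent predictor: Dirichlet-multinomial posterior predictive."""
    counts = [alpha] * k
    for answer in history:
        counts[answer] += 1.0
    total = sum(counts)
    return [c / total for c in counts]


def drifting(history: Sequence[int], k: int = 4) -> List[float]:
    """Hypothetical incoherent predictor: an overconfident first-step belief
    that is inconsistent with how it updates once answers are in context."""
    if len(history) == 0:
        return [0.7, 0.1, 0.1, 0.1]
    return polya_urn(history, k=k)


if __name__ == "__main__":
    print("coherent gap:", martingale_gap(polya_urn, []))
    print("drifting gap:", martingale_gap(drifting, []))
```

In this toy setting the coherent predictor has a gap of zero up to floating-point error, while the drifting predictor shows a nonzero gap at the first step only, loosely analogous to the early-stage drift followed by stabilization described above. With a real model, `predict` would instead return the model's normalized probabilities over the answer options given the question and the previously resampled answers.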