Intelligent agents must be able to articulate their own uncertainty. In this work, we show that pre-trained sequence models are naturally capable of probabilistic reasoning over exchangeable data points -- forming informed beliefs and sharpening them as they gather more information. A sequence model learns the relationships among observations, which differs from typical Bayesian models that quantify uncertainty over latent parameters through priors and likelihoods (e.g., topic models). Despite this apparent difference, we illustrate how exchangeable sequence modeling provides a valid Bayesian model by returning to De Finetti's classical predictive view of probabilistic reasoning: uncertainty comes from data that have not yet been observed, rather than from latent parameters. From this perspective, pre-training autoregressive models is equivalent to forming informed beliefs based on prior observations ("empirical Bayes"), and forward generation is equivalent to simulating instantiations of an environment ("posterior inference"). In particular, exchangeable sequence models can explicitly perform statistical inference; epistemic uncertainty over latent environments is captured by the variation in predicted future observations. Formally, we show that the sequence prediction loss controls the quality of uncertainty quantification, and we propose several approaches for encoding exchangeability in sequence model architectures: data augmentation, regularization, and causal masking.
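To make the predictive view concrete, here is a minimal sketch (not the paper's implementation) using a Beta-Bernoulli predictive rule -- a classical exchangeable sequence model -- in place of a pre-trained network. Forward generation autoregressively samples future observations, and the spread of rollout statistics plays the role of posterior uncertainty; all function names here are illustrative.

```python
import random

def predictive_prob(obs, a=1.0, b=1.0):
    # One-step-ahead predictive of a Beta(a, b)-Bernoulli model:
    # P(next = 1 | obs) depends on obs only through its sum and length,
    # so the predictions are exchangeable (order-invariant).
    return (a + sum(obs)) / (a + b + len(obs))

def forward_generate(obs, horizon, rng):
    # Autoregressive rollout: each sampled token is appended to the
    # context and conditions the next prediction (a Polya urn scheme).
    seq = list(obs)
    for _ in range(horizon):
        p = predictive_prob(seq)
        seq.append(1 if rng.random() < p else 0)
    return seq[len(obs):]

def posterior_mean_samples(obs, horizon=500, n_rollouts=200, seed=0):
    # Each rollout simulates one instantiation of the environment; its
    # empirical mean approximates one posterior draw of the latent rate.
    rng = random.Random(seed)
    return [sum(forward_generate(obs, horizon, rng)) / horizon
            for _ in range(n_rollouts)]
```

As the abstract describes, epistemic uncertainty appears as variation across predicted futures: with only two observations the rollout means are widely spread, and they concentrate as the observed sequence grows, with no latent parameter ever represented explicitly.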
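Of the three architectural approaches named above, data augmentation is the simplest to illustrate. A minimal sketch (hypothetical helper, not the paper's code): exchangeability says every ordering of the same data points is equally probable, so training on random permutations of each sequence pushes an autoregressive model toward order-invariant predictions.

```python
import random

def permute_augment(sequences, n_perms, seed=0):
    # For each training sequence, emit n_perms randomly permuted
    # copies; the permuted copies carry the same data points, so an
    # exchangeable model should assign them the same likelihood.
    rng = random.Random(seed)
    augmented = []
    for seq in sequences:
        for _ in range(n_perms):
            perm = list(seq)
            rng.shuffle(perm)
            augmented.append(perm)
    return augmented
```

Regularization and causal masking pursue the same goal inside the model (penalizing order-sensitive predictions, or restricting attention patterns) rather than in the training data.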