Intelligent agents must be able to articulate their own uncertainty. In this work, we show that pre-trained sequence models are naturally capable of probabilistic reasoning over exchangeable data points -- forming informed beliefs and sharpening them as they gather more information. A sequence model learns the relationships between observations, which differs from typical Bayesian models that quantify uncertainty over latent parameters through priors and likelihoods (e.g., topic models). Despite this apparent difference, we illustrate how exchangeable sequence modeling provides a valid Bayesian model by returning to De Finetti's classical predictive view of probabilistic reasoning: uncertainty comes from data that have not yet been observed, rather than from latent parameters. From this perspective, pre-training autoregressive models is equivalent to forming informed beliefs based on prior observations ("empirical Bayes"), and forward generation is equivalent to simulating instantiations of an environment ("posterior inference"). In particular, exchangeable sequence models can explicitly perform statistical inference; epistemic uncertainty over latent environments is captured by variation in predicted future observations. Formally, we show that the sequence-prediction loss controls the quality of uncertainty quantification, and we propose several approaches for encoding exchangeability in sequence model architectures: data augmentation, regularization, and causal masking.
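The predictive view above can be made concrete with a minimal sketch. Here a Beta-Bernoulli posterior predictive stands in for a pre-trained exchangeable sequence model (the names `predictive_prob` and `simulate_environment` are illustrative, not from the paper): forward generation autoregressively rolls out hypothetical future observations, and the spread of long-run frequencies across rollouts plays the role of epistemic uncertainty over the latent environment.

```python
import random

def predictive_prob(history, a=1.0, b=1.0):
    # Beta-Bernoulli posterior predictive P(next = 1 | history); a stand-in
    # for the one-step-ahead prediction of a pre-trained sequence model.
    return (a + sum(history)) / (a + b + len(history))

def simulate_environment(history, horizon, rng):
    # Forward generation ("posterior inference"): autoregressively sample
    # `horizon` future observations, feeding each draw back into the context.
    ctx = list(history)
    for _ in range(horizon):
        ctx.append(1 if rng.random() < predictive_prob(ctx) else 0)
    return ctx[len(history):]

rng = random.Random(0)
observed = [1, 1, 0, 1]  # a short observed sequence from the environment
rollouts = [simulate_environment(observed, 2000, rng) for _ in range(200)]

# Each long rollout's empirical frequency behaves like one posterior draw of
# the latent parameter; variation across rollouts is epistemic uncertainty.
theta_draws = [sum(r) / len(r) for r in rollouts]
mean = sum(theta_draws) / len(theta_draws)
var = sum((t - mean) ** 2 for t in theta_draws) / len(theta_draws)
```

Because the Beta-Bernoulli predictive is exchangeable, the spread of `theta_draws` approximates the Beta posterior over the coin's bias; a pre-trained sequence model would be queried the same way, with its own predictive distribution in place of `predictive_prob`.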
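Of the three approaches listed for encoding exchangeability, data augmentation is the simplest to sketch: randomly permuting the data points within each training sequence pushes the model toward order-invariant predictions rather than enforcing them architecturally. A minimal sketch, assuming training sequences are lists of (x, y) data points; `permute_batch` is an illustrative helper, not an interface from the paper:

```python
import random

def permute_batch(batch, rng):
    # Data augmentation for exchangeability: shuffle the order of the data
    # points in each sequence, leaving the set of points unchanged, so the
    # model sees every ordering as an equally valid context.
    out = []
    for seq in batch:
        seq = list(seq)  # copy so the original sequence is untouched
        rng.shuffle(seq)
        out.append(seq)
    return out

rng = random.Random(0)
batch = [[("x1", 0), ("x2", 1), ("x3", 1)], [("a", 1), ("b", 0)]]
augmented = permute_batch(batch, rng)  # same points per sequence, new order
```

In a training loop this would be applied afresh each epoch, so the sequence-prediction loss is averaged over orderings; regularization and causal masking pursue the same invariance through the loss and the architecture, respectively.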