Intelligent agents must be able to articulate their own uncertainty. In this work, we show that pre-trained sequence models are naturally capable of probabilistic reasoning over exchangeable data points -- forming informed beliefs and sharpening them as they gather more information. A sequence model learns the relationships among observations, which differs from typical Bayesian models that quantify uncertainty over latent parameters through priors and likelihoods (e.g., topic models). Despite this apparent difference, we illustrate how exchangeable sequence modeling provides a valid Bayesian model by returning to De Finetti's classical predictive view of probabilistic reasoning: uncertainty comes from data that have not yet been observed, rather than from latent parameters. From this perspective, pre-training autoregressive models is equivalent to forming informed beliefs based on prior observations ("empirical Bayes"), and forward generation is equivalent to simulating instantiations of an environment ("posterior inference"). In particular, exchangeable sequence models can explicitly perform statistical inference; epistemic uncertainty over latent environments is captured by the variation in predicted future observations. Formally, we show that the sequence prediction loss controls the quality of uncertainty quantification, and we propose several approaches for encoding exchangeability in sequence model architectures: data augmentation, regularization, and causal masking.
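To make the predictive view concrete, here is a minimal sketch (not the paper's implementation) using a Beta-Bernoulli predictive rule -- a classical exchangeable sequence model -- in place of a pre-trained network. Forward generation autoregressively samples future observations, and the spread of rollout statistics plays the role of posterior uncertainty; all function names here are illustrative.

```python
import random

def predictive_prob(obs, a=1.0, b=1.0):
    # One-step-ahead predictive of a Beta(a, b)-Bernoulli model:
    # P(next = 1 | obs) depends on obs only through its sum and length,
    # so the predictions are exchangeable (order-invariant).
    return (a + sum(obs)) / (a + b + len(obs))

def forward_generate(obs, horizon, rng):
    # Autoregressive rollout: each sampled token is appended to the
    # context and conditions the next prediction (a Polya urn scheme).
    seq = list(obs)
    for _ in range(horizon):
        p = predictive_prob(seq)
        seq.append(1 if rng.random() < p else 0)
    return seq[len(obs):]

def posterior_mean_samples(obs, horizon=500, n_rollouts=200, seed=0):
    # Each rollout simulates one instantiation of the environment; its
    # empirical mean approximates one posterior draw of the latent rate.
    rng = random.Random(seed)
    return [sum(forward_generate(obs, horizon, rng)) / horizon
            for _ in range(n_rollouts)]
```

As the abstract describes, epistemic uncertainty appears as variation across predicted futures: with only two observations the rollout means are widely spread, and they concentrate as the observed sequence grows, with no latent parameter ever represented explicitly.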
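Of the three architectural approaches named above, data augmentation is the simplest to illustrate. A minimal sketch (hypothetical helper, not the paper's code): exchangeability says every ordering of the same data points is equally probable, so training on random permutations of each sequence pushes an autoregressive model toward order-invariant predictions.

```python
import random

def permute_augment(sequences, n_perms, seed=0):
    # For each training sequence, emit n_perms randomly permuted
    # copies; the permuted copies carry the same data points, so an
    # exchangeable model should assign them the same likelihood.
    rng = random.Random(seed)
    augmented = []
    for seq in sequences:
        for _ in range(n_perms):
            perm = list(seq)
            rng.shuffle(perm)
            augmented.append(perm)
    return augmented
```

Regularization and causal masking pursue the same goal inside the model (penalizing order-sensitive predictions, or restricting attention patterns) rather than in the training data.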