Currently, it is hard to reap the benefits of deep learning for Bayesian methods, which allow the explicit specification of prior knowledge and accurately capture model uncertainty. We present Prior-Data Fitted Networks (PFNs). PFNs leverage in-context learning in large-scale machine learning techniques to approximate a large set of posteriors. The only requirement for PFNs to work is the ability to sample from a prior distribution over supervised learning tasks (or functions). Our method restates the objective of posterior approximation as a supervised classification problem with a set-valued input: it repeatedly draws a task (or function) from the prior, draws a set of data points and their labels from it, masks one of the labels and learns to make probabilistic predictions for it based on the set-valued input of the rest of the data points. Presented with a set of samples from a new supervised learning task as input, PFNs make probabilistic predictions for arbitrary other data points in a single forward propagation, having learned to approximate Bayesian inference. We demonstrate that PFNs can near-perfectly mimic Gaussian processes and also enable efficient Bayesian inference for intractable problems, with over 200-fold speedups in multiple setups compared to current methods. We obtain strong results in very diverse areas such as Gaussian process regression, Bayesian neural networks, classification for small tabular data sets, and few-shot image classification, demonstrating the generality of PFNs. Code and trained PFNs are released at https://github.com/automl/TransformersCanDoBayesianInference.
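The training-data generation described above (draw a task from the prior, sample labeled points, mask one label) can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's implementation: the sine-wave prior and the names `sample_task` and `sample_training_example` are hypothetical stand-ins for the GP or BNN priors actually used by PFNs.

```python
import math
import random

def sample_task():
    # Toy prior over functions: random sine waves (a hypothetical
    # stand-in for the GP / BNN priors used in the paper).
    a = random.uniform(0.5, 2.0)   # amplitude
    w = random.uniform(1.0, 3.0)   # frequency
    return lambda x: a * math.sin(w * x)

def sample_training_example(n_points=8):
    """Draw one synthetic supervised task, then mask one label,
    yielding the (context set, query) pair a PFN is trained on."""
    f = sample_task()
    xs = [random.uniform(-3.0, 3.0) for _ in range(n_points)]
    ys = [f(x) for x in xs]
    i = random.randrange(n_points)  # index of the masked label
    context = [(x, y) for j, (x, y) in enumerate(zip(xs, ys)) if j != i]
    query_x, target_y = xs[i], ys[i]
    return context, query_x, target_y

context, query_x, target_y = sample_training_example()
print(len(context))  # 7 context points remain from an 8-point task
```

Repeating this sampling step millions of times and minimizing a probabilistic loss on the masked label is what lets the network amortize posterior inference into a single forward pass at test time.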