In classification tasks, softmax functions are ubiquitously used as output activations to produce predictive probabilities. Such outputs only capture aleatoric uncertainty. To capture epistemic uncertainty, approximate Gaussian inference methods have been proposed. We develop a common formalism to describe such methods, which we view as outputting Gaussian distributions over the logit space. Predictives are then obtained as the expectations of the Gaussian distributions pushed forward through the softmax. However, such softmax Gaussian integrals cannot be solved analytically, and Monte Carlo (MC) approximations can be costly and noisy. We propose to replace the softmax activation by element-wise normCDF or sigmoid, which allows for the accurate sampling-free approximation of predictives. This also enables the approximation of the Gaussian pushforwards by Dirichlet distributions with moment matching. This approach entirely eliminates the runtime and memory overhead associated with MC sampling. We evaluate it combined with several approximate Gaussian inference methods (Laplace, HET, SNGP) on large- and small-scale datasets (ImageNet, CIFAR-100, CIFAR-10), demonstrating improved uncertainty quantification capabilities compared to softmax MC sampling. Our code is available at https://github.com/bmucsanyi/probit.
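The sampling-free shortcut for normCDF activations rests on a classical identity: if a logit is Gaussian, z ~ N(mu, sigma^2), then E[Phi(z)] = Phi(mu / sqrt(1 + sigma^2)) in closed form, and E[sigmoid(z)] admits the well-known probit approximation sigmoid(mu / sqrt(1 + pi*sigma^2/8)). The sketch below illustrates this identity for a single logit and checks it against Monte Carlo; it is a minimal illustration, not the paper's full per-class predictive construction, and the variable names are my own.

```python
import math

import numpy as np
from scipy.stats import norm


def expected_normcdf(mu: float, sigma: float) -> float:
    """Closed-form E[Phi(z)] for z ~ N(mu, sigma^2)."""
    return norm.cdf(mu / math.sqrt(1.0 + sigma**2))


def expected_sigmoid_probit(mu: float, sigma: float) -> float:
    """Probit approximation of E[sigmoid(z)] for z ~ N(mu, sigma^2)."""
    return 1.0 / (1.0 + math.exp(-mu / math.sqrt(1.0 + math.pi * sigma**2 / 8.0)))


# Monte Carlo sanity check of the closed-form normCDF expectation.
rng = np.random.default_rng(0)
mu, sigma = 0.7, 1.3
samples = rng.normal(mu, sigma, size=1_000_000)
mc = norm.cdf(samples).mean()

print(expected_normcdf(mu, sigma), mc)  # the two values agree closely
```

The closed form replaces the expensive MC loop with a single activation evaluation, which is the source of the runtime and memory savings the abstract describes.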