Several emerging post-Bayesian methods target a probability distribution for which an entropy-regularised variational objective is minimised. This increased flexibility introduces a computational challenge, as one loses access to an explicit unnormalised density for the target. To mitigate this difficulty, we introduce a novel measure of suboptimality called 'gradient discrepancy', and in particular a 'kernel' gradient discrepancy (KGD) that can be explicitly computed. In the standard Bayesian context, KGD coincides with the kernel Stein discrepancy (KSD), and we obtain a novel characterisation of KSD as measuring the size of a variational gradient. Outside this familiar setting, KGD enables novel sampling algorithms to be developed and compared, even when unnormalised densities cannot be obtained. To illustrate this point, several novel algorithms are proposed and studied, including a natural generalisation of Stein variational gradient descent, with applications presented to mean-field neural networks and predictively oriented posteriors. On the theoretical side, our principal contribution is to establish sufficient conditions for desirable properties of KGD, such as continuity and convergence control.
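For reference, the standard closed form of the KSD mentioned above is recalled below; this is background material rather than the paper's KGD, and the notation (base kernel $k$, Stein kernel $k_p$, score $s_p$) is ours and may differ from that used in the paper.

\[
\mathrm{KSD}(q \,\|\, p) \;=\; \sup_{\|f\|_{\mathcal{H}(k)^d} \le 1} \; \mathbb{E}_{x \sim q}\!\big[\, s_p(x)^\top f(x) + \nabla \cdot f(x) \,\big] \;=\; \sqrt{\mathbb{E}_{x, x' \sim q}\big[\, k_p(x, x') \,\big]},
\]
\[
k_p(x, x') \;=\; s_p(x)^\top s_p(x')\, k(x, x') \;+\; s_p(x)^\top \nabla_{x'} k(x, x') \;+\; s_p(x')^\top \nabla_{x} k(x, x') \;+\; \nabla_x \cdot \nabla_{x'} k(x, x'), \qquad s_p := \nabla \log p .
\]

This closed form depends on $p$ only through the score $s_p$, so an unnormalised density suffices to evaluate it; as the abstract notes, post-Bayesian targets may not admit even an unnormalised density, which is the gap the KGD is intended to fill.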