The Importance-Weighted Evidence Lower Bound (IW-ELBO) has emerged as an effective objective for variational inference (VI), tightening the standard ELBO and mitigating its mode-seeking behaviour. However, optimising the IW-ELBO in Euclidean space is often inefficient, as its gradient estimators suffer from a vanishing signal-to-noise ratio (SNR). This paper formulates the optimisation of the IW-ELBO in Bures-Wasserstein space, the manifold of Gaussian distributions equipped with the 2-Wasserstein metric. We derive the Wasserstein gradient of the IW-ELBO and project it onto the Bures-Wasserstein space, yielding a tractable algorithm for Gaussian VI. A key contribution of our analysis concerns the stability of the resulting gradient estimator: while the SNR of the standard Euclidean gradient estimator is known to vanish as the number of importance samples $K$ increases, we prove that the SNR of the Wasserstein gradient scales favourably as $\Omega(\sqrt{K})$, ensuring efficient optimisation even for large $K$. We further extend this geometric analysis to the Variational Rényi Importance-Weighted Autoencoder bound, establishing analogous stability guarantees. Experiments demonstrate that the proposed framework achieves superior approximation quality compared with existing baselines.
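For concreteness, a brief reminder of the bound in question (this is the standard definition from the literature, not a statement of this paper's derivation): given a joint model $p(x, z)$, a variational distribution $q(z)$, and $K$ i.i.d. importance samples, the IW-ELBO is
\[
\mathcal{L}_K(q) \;=\; \mathbb{E}_{z_1, \dots, z_K \stackrel{\text{i.i.d.}}{\sim} q}\left[\log \frac{1}{K}\sum_{k=1}^{K} \frac{p(x, z_k)}{q(z_k)}\right],
\]
which recovers the standard ELBO at $K = 1$ and satisfies $\mathcal{L}_1(q) \le \mathcal{L}_K(q) \le \log p(x)$, so increasing $K$ tightens the bound; the SNR results above concern stochastic estimates of this objective's gradient.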