Variational inference is a technique for approximating intractable posterior distributions in order to quantify uncertainty in machine learning models. Although a unimodal Gaussian is usually chosen as the parametric distribution, it can hardly approximate multimodal posteriors. In this paper, we employ a Gaussian mixture as the parametric distribution. The main difficulty of variational inference with a Gaussian mixture is approximating the entropy of the mixture. We approximate this entropy by the sum of the entropies of the unimodal Gaussian components, each of which can be calculated analytically. In addition, we theoretically analyze the approximation error between the true entropy and the approximated one in order to reveal when our approximation works well. Specifically, the approximation error is controlled by the ratios of the distances between the means to the sum of the variances of the Gaussian mixture, and it converges to zero as these ratios go to infinity. This situation seems more likely to occur in higher-dimensional parameter spaces because of the curse of dimensionality. Therefore, our result guarantees that our approximation works well, for example, in neural networks with a large number of weights.
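The behavior described above can be checked numerically on a toy case. The following is a minimal sketch, not the paper's implementation: the abstract does not state the exact formula, so the code assumes the approximation is the weighted sum of the component entropies plus the entropy of the mixing weights (the limit of the true entropy for well-separated components), and it assumes isotropic components. All function names (`gaussian_entropy`, `approx_mixture_entropy`, `mc_mixture_entropy`, `entropy_gap`) are hypothetical.

```python
import numpy as np


def gaussian_entropy(var, dim):
    """Closed-form entropy of an isotropic Gaussian N(mu, var * I) in `dim` dimensions."""
    return 0.5 * dim * (1.0 + np.log(2.0 * np.pi * var))


def approx_mixture_entropy(weights, variances, dim):
    """ASSUMED form of the approximation: weighted component entropies
    plus the entropy of the mixing weights."""
    weights = np.asarray(weights, dtype=float)
    variances = np.asarray(variances, dtype=float)
    component_entropies = gaussian_entropy(variances, dim)
    return float(np.sum(weights * component_entropies) - np.sum(weights * np.log(weights)))


def mc_mixture_entropy(weights, means, variances, n_samples=20000, seed=0):
    """Monte Carlo estimate of the true entropy -E_q[log q(x)] of an isotropic mixture."""
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)          # shape (K, dim)
    variances = np.asarray(variances, dtype=float)  # shape (K,)
    n_components, dim = means.shape

    # Sample a component index, then a point from that component.
    ks = rng.choice(n_components, size=n_samples, p=weights)
    noise = rng.standard_normal((n_samples, dim))
    x = means[ks] + np.sqrt(variances[ks])[:, None] * noise

    # Log density of every sample under every component, shape (n_samples, K).
    sq_dist = np.sum((x[:, None, :] - means[None, :, :]) ** 2, axis=-1)
    log_comp = -0.5 * sq_dist / variances - 0.5 * dim * np.log(2.0 * np.pi * variances)

    # Numerically stable log-sum-exp over components gives log q(x).
    a = np.log(weights) + log_comp
    a_max = a.max(axis=1, keepdims=True)
    log_q = a_max[:, 0] + np.log(np.exp(a - a_max).sum(axis=1))
    return float(-log_q.mean())


def entropy_gap(separation, dim=2):
    """Absolute gap between the Monte Carlo entropy and the assumed approximation
    for two equally weighted, unit-variance components `separation` apart."""
    weights = [0.5, 0.5]
    variances = [1.0, 1.0]
    means = np.zeros((2, dim))
    means[1, 0] = separation
    true_entropy = mc_mixture_entropy(weights, means, variances)
    return abs(true_entropy - approx_mixture_entropy(weights, variances, dim))


if __name__ == "__main__":
    for separation in (0.0, 2.0, 5.0, 10.0):
        print(f"separation={separation:5.1f}  gap={entropy_gap(separation):.4f}")
```

Under these assumptions, the gap is largest when the two components coincide and shrinks toward the Monte Carlo noise level as the distance between the means grows relative to the variances, which matches the qualitative claim of the abstract.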