We consider extensions of the Shannon relative entropy, referred to as $f$-divergences. Three classical related computational problems are typically associated with these divergences: (a) estimation from moments, (b) computing normalizing integrals, and (c) variational inference in probabilistic models. These problems are related to one another through convex duality, and all of them have many applications throughout data science. We aim for computationally tractable approximation algorithms that preserve properties of the original problem, such as potential convexity or monotonicity. To achieve this, we derive a sequence of convex relaxations for computing these divergences from non-centered covariance matrices associated with a given feature vector: starting from the typically intractable optimal lower bound, we consider an additional relaxation based on ``sums-of-squares'', which is computable in polynomial time as a semidefinite program. We also provide computationally more efficient relaxations based on spectral information divergences from quantum information theory. For all of the tasks above, beyond proposing new relaxations, we derive tractable convex optimization algorithms, and we present illustrations on multivariate trigonometric polynomials and functions on the Boolean hypercube.
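For readers less familiar with $f$-divergences, the following is a minimal sketch of the standard textbook definition on a finite set, $D_f(p\|q) = \sum_x q(x)\, f(p(x)/q(x))$ with $f$ convex and $f(1) = 0$. It is not one of the relaxations proposed here. The function names `f_divergence`, `f_kl`, and `f_chi2` and the example distributions are illustrative assumptions, and $q$ is assumed strictly positive. The choice $f(t) = t \log t - t + 1$ recovers the Shannon relative entropy.

```python
import numpy as np

def f_divergence(p, q, f):
    """D_f(p || q) = sum_x q(x) * f(p(x) / q(x)) for discrete p, q (assumes q > 0)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(q * f(p / q)))

def f_kl(t):
    """f(t) = t log t - t + 1, extended by continuity with f(0) = 1."""
    t = np.asarray(t, dtype=float)
    safe = np.where(t > 0, t, 1.0)  # avoid log(0); the t = 0 entries are handled below
    return np.where(t > 0, t * np.log(safe) - t + 1.0, 1.0)

def f_chi2(t):
    """f(t) = (t - 1)^2, which gives the Pearson chi-squared divergence."""
    return (np.asarray(t, dtype=float) - 1.0) ** 2

# Illustrative distributions on three points (hypothetical values).
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.25, 0.25, 0.5])

kl_direct = float(np.sum(p * np.log(p / q)))  # usual formula for the relative entropy
kl_via_f = f_divergence(p, q, f_kl)           # same quantity through the f-divergence form
chi2 = f_divergence(p, q, f_chi2)
```

Since $p$ and $q$ both sum to one, the terms $-t + 1$ cancel in the sum, so `kl_via_f` agrees with `kl_direct` up to floating-point error. Swapping `f` changes the divergence without changing the surrounding code.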