Motivated by the computation of the non-parametric maximum likelihood estimator (NPMLE) and the Bayesian posterior in statistics, this paper explores the problem of convex optimization over the space of all probability distributions. We introduce an implicit scheme, called the implicit KL proximal descent (IKLPD) algorithm, for discretizing a continuous-time gradient flow relative to the Kullback-Leibler (KL) divergence in order to minimize a convex target functional. We show that IKLPD converges to a global optimum at a polynomial rate from any initialization. Moreover, if the objective functional is strongly convex relative to the KL divergence, as is the case when the target functional is itself a KL divergence in Bayesian posterior computation, IKLPD exhibits global exponential convergence. Computationally, we propose a numerical method based on normalizing flows to realize IKLPD. Conversely, our numerical method can also be viewed as a new approach that sequentially trains a normalizing flow to minimize a convex functional with strong theoretical guarantees.
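As a rough illustration of the exponential-convergence claim, the following toy sketch runs an implicit KL proximal iteration on a finite support, with the objective F(rho) = KL(rho || target). The abstract does not state the update rule, so the proximal form used here, argmin over r of F(r) + (1/tau) * KL(r || rho_k), is an assumption, as are the step size `tau`, the function names, and the example distributions. For this particular objective the proximal problem has a closed-form solution (a normalized geometric mixture of `target` and the current iterate), so no normalizing flow is needed; this is not the paper's numerical method.

```python
import math


def kl(p, q):
    """KL(p || q) for two strictly positive probability vectors."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def implicit_kl_prox_step(rho, target, tau):
    """One assumed implicit proximal step for F(r) = KL(r || target).

    Solves argmin_r KL(r || target) + (1/tau) * KL(r || rho).
    Setting the first variation to a constant gives
    log r = w * log target + (1 - w) * log rho + const, with w = tau / (1 + tau).
    """
    w = tau / (1.0 + tau)
    log_unnorm = [
        w * math.log(t) + (1.0 - w) * math.log(r) for t, r in zip(target, rho)
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(log_unnorm)
    unnorm = [math.exp(v - m) for v in log_unnorm]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def run(rho0, target, tau, n_steps):
    """Iterate the proximal step and record the objective after each step."""
    rho = list(rho0)
    history = [kl(rho, target)]
    for _ in range(n_steps):
        rho = implicit_kl_prox_step(rho, target, tau)
        history.append(kl(rho, target))
    return rho, history


# Hypothetical example: start from the uniform distribution on four points.
target = [0.1, 0.2, 0.3, 0.4]
rho0 = [0.25, 0.25, 0.25, 0.25]
rho, history = run(rho0, target, tau=1.0, n_steps=20)

for k in (0, 5, 10, 20):
    print(f"step {k:2d}: KL(rho || target) = {history[k]:.3e}")
```

In this sketch each step shrinks the log-ratio between the iterate and the target by a factor of 1/(1 + tau), up to normalization, so the objective decays geometrically in the number of steps. A larger `tau` gives faster contraction per step, which is possible only because the step is implicit and, in this toy case, solvable exactly.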