This paper is devoted to the stochastic approximation of entropically regularized Wasserstein distances between two probability measures, also known as Sinkhorn divergences. The semi-dual formulation of such regularized optimal transport problems can be rewritten as a non-strongly concave optimization problem. This formulation allows one to implement a Robbins-Monro stochastic algorithm that estimates the Sinkhorn divergence from a sequence of data sampled from one of the two distributions. Our main contribution is to establish the almost sure convergence and the asymptotic normality of a new recursive estimator of the Sinkhorn divergence between two probability measures in the discrete and semi-discrete settings. We also study the rate of convergence of the expected excess risk of this estimator in the absence of strong concavity of the objective function. Numerical experiments on synthetic and real datasets are provided to illustrate the usefulness of our approach for data analysis.
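The stochastic scheme outlined above can be sketched as follows: in the semi-discrete setting, the semi-dual potential is updated by a Robbins-Monro iteration using one fresh sample from the source measure per step. This is a minimal NumPy illustration, not the paper's exact algorithm; the toy measures (a 1-D Gaussian source, a uniform discrete target), the quadratic cost, and the step-size choice are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative semi-discrete setup (assumed for this sketch):
# source mu = 1-D standard Gaussian, target nu = uniform weights on J points.
J = 5
y = np.linspace(-2.0, 2.0, J)      # support points of the discrete target measure
nu = np.full(J, 1.0 / J)           # target weights
eps = 0.5                          # entropic regularization parameter

def grad_h(x, v):
    """Stochastic gradient in v of the semi-dual integrand at sample x, where
    h_eps(x, v) = <v, nu> - eps * log sum_j nu_j exp((v_j - c(x, y_j)) / eps),
    with quadratic cost c(x, y) = |x - y|^2 / 2."""
    c = 0.5 * (x - y) ** 2
    w = nu * np.exp((v - c) / eps)
    return nu - w / w.sum()

# Robbins-Monro iteration v_{n+1} = v_n + gamma_n * grad_h(X_{n+1}, v_n),
# combined with Polyak-Ruppert averaging of the iterates.
v = np.zeros(J)
v_bar = np.zeros(J)
for n in range(1, 20001):
    x = rng.standard_normal()              # sample X_n ~ mu
    v = v + (1.0 / np.sqrt(n)) * grad_h(x, v)
    v_bar += (v - v_bar) / n               # running average of the iterates

# At a maximizer of the expected semi-dual objective the gradient vanishes,
# so the averaged potential should make the mean gradient small.
print(np.round(v_bar, 3))
```

The objective is concave but only up to an additive shift of the potential (it is invariant under v -> v + c), which is one reason strong concavity fails; averaging the iterates is a standard remedy in that regime.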