This paper studies entropic regularization in optimal transport as a smoothing method for Wasserstein estimators, through the prism of the classical tradeoff between approximation and estimation errors in statistics. Wasserstein estimators are defined as solutions of variational problems whose objective function involves an optimal transport cost between probability measures. Such estimators can be regularized by replacing the optimal transport cost with its regularized version, which adds an entropy penalty on the transport plan. This regularization may have a significant smoothing effect on the resulting estimators. In this work, we investigate its potential benefits for the approximation and estimation properties of regularized Wasserstein estimators. Our main contribution is to discuss how entropic regularization may reach, at a lower computational cost, statistical performance comparable to that of unregularized Wasserstein estimators in statistical learning problems involving distributional data analysis. To this end, we present new theoretical results on the convergence of regularized Wasserstein estimators. We also study their numerical performance on simulated and real data in the supervised learning problem of estimating proportions in mixture models using optimal transport.
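To make the entropy penalty on the transport plan concrete, the following is a minimal sketch of the standard Sinkhorn iterations for two discrete probability measures. It is an illustration under stated assumptions, not the paper's implementation: the function name `sinkhorn_cost`, the parameter `eps` (regularization strength), the squared-distance cost, and the toy weights are all hypothetical, and the exact normalization of the entropy term used in the paper may differ.

```python
import math

def sinkhorn_cost(a, b, cost, eps, n_iter=500):
    """Entropic optimal transport between discrete measures a and b.

    a, b  : lists of nonnegative weights, each summing to 1
    cost  : cost[i][j] is the cost of moving mass from point i to point j
    eps   : strength of the entropy penalty (assumed name, eps > 0)
    Returns (transport cost <C, P>, plan P) for the Sinkhorn plan P.
    """
    n, m = len(a), len(b)
    # Gibbs kernel K_ij = exp(-C_ij / eps)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # The regularized plan has the form diag(u) K diag(v).
    plan = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    transport_cost = sum(
        plan[i][j] * cost[i][j] for i in range(n) for j in range(m)
    )
    return transport_cost, plan

# Toy example (hypothetical data): two histograms on the points 0, 1, 2.
x = [0.0, 1.0, 2.0]
a = [0.5, 0.3, 0.2]
b = [0.2, 0.3, 0.5]
cost = [[(xi - xj) ** 2 for xj in x] for xi in x]
value, plan = sinkhorn_cost(a, b, cost, eps=1.0)
print(value)
```

Each iteration only needs matrix-vector style products with the kernel, which is the usual source of the lower computational cost compared with solving the unregularized linear program. Smaller values of `eps` bring the plan closer to an unregularized optimal plan but typically require more iterations, and very small values can underflow in this naive form.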