Semi-Discrete Optimal Transport: Nearly Minimax Estimation With Stochastic Gradient Descent and Adaptive Entropic Regularization

Distances based on Optimal Transport (OT) are powerful tools in machine learning for comparing probability measures and for manipulating them through OT maps. A setting of particular interest in this field is semi-discrete OT, where the source measure $\mu$ is continuous while the target measure $\nu$ is discrete. Recent works have shown that the minimax rate for estimating the OT map is $\mathcal{O}(t^{-1/2})$ when using $t$ i.i.d. subsamples from each measure (two-sample setting). An open question is whether a faster convergence rate can be achieved when the discrete measure $\nu$ is fully known (one-sample setting). In this work, we answer this question positively by (i) proving an $\mathcal{O}(t^{-1})$ lower bound on the rate for the OT map, using the similarity between Laguerre cell estimation and density support estimation, and (ii) proposing a Stochastic Gradient Descent (SGD) algorithm with adaptive entropic regularization and averaging acceleration. To nearly achieve the desired fast rate, which is characteristic of non-regular parametric problems, we design an entropic regularization scheme that decreases with the number of samples. Another key step in our algorithm is a projection step that makes it possible to leverage the local strong convexity of the regularized OT problem. Our convergence analysis combines online convex optimization and stochastic gradient techniques with properties specific to the OT semi-dual. Moreover, while being as computationally and memory efficient as vanilla SGD, our algorithm exhibits in numerical experiments the unusually fast rates predicted by our theory.
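The ingredients summarized above (entropic semi-dual, regularization decreasing with the number of samples, projection, and iterate averaging) can be illustrated with a minimal NumPy sketch of projected, averaged stochastic gradient ascent on the concave entropic semi-dual (equivalently, SGD on its negative). This is not the authors' implementation. The squared Euclidean cost, the schedules $\gamma_t = \gamma_0/\sqrt{t}$ and $\varepsilon_t = \varepsilon_0/\sqrt{t}$, the projection set (mean-zero potentials inside a ball of hypothetical radius `radius`), and all function names are our own assumptions. Each step draws one sample $x_t \sim \mu$, computes the soft Laguerre-cell weights $\pi_j(x_t) \propto b_j \exp((g_j - c(x_t, y_j))/\varepsilon_t)$, moves the potential $g$ along the stochastic gradient $b - \pi(x_t)$, projects, and updates a running average of the iterates.

```python
import numpy as np


def soft_assignment(x, y, g, b, eps):
    """Entropic Laguerre-cell weights pi_j(x) for one source point x.

    pi_j(x) is proportional to b_j * exp((g_j - c(x, y_j)) / eps), where the
    cost c(x, y) = ||x - y||^2 is an assumption of this sketch.
    """
    cost = np.sum((y - x) ** 2, axis=1)        # c(x, y_j) for every atom y_j
    logits = (g - cost) / eps + np.log(b)
    logits -= logits.max()                     # shift for a numerically stable softmax
    w = np.exp(logits)
    return w / w.sum()


def project(g, radius):
    """Project onto {sum(g) = 0} intersected with the ball of the given radius."""
    g = g - g.mean()                           # the semi-dual is invariant to constant shifts
    norm = np.linalg.norm(g)
    return g if norm <= radius else g * (radius / norm)


def averaged_sgd_semidual(sample_mu, y, b, n_iter, gamma0=1.0, eps0=1.0,
                          radius=10.0, seed=0):
    """Projected, averaged stochastic gradient ascent on the entropic semi-dual.

    gamma_t = gamma0 / sqrt(t), eps_t = eps0 / sqrt(t) and `radius` are
    hypothetical placeholder choices, not the schedules from the paper.
    """
    rng = np.random.default_rng(seed)
    g = np.zeros(len(b))
    g_bar = np.zeros(len(b))
    for t in range(1, n_iter + 1):
        x = sample_mu(rng)                     # one fresh sample x_t ~ mu
        eps_t = eps0 / np.sqrt(t)              # regularization decreasing with t
        grad = b - soft_assignment(x, y, g, b, eps_t)  # stochastic semi-dual gradient
        g = project(g + gamma0 / np.sqrt(t) * grad, radius)
        g_bar += (g - g_bar) / t               # running (Polyak-Ruppert style) average
    return g_bar


def ot_map(x, y, g):
    """Unregularized map: send x to the atom y_j whose Laguerre cell contains x."""
    return y[np.argmin(np.sum((y - x) ** 2, axis=1) - g)]


# Toy usage: mu uniform on [0, 1], nu with two equally weighted atoms.
y = np.array([[0.25], [0.75]])
b = np.array([0.5, 0.5])
g_bar = averaged_sgd_semidual(lambda rng: rng.uniform(0.0, 1.0, size=1), y, b, n_iter=2000)
print(g_bar, ot_map(np.array([0.1]), y, g_bar))
```

In this symmetric toy case the optimal mean-zero potential is $g = 0$, so the averaged iterate should stay close to zero and `ot_map` should send points below $0.5$ to the first atom. The projection onto mean-zero vectors removes the constant-shift invariance of the semi-dual; whether it matches the projection set used in the paper is an assumption.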

翻译：最优传输（OT）距离是机器学习中用于比较概率测度并借助OT映射对其进行处理的有力工具。在该领域中，一个备受关注的设定是半离散OT，其中源测度$\mu$是连续的，而目标测度$\nu$是离散的。近期研究表明，当从每个测度中抽取$t$个独立同分布子样本（双样本设定）时，OT映射的极小极大收敛率为$\mathcal{O}(t^{-1/2})$。一个悬而未决的问题是：当已知离散测度$\nu$的全部信息时（单样本设定），能否获得更优的收敛率？本文通过以下两方面对此问题给出了肯定回答：（i）利用拉盖尔胞元估计与密度支撑估计之间的相似性，证明了OT映射的$\mathcal{O}(t^{-1})$下界速率；（ii）提出了一种结合自适应熵正则化与平均加速的随机梯度下降（SGD）算法。为逼近非正则参数问题特有的快速收敛率，我们设计了随样本数增加而递减的熵正则化方案。算法的另一个关键步骤在于引入投影操作，从而利用正则化OT问题的局部强凸性。我们的收敛性分析融合了在线凸优化与随机梯度技术，并结合了OT半对偶问题的特性。此外，尽管在计算和内存效率上与经典SGD相当，我们的算法在数值实验中实现了理论预测的罕见快速收敛速率。