Semi-Discrete Optimal Transport: Nearly Minimax Estimation With Stochastic Gradient Descent and Adaptive Entropic Regularization

Optimal Transport (OT) based distances are powerful tools in machine learning for comparing probability measures and manipulating them through OT maps. In this field, a setting of interest is semi-discrete OT, where the source measure $\mu$ is continuous while the target measure $\nu$ is discrete. Recent works have shown that the minimax rate for the OT map is $\mathcal{O}(t^{-1/2})$ when $t$ i.i.d. samples are drawn from each measure (two-sample setting). An open question is whether a better convergence rate can be achieved when the discrete measure $\nu$ is fully known (one-sample setting). In this work, we answer this question positively by (i) proving an $\mathcal{O}(t^{-1})$ lower bound rate for the OT map, using the similarity between Laguerre cell estimation and density support estimation, and (ii) proposing a Stochastic Gradient Descent (SGD) algorithm with adaptive entropic regularization and averaging acceleration. To nearly achieve the desired fast rate, which is characteristic of non-regular parametric problems, we design an entropic regularization scheme that decreases with the number of samples. Another key ingredient of our algorithm is a projection step that allows us to leverage the local strong convexity of the regularized OT problem. Our convergence analysis combines online convex optimization and stochastic gradient techniques with the specific structure of the OT semi-dual. Moreover, while being as computationally and memory efficient as vanilla SGD, our algorithm achieves the unusually fast rates predicted by our theory in numerical experiments.
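
To make the ingredients named above concrete (stochastic gradient steps on the entropic semi-dual, a regularization level that decreases with the number of samples, a projection step, and iterate averaging), here is a minimal one-dimensional sketch. It is not the algorithm analyzed in the paper: the squared cost `(x - y)**2`, the power-law schedules `eps0 / t**eps_pow` and `step0 / t**step_pow`, the center-then-clip `project` helper, and every function and parameter name are assumptions chosen for illustration only.

```python
import math
import random


def soft_assignment(x, g, ys, ws, eps):
    """Entropic (softmax) assignment of sample x to the target atoms ys."""
    scores = [(gj - (x - yj) ** 2) / eps + math.log(wj)
              for gj, yj, wj in zip(g, ys, ws)]
    top = max(scores)  # subtract the max for numerical stability
    expo = [math.exp(s - top) for s in scores]
    total = sum(expo)
    return [e / total for e in expo]


def project(g, radius):
    """Assumed stand-in for the projection step: center, then clip to a box."""
    mean = sum(g) / len(g)
    return [max(-radius, min(radius, gj - mean)) for gj in g]


def averaged_sgd(ys, ws, sample_source, n_samples, step0=0.5, step_pow=2 / 3,
                 eps0=0.1, eps_pow=0.5, radius=1.0):
    """Averaged projected SGD (ascent) on the entropic semi-dual potential g."""
    g = [0.0] * len(ys)
    g_avg = [0.0] * len(ys)
    for t in range(1, n_samples + 1):
        x = sample_source()              # one fresh sample from the source mu
        eps_t = eps0 / t ** eps_pow      # hypothetical decreasing regularization
        step_t = step0 / t ** step_pow   # hypothetical decreasing step size
        chi = soft_assignment(x, g, ys, ws, eps_t)
        # Stochastic gradient of the semi-dual in g is (weights - assignment).
        g = project([gj + step_t * (wj - cj)
                     for gj, wj, cj in zip(g, ws, chi)], radius)
        # Running (Polyak-Ruppert style) average of the iterates.
        g_avg = [a + (gj - a) / t for a, gj in zip(g_avg, g)]
    return g_avg


def ot_map_index(x, g, ys):
    """Index of the Laguerre cell containing x, i.e. the atom x is mapped to."""
    return min(range(len(ys)), key=lambda j: (x - ys[j]) ** 2 - g[j])


if __name__ == "__main__":
    rng = random.Random(0)
    ys, ws = [0.25, 0.75], [0.3, 0.7]    # toy target nu; mu is uniform on [0, 1]
    g_hat = averaged_sgd(ys, ws, rng.random, 20000)
    print("estimated potential gap:", g_hat[1] - g_hat[0])
```

In this toy problem the first Laguerre cell is $\{x \le 0.5 - (g_2 - g_1)\}$ and must carry mass $0.3$, so the exact potential gap is $g_2 - g_1 = 0.2$, which gives a simple sanity check for the averaged iterate.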
