We consider statistical learning problems in which data are observed as a set of probability measures. Optimal transport (OT) is a popular tool to compare and manipulate such objects, but its computational cost becomes prohibitive when the measures have large support. We study a quantization-based approach in which all input measures are approximated by $K$-point discrete measures sharing a common support. We establish consistency of the resulting quantized measures. We further derive convergence guarantees for several OT-based downstream tasks computed from the quantized measures. Numerical experiments on synthetic and real datasets demonstrate that the proposed approach achieves performance comparable to individual quantization while substantially reducing runtime.
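To make the idea of a common support concrete, here is a minimal sketch and not the authors' algorithm. It assumes each input measure is given as an array of samples, and it uses pooled k-means (Lloyd's iterations) as one plausible way to obtain $K$ shared support points. The function names `shared_support_quantize` and `nearest` are hypothetical and introduced only for illustration.

```python
import numpy as np


def nearest(x, centers):
    """Index of the closest center (squared Euclidean distance) for each row of x."""
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)


def shared_support_quantize(samples, K, n_iter=50, seed=0):
    """Quantize several empirical measures onto one common K-point support.

    samples: list of (n_i, d) float arrays, one per input measure.
    Returns (centers, weights): centers has shape (K, d), and weights has
    shape (len(samples), K) with each row summing to 1.
    Assumption: pooled k-means stands in for the quantization step.
    """
    rng = np.random.default_rng(seed)
    pooled = np.vstack(samples).astype(float)
    # Initialize the shared support with K distinct pooled points.
    centers = pooled[rng.choice(len(pooled), size=K, replace=False)].copy()
    for _ in range(n_iter):
        labels = nearest(pooled, centers)
        for k in range(K):
            members = pooled[labels == k]
            if len(members) > 0:  # keep the old center if a cell is empty
                centers[k] = members.mean(axis=0)
    # Each measure keeps the shared support but gets its own weight vector:
    # the fraction of its samples falling in each center's cell.
    weights = np.stack(
        [np.bincount(nearest(x, centers), minlength=K) / len(x) for x in samples]
    )
    return centers, weights
```

Under these assumptions, every quantized measure lives on the same `centers`, so any pairwise OT computation between them can reuse a single $K \times K$ cost matrix instead of building one per pair of supports.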