Optimal transport (OT) has emerged as a powerful framework for comparing probability measures, a fundamental task in many statistical and machine learning problems. Substantial advances have been made over the last decade in designing OT variants that are either computationally and statistically more efficient or more robust to the measures and datasets being compared. Among them, sliced OT distances have been extensively used to mitigate the cubic algorithmic complexity and the curse of dimensionality of OT. In parallel, unbalanced OT was designed to allow comparisons of more general positive measures while being more robust to outliers. In this paper, we propose to combine these two concepts, namely slicing and unbalanced OT, to develop a general framework for efficiently comparing positive measures. We propose two new loss functions based on the idea of slicing unbalanced OT and study their induced topology and statistical properties. We then develop a fast Frank-Wolfe-type algorithm to compute these loss functions and show that the resulting methodology is modular, as it encompasses and extends prior related work. We finally conduct an empirical analysis of our loss functions and methodology on both synthetic and real datasets to illustrate their relevance and applicability.
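To illustrate why slicing reduces cost, here is a minimal sketch of the standard (balanced) sliced Wasserstein distance. It is not the sliced unbalanced losses or the Frank-Wolfe algorithm proposed in this paper. The sketch assumes two uniform empirical measures with the same number of points, and the function name `sliced_wasserstein` and its parameters are illustrative. Each one-dimensional OT problem is solved by sorting in O(n log n), so L random directions cost roughly O(L n (d + log n)). Exact OT between the original point clouds costs roughly cubic time in n.

```python
import numpy as np


def sliced_wasserstein(x, y, n_projections=100, p=2, seed=0):
    """Monte Carlo estimate of the sliced p-Wasserstein distance between two
    uniform empirical measures with the same number of points (a sketch)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("this sketch assumes x and y have the same shape (n, d)")
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    # Draw directions uniformly on the unit sphere S^{d-1}.
    theta = rng.normal(size=(n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Project both point clouds onto every direction: shape (n, n_projections).
    x_proj = x @ theta.T
    y_proj = y @ theta.T
    # In 1D, the optimal coupling between equal-size uniform samples
    # matches the sorted values, so sorting solves each sliced OT problem.
    x_sorted = np.sort(x_proj, axis=0)
    y_sorted = np.sort(y_proj, axis=0)
    # Average the 1D costs W_p^p over points and directions, then take the p-th root.
    cost = np.mean(np.abs(x_sorted - y_sorted) ** p)
    return cost ** (1.0 / p)
```

As a sanity check, shifting a three-dimensional cloud by the vector (1, 1, 1) should give an estimate close to 1. This is because the squared projected shift averages to |v|^2 / d = 3 / 3 over uniformly drawn directions.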