A common challenge in aggregating data from multiple sources can be formalized as an \textit{Optimal Transport} (OT) barycenter problem, which seeks the average of a set of probability distributions with respect to OT discrepancies. However, outliers and noise in the input measures can significantly degrade the performance of traditional statistical methods for estimating OT barycenters. To address this issue, we propose a novel, scalable approach for estimating the \textit{robust} continuous barycenter that leverages the dual formulation of the \textit{(semi-)unbalanced} OT problem. To the best of our knowledge, this paper is the first attempt to develop an algorithm for robust barycenters in the continuous distribution setting. Our method is framed as a $\min$-$\max$ optimization problem and is adaptable to \textit{general} cost functions. We rigorously establish the theoretical underpinnings of the proposed method and demonstrate its robustness to outliers and class imbalance through a number of illustrative experiments.