We study the optimal transport (OT) problem for probability measures supported on a tree metric space. It is known that such an OT problem (i.e., tree-Wasserstein (TW)) admits a closed-form expression, but it depends fundamentally on the underlying tree structure over the supports of the input measures. In practice, however, the given tree structure may be perturbed due to noisy or adversarial measurements. To mitigate this issue, we follow the max-min robust OT approach, which considers the maximal possible distance between two input measures over an uncertainty set of tree metrics. In general, this approach is hard to compute, even for measures supported in $1$-dimensional space, due to its non-convexity and non-smoothness, which hinders its practical applications, especially in large-scale settings. In this work, we propose \emph{novel uncertainty sets of tree metrics} from the lens of edge deletion/addition, which cover a diversity of tree structures in an elegant framework. Consequently, building upon the proposed uncertainty sets and leveraging the tree structure over supports, we show that the max-min robust OT also admits a closed-form expression for fast computation, as does its standard OT counterpart (i.e., TW). Furthermore, we show that the max-min robust OT satisfies the metric property and is negative definite. We then exploit its negative definiteness to propose \emph{positive definite kernels} and evaluate them in several simulations on various real-world datasets for document classification and topological data analysis with measures under noisy tree metrics.
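For readers unfamiliar with the standard TW closed form referred to above, the sketch below illustrates it. It uses the well-known expression $\mathrm{TW}(\mu,\nu) = \sum_{e} w_e \, \lvert \mu(\Gamma(v_e)) - \nu(\Gamma(v_e)) \rvert$, where $\Gamma(v_e)$ is the subtree below edge $e$ and $w_e$ is its length. It also builds a kernel of the form $\exp(-t\,d)$, the standard way (via Schoenberg's theorem) to obtain a positive definite kernel from a negative definite distance $d$. This is a minimal sketch under stated assumptions, not the authors' implementation: the parent-array tree encoding and the names `tree_wasserstein` and `tw_kernel` are hypothetical, it covers only the standard TW (not the robust max-min variant), and whether the proposed kernels take exactly this exponential form is an assumption.

```python
import math
from typing import Sequence


def tree_wasserstein(parent: Sequence[int], weight: Sequence[float],
                     mu: Sequence[float], nu: Sequence[float]) -> float:
    """Standard TW closed form: sum_e w_e * |mu(subtree) - nu(subtree)|.

    Hypothetical encoding: node 0 is the root (parent[0] == -1) and
    parent[v] < v for every other node, so iterating indices in reverse
    visits all descendants of v before v itself. weight[v] is the length
    of the edge (parent[v], v); weight[0] is unused.
    """
    # diff[v] ends up holding mu(subtree(v)) - nu(subtree(v)).
    diff = [m - q for m, q in zip(mu, nu)]
    total = 0.0
    for v in range(len(parent) - 1, 0, -1):
        # All children of v were already processed, so diff[v] is complete.
        total += weight[v] * abs(diff[v])
        # Push the subtree mass difference up to the parent.
        diff[parent[v]] += diff[v]
    return total


def tw_kernel(parent: Sequence[int], weight: Sequence[float],
              mu: Sequence[float], nu: Sequence[float], t: float = 1.0) -> float:
    """exp(-t * TW): positive definite for t > 0 since TW is negative definite."""
    return math.exp(-t * tree_wasserstein(parent, weight, mu, nu))


# Path tree 0 - 1 - 2 with edge lengths 1 and 2: moving a unit mass
# from the root to node 2 costs 1 + 2 = 3.
print(tree_wasserstein([-1, 0, 1], [0.0, 1.0, 2.0], [0, 0, 1], [1, 0, 0]))
```

The single bottom-up pass runs in time linear in the number of tree nodes, which is what makes TW fast compared with general OT solvers; the robust closed form claimed above is not reproduced here.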