The Gromov-Wasserstein distance has found many applications in machine learning due to its ability to compare measures across metric spaces and its invariance to isometric transformations. However, in certain applications, this invariance can be too flexible and therefore undesirable. Moreover, the Gromov-Wasserstein distance considers only pairwise sample similarities within each input dataset, disregarding the raw feature representations. We propose a new optimal transport-based distance, called Augmented Gromov-Wasserstein, that allows some control over the level of rigidity with respect to transformations. It also incorporates feature alignments, enabling us to better leverage prior knowledge of the input data for improved performance. We present theoretical insights into the proposed metric and then demonstrate its usefulness on single-cell multi-omic alignment tasks and a transfer learning scenario in machine learning.
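To make the invariance mentioned above concrete, the following is a minimal sketch of the standard squared-loss Gromov-Wasserstein objective evaluated for a fixed coupling. It is not the proposed Augmented Gromov-Wasserstein distance, and it does not optimize over couplings. The function names `pairwise_sq_dists` and `gw_objective`, the toy data, and the choice of squared Euclidean distances are assumptions made for illustration only.

```python
import numpy as np

def pairwise_sq_dists(X):
    # Squared Euclidean distance between every pair of rows of X.
    diff = X[:, None, :] - X[None, :, :]
    return np.sum(diff ** 2, axis=-1)

def gw_objective(D1, D2, T):
    # Evaluates sum_{i,j,k,l} (D1[i,k] - D2[j,l])^2 * T[i,j] * T[k,l]
    # for a given coupling T, by expanding the square into three terms.
    p = T.sum(axis=1)  # marginal of T on the first space
    q = T.sum(axis=0)  # marginal of T on the second space
    term1 = p @ (D1 ** 2) @ p
    term2 = q @ (D2 ** 2) @ q
    cross = np.sum((D1 @ T @ D2.T) * T)
    return term1 + term2 - 2.0 * cross

rng = np.random.default_rng(0)
n = 6
X = rng.normal(size=(n, 2))

# Y is X rotated by 90 degrees: the raw features differ,
# but all pairwise distances are preserved (an isometry).
theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
Y = X @ R.T

D1 = pairwise_sq_dists(X)
D2 = pairwise_sq_dists(Y)

# Coupling that matches sample i in X to sample i in Y, each with mass 1/n.
T_identity = np.eye(n) / n
# Uninformative coupling that spreads mass uniformly over all pairs.
T_uniform = np.full((n, n), 1.0 / n ** 2)

gw_identity = gw_objective(D1, D2, T_identity)
gw_uniform = gw_objective(D1, D2, T_uniform)
feature_gap = np.linalg.norm(X - Y)
```

In this toy setting the objective under the matching coupling is zero up to floating-point error even though `X` and `Y` differ as feature matrices, because the objective only sees the distance matrices `D1` and `D2`. This is the sense in which the distance ignores raw feature representations.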