Optimal transport (OT) is a general framework for finding a minimum-cost transport plan, or coupling, between probability distributions, and has many applications in machine learning. A key challenge in applying OT to massive datasets is that the coupling matrix scales quadratically with the size of the dataset. [Forrow et al. 2019] introduced a factored coupling for the k-Wasserstein barycenter problem, which [Scetbon et al. 2021] adapted to solve the primal low-rank OT problem. We derive an alternative parameterization of the low-rank problem based on the $\textit{latent coupling}$ (LC) factorization, previously introduced by [Lin et al. 2021], which generalizes the factored coupling of [Forrow et al. 2019]. The LC factorization has several advantages for low-rank OT: it decouples the problem into three OT problems, and it offers greater flexibility and interpretability. We leverage these advantages to derive a new algorithm, $\textit{Factor Relaxation with Latent Coupling}$ (FRLC), which uses $\textit{coordinate}$ mirror descent to compute the LC factorization. FRLC handles multiple OT objectives (Wasserstein, Gromov-Wasserstein, Fused Gromov-Wasserstein) and marginal constraints (balanced, unbalanced, and semi-relaxed), all with linear space complexity. We provide theoretical results on FRLC and demonstrate superior performance and interpretability on diverse applications, including graph clustering and spatial transcriptomics.
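To make the linear-space claim concrete, the following is a minimal sketch of how a coupling in latent-coupling form can be used without ever materializing the full $n \times m$ matrix. It assumes the LC factorization takes the form $P = Q\,\mathrm{diag}(1/g_Q)\,T\,\mathrm{diag}(1/g_R)\,R^\top$, with $Q$ coupling $a$ to $g_Q$, $R$ coupling $b$ to $g_R$, and $T$ coupling $g_Q$ to $g_R$; this form is not spelled out in the abstract and should be checked against the paper. The sketch is not FRLC itself: the factors are random feasible couplings produced by plain Sinkhorn scaling, and the names `sinkhorn_scale`, `lc_apply`, and all variables are hypothetical.

```python
import numpy as np

def sinkhorn_scale(K, row, col, iters=200):
    """Rescale a strictly positive matrix K so its row sums match `row`
    and its column sums match `col` (standard Sinkhorn scaling)."""
    u = np.ones_like(row)
    for _ in range(iters):
        v = col / (K.T @ u)
        u = row / (K @ v)
    return u[:, None] * K * v[None, :]

def lc_apply(Q, T, R, B):
    """Return P @ B for P = Q diag(1/g_Q) T diag(1/g_R) R^T,
    using only the factors (P is never formed). B must be 2-D."""
    g_Q = Q.sum(axis=0)  # inner marginal of Q
    g_R = R.sum(axis=0)  # inner marginal of R
    Z = (R.T @ B) / g_R[:, None]   # shape (r2, k)
    Z = (T @ Z) / g_Q[:, None]     # shape (r1, k)
    return Q @ Z                   # shape (n, k)

rng = np.random.default_rng(0)
n, m, r1, r2, d = 6, 5, 3, 2, 2   # toy sizes; ranks r1, r2 are assumptions

# Outer marginals a, b and inner (latent) marginals g_Q, g_R, all uniform here.
a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
g_Q, g_R = np.full(r1, 1.0 / r1), np.full(r2, 1.0 / r2)

# Three feasible couplings, one per sub-problem of the factorization.
Q = sinkhorn_scale(rng.uniform(0.5, 1.5, (n, r1)), a, g_Q)
R = sinkhorn_scale(rng.uniform(0.5, 1.5, (m, r2)), b, g_R)
T = sinkhorn_scale(rng.uniform(0.5, 1.5, (r1, r2)), g_Q, g_R)

# Squared Euclidean cost in factored form C = A_fac @ B_fac.T, so that
# <C, P> = sum(A_fac * (P @ B_fac)) needs no n-by-m array.
X, Y = rng.normal(size=(n, d)), rng.normal(size=(m, d))
A_fac = np.hstack([(X ** 2).sum(1, keepdims=True), np.ones((n, 1)), -2.0 * X])
B_fac = np.hstack([np.ones((m, 1)), (Y ** 2).sum(1, keepdims=True), Y])

cost = float(np.sum(A_fac * lc_apply(Q, T, R, B_fac)))
print("transport cost from factors:", cost)
```

Under these assumptions, storage is $O(n r_1 + m r_2 + r_1 r_2)$ for the factors plus $O((n + m) d)$ for the factored cost, which is linear in $n$ and $m$ when the ranks and dimension are fixed.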