Optimal transport aligns samples across distributions by minimizing the transportation cost between them, such as geometric distances. However, it ignores coherent structure in the data, such as clusters, does not handle outliers well, and cannot integrate new data points. To address these drawbacks, we propose InfoOT, an information-theoretic extension of optimal transport that maximizes the mutual information between domains while minimizing geometric distances. The resulting objective can still be formulated as a (generalized) optimal transport problem and can be solved efficiently by projected gradient descent. This formulation yields a new projection method that is robust to outliers and generalizes to unseen samples. Empirically, InfoOT improves alignment quality across benchmarks in domain adaptation, cross-domain retrieval, and single-cell alignment.
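
To make the objective concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not specify how mutual information is estimated or how the projection is carried out, so the sketch assumes (i) a plug-in mutual information estimate based on Gaussian kernel density estimates, (ii) a squared Euclidean transport cost, and (iii) a mirror-descent form of projected gradient descent in which each multiplicative step is projected back onto the set of couplings by Sinkhorn scaling. All function names (`gaussian_kernel`, `kde_mutual_information`, `info_ot_sketch`) and hyperparameters (`lam`, `bandwidth`, `step`, `n_steps`) are hypothetical choices made for illustration.

```python
import numpy as np


def gaussian_kernel(Z, bandwidth=1.0):
    """Pairwise Gaussian kernel matrix for the rows of Z."""
    sq_dists = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def kde_mutual_information(G, Kx, Ky):
    """Plug-in mutual information of a coupling G under kernel density estimates.

    Assumption: the joint density at (x_i, y_j) is sum_kl G[k, l] Kx[i, k] Ky[l, j];
    kernel normalizing constants cancel in the ratio below.
    """
    a, b = G.sum(axis=1), G.sum(axis=0)
    joint = Kx @ G @ Ky               # joint density estimate at each pair
    indep = np.outer(Kx @ a, Ky @ b)  # product of the two marginal estimates
    return float(np.sum(G * (np.log(joint) - np.log(indep))))


def objective(G, C, Kx, Ky, lam):
    """Transport cost minus lam times the estimated mutual information."""
    return float(np.sum(G * C)) - lam * kde_mutual_information(G, Kx, Ky)


def sinkhorn_projection(M, a, b, n_iter=500):
    """Rescale a positive matrix M so its row sums are a and column sums are b."""
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iter):
        u = a / (M @ v)
        v = b / (M.T @ u)
    return u[:, None] * M * v[None, :]


def info_ot_sketch(X, Y, lam=1.0, bandwidth=1.0, step=0.1, n_steps=50):
    """Mirror-descent sketch: minimize <G, C> - lam * MI(G) over couplings G."""
    n, m = len(X), len(Y)
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)  # uniform marginals (assumption)
    C = np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    C = C / C.max()                                   # rescale cost to [0, 1]
    Kx, Ky = gaussian_kernel(X, bandwidth), gaussian_kernel(Y, bandwidth)
    indep = np.outer(Kx @ a, Ky @ b)                  # fixed, since marginals are fixed
    G = np.outer(a, b)                                # start from the independent coupling
    for _ in range(n_steps):
        joint = Kx @ G @ Ky
        # Gradient of the MI estimate with the marginals held fixed.
        grad_mi = np.log(joint) - np.log(indep) + Kx @ (G / joint) @ Ky
        grad = C - lam * grad_mi
        # Multiplicative step in log space, then projection onto the couplings.
        log_M = np.log(np.maximum(G, 1e-300)) - step * grad
        M = np.exp(log_M - log_M.max())
        G = sinkhorn_projection(M, a, b)
    return G, C, Kx, Ky


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    centers = np.array([[0.0, 0.0], [3.0, 3.0]])
    labels = np.repeat([0, 1], 10)                    # two balanced clusters per domain
    X = centers[labels] + 0.3 * rng.normal(size=(20, 2))
    Y = centers[labels] + 0.3 * rng.normal(size=(20, 2))
    G, C, Kx, Ky = info_ot_sketch(X, Y)
    print("transport cost:", float(np.sum(G * C)))
    print("estimated mutual information:", kde_mutual_information(G, Kx, Ky))
```

In this sketch the independent coupling has an estimated mutual information of zero, so any positive value after optimization reflects structure (here, cluster correspondence) captured by the coupling. Setting `lam=0` reduces the loop to a purely cost-driven update, which is one way to compare against a transport plan that ignores the mutual information term.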