Optimal Transport (OT) provides a useful metric for comparing probability distributions and a way to compute a pairing given a ground cost. Its entropically regularized variant (eOT) is crucial both for obtaining fast algorithms and for modeling fuzzy/noisy matchings. This work focuses on Inverse Optimal Transport (iOT), the problem of inferring the ground cost from samples drawn from a coupling that solves an eOT problem. iOT can be used to infer unobserved/missing links and to extract meaningful information about the structure of the ground cost that yields the pairing. On the one hand, iOT is a convex problem; on the other hand, it is ill-posed and therefore requires regularization to handle sampling noise. This work presents an in-depth theoretical study of ℓ1 regularization, which can model, for instance, Euclidean costs with sparse interactions between features. Specifically, we derive a sufficient condition for robust recovery of the sparsity of the ground cost; this condition can be seen as a far-reaching generalization of the Lasso's celebrated Irrepresentability Condition. To provide further insight into this condition, we work out the Gaussian case in detail. We show that, as the entropic penalty varies, the iOT problem interpolates between a graphical Lasso and a classical Lasso, thereby establishing a connection between iOT and graph estimation, an important problem in machine learning.
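To make the estimator more concrete, here is a minimal sketch of ℓ1-regularized iOT on discrete samples. It rests on assumptions that the text above does not fix: the ground cost is taken to be bilinear, c_A(x, y) = -xᵀAy, both marginals are uniform over the sample points, eOT is solved with log-domain Sinkhorn iterations, and the convex loss ⟨C_A, π̂⟩ - OT_ε(C_A) + λ‖A‖₁ is minimized by proximal gradient descent (ISTA). The helper names (`sinkhorn`, `cost`, `iot_grad`, `iot_l1`) and the step-size rule are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def _logsumexp(M, axis):
    # Numerically stable log(sum(exp(M))) along `axis`.
    m = np.max(M, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(M - m), axis=axis))


def sinkhorn(C, eps, n_iter=300):
    """eOT plan between uniform marginals via log-domain Sinkhorn iterations."""
    n, m = C.shape
    log_a, log_b = np.full(n, -np.log(n)), np.full(m, -np.log(m))
    f, g = np.zeros(n), np.zeros(m)
    for _ in range(n_iter):
        # Alternately match the row and column sums of P = exp((f + g - C) / eps).
        f = eps * (log_a - _logsumexp((g[None, :] - C) / eps, axis=1))
        g = eps * (log_b - _logsumexp((f[:, None] - C) / eps, axis=0))
    return np.exp((f[:, None] + g[None, :] - C) / eps)


def cost(A, X, Y):
    # Hypothetical bilinear ground cost c_A(x, y) = -x^T A y on the sample grid.
    return -X @ A @ Y.T


def iot_grad(A, X, Y, P_hat, eps):
    # Gradient in A of the convex iOT loss <C_A, P_hat> - OT_eps(C_A):
    # dOT_eps/dC = P*(C_A) (envelope theorem) and dC_ij/dA = -x_i y_j^T.
    P_star = sinkhorn(cost(A, X, Y), eps)
    return X.T @ (P_star - P_hat) @ Y


def iot_l1(X, Y, P_hat, eps, lam, n_steps=200):
    """l1-regularized iOT estimate of A by proximal gradient descent (ISTA)."""
    # Step 1/L using the rough bound L <= max|x|^2 * max|y|^2 / eps (assumption).
    step = eps / (np.max(np.sum(X**2, axis=1)) * np.max(np.sum(Y**2, axis=1)))
    A = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(n_steps):
        Z = A - step * iot_grad(A, X, Y, P_hat, eps)
        # Soft-thresholding is the proximal operator of step * lam * ||A||_1.
        A = np.sign(Z) * np.maximum(np.abs(Z) - step * lam, 0.0)
    return A


# Usage: a noise-free stand-in for the observed coupling, built from a sparse A_true.
rng = np.random.default_rng(0)
n, d, eps = 30, 4, 1.0
A_true = np.zeros((d, d))
A_true[0, 0], A_true[1, 2] = 1.0, -1.0  # two sparse feature interactions
X, Y = rng.standard_normal((n, d)), rng.standard_normal((n, d))
P_true = sinkhorn(cost(A_true, X, Y), eps)
A_hat = iot_l1(X, Y, P_true, eps, lam=1e-3)
```

With n paired samples (x_i, y_i), the empirical coupling would instead be `P_hat = np.eye(n) / n`. Whether the support of `A_hat` matches that of `A_true` depends on n, eps, lam, and on conditions such as the one derived in this work; the sketch itself does not guarantee recovery.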