Optimal transport is a fundamental topic that has attracted considerable attention from the optimization community over the past decades. In this paper, we consider a discrete dynamic optimal transport problem: can we efficiently update the optimal transport plan when the weights or the locations of the data points change? This problem is naturally motivated by several applications in machine learning. For example, we often need to compute the optimal transport cost between two data sets; if a few data points change, should we recompute the cost from scratch at high complexity, or update it through an efficient dynamic data structure? Several dynamic maximum flow algorithms have been proposed, but to the best of our knowledge, research on the dynamic minimum cost flow problem is still quite limited. We propose a novel 2D Skip Orthogonal List, used together with dynamic tree techniques. Although our algorithm is based on the conventional simplex method, it finds the variable to pivot in expected $O(1)$ time and completes each pivoting operation in expected $O(|V|)$ time, where $V$ is the set of all supply and demand nodes. Since dynamic modifications typically do not introduce significant changes, our algorithm requires only a few simplex iterations in practice. It is therefore more efficient than recomputing the optimal transport cost, which requires at least one traversal over all $|E| = O(|V|^2)$ variables, where $|E|$ denotes the number of edges in the network. Our experiments demonstrate that our algorithm significantly outperforms existing algorithms in dynamic scenarios.
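To make the cost of recomputation concrete, the following is a minimal sketch of the naive pivot search in a transportation simplex: given a basis (a spanning tree of basic edges), it computes node potentials and then scans all $|E|$ reduced costs for the entering variable. This is the baseline traversal that the abstract contrasts against, not the proposed 2D Skip Orthogonal List. The function names `potentials` and `find_entering`, the dense cost-matrix representation, and the most-negative pivot rule are assumptions made for illustration, and the basis is assumed to be a valid spanning tree.

```python
from collections import deque


def potentials(cost, basis):
    """Solve u[i] + v[j] = cost[i][j] on the basic edges (assumed to form a spanning tree)."""
    m, n = len(cost), len(cost[0])
    # Bipartite adjacency over source nodes ("s", i) and sink nodes ("t", j).
    adj = {("s", i): [] for i in range(m)}
    adj.update({("t", j): [] for j in range(n)})
    for i, j in basis:
        adj[("s", i)].append(("t", j))
        adj[("t", j)].append(("s", i))
    u, v = [None] * m, [None] * n
    u[0] = 0  # fix one potential; the rest follow along the tree
    queue = deque([("s", 0)])
    while queue:
        kind, k = queue.popleft()
        for kind2, k2 in adj[(kind, k)]:
            if kind2 == "t" and v[k2] is None:
                v[k2] = cost[k][k2] - u[k]
                queue.append((kind2, k2))
            elif kind2 == "s" and u[k2] is None:
                u[k2] = cost[k2][k] - v[k]
                queue.append((kind2, k2))
    return u, v


def find_entering(cost, basis):
    """Naive pivot search: scan every reduced cost, i.e. all |E| = m * n variables."""
    u, v = potentials(cost, basis)
    best, best_edge = 0, None
    for i, row in enumerate(cost):
        for j, c in enumerate(row):
            reduced = c - u[i] - v[j]
            if reduced < best:
                best, best_edge = reduced, (i, j)
    return best_edge, best  # (None, 0) means the basis is already optimal


basis = [(0, 0), (0, 1), (1, 1)]
print(find_entering([[4, 6], [5, 3]], basis))  # no negative reduced cost
print(find_entering([[4, 6], [0, 3]], basis))  # one cost changed: edge (1, 0) should enter
```

In the second call only one cost entry has changed, yet the naive search still visits every entry of the matrix; this full scan is the step that the abstract reports finding in expected $O(1)$ time.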