Optimal transportation is a fundamental topic that has attracted a great deal of attention from the machine learning community over the past decades. In this paper, we consider a discrete dynamic optimal transport problem: can we efficiently update the optimal transport plan when the weights or the locations of the data points change? This problem is naturally motivated by several applications in machine learning. For example, we often need to compute the optimal transportation cost between two data sets; if a few data points change, should we recompute the expensive cost from scratch, or update it with an efficient dynamic data structure? Several dynamic maximum flow algorithms have been proposed, but to the best of our knowledge, research on the dynamic minimum cost flow problem is still quite limited. We propose a novel 2D Skip Orthogonal List, combined with dynamic tree techniques. Although our algorithm is based on the conventional simplex method, it completes each pivoting operation within $O(|V|)$ time with high probability, where $V$ is the set of all supply and demand nodes. Since dynamic modifications typically do not introduce significant changes, our algorithm requires only a few simplex iterations in practice. It is therefore more efficient than recomputing the optimal transportation cost, which in general needs at least one traversal over all $O(|E|) = O(|V|^2)$ variables. Our experiments demonstrate that our algorithm significantly outperforms existing algorithms in dynamic scenarios.
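To make the $O(|E|) = O(|V|^2)$ traversal concrete, the following is a minimal sketch of the textbook pricing step of the transportation simplex method. It is not the 2D Skip Orthogonal List proposed here. The function names `potentials` and `entering_edge`, the dense cost matrix, and the toy numbers are all assumptions made for illustration. Given a spanning-tree basis, the sketch solves for node potentials and then scans every supply-demand pair for a negative reduced cost, which is the full scan that a dynamic data structure aims to avoid.

```python
from collections import deque


def potentials(cost, basis):
    """Solve u[i] + v[j] = cost[i][j] on the basic (tree) edges, fixing u[0] = 0."""
    m, n = len(cost), len(cost[0])
    u, v = [None] * m, [None] * n
    rows = [[] for _ in range(m)]  # demand nodes adjacent to each supply node
    cols = [[] for _ in range(n)]  # supply nodes adjacent to each demand node
    for i, j in basis:
        rows[i].append(j)
        cols[j].append(i)
    u[0] = 0
    queue = deque([("row", 0)])
    while queue:  # breadth-first walk over the basis tree
        kind, k = queue.popleft()
        if kind == "row":
            for j in rows[k]:
                if v[j] is None:
                    v[j] = cost[k][j] - u[k]
                    queue.append(("col", j))
        else:
            for i in cols[k]:
                if u[i] is None:
                    u[i] = cost[i][k] - v[k]
                    queue.append(("row", i))
    return u, v


def entering_edge(cost, basis):
    """Scan all edges; return the one with the most negative reduced cost, or None."""
    u, v = potentials(cost, basis)
    best, best_rc = None, 0
    for i, row in enumerate(cost):      # this double loop touches all
        for j, c in enumerate(row):     # |supply| * |demand| variables
            rc = c - u[i] - v[j]
            if rc < best_rc:
                best, best_rc = (i, j), rc
    return best


# Hypothetical 2x2 instance: the basis is optimal until one cost entry changes.
cost = [[10, 5], [5, 1]]
basis = [(0, 1), (1, 0), (1, 1)]
print(entering_edge(cost, basis))  # no improving edge: the basis is optimal
cost[0][0] = 6                     # a data point moves, so one cost changes
print(entering_edge(cost, basis))  # edge (0, 0) now has negative reduced cost
```

In this sketch a single changed cost entry makes only one edge improving, which matches the observation that small modifications tend to need few pivots; the cost of the naive approach lies in the scan itself.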