Optimal transport is a fundamental topic that has attracted a great deal of attention from the machine learning community over the past decades. In this paper, we consider a discrete dynamic optimal transport problem: can we efficiently update the optimal transport plan when the weights or the locations of the data points change? This problem is naturally motivated by several applications in machine learning. For example, we often need to compute the optimal transport cost between two data sets; if a few data points change, should we recompute the cost from scratch at high complexity, or update it with an efficient dynamic data structure? Several dynamic maximum flow algorithms have been proposed, but to the best of our knowledge, research on the dynamic minimum cost flow problem remains quite limited. We propose a novel data structure, the 2D Skip Orthogonal List, combined with dynamic tree techniques. Although our algorithm is based on the conventional simplex method, it completes each pivoting operation in $O(|V|)$ time with high probability, where $V$ is the set of all supply and demand nodes. Since dynamic modifications typically do not introduce significant changes, our algorithm requires only a few simplex iterations in practice. It is therefore more efficient than recomputing the optimal transport cost, which in general requires at least one traversal over all the $O(|E|) = O(|V|^2)$ variables. Our experiments demonstrate that our algorithm significantly outperforms existing algorithms in dynamic scenarios.
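To make the $O(|V|^2)$ traversal concrete, the following is a minimal sketch of the textbook pricing step of the transportation simplex method. It is not the 2D Skip Orthogonal List proposed here. The function names `potentials` and `entering_cell` and the input layout are illustrative assumptions: `cost` is a dense supply-by-demand matrix, and `basis` is assumed to be a list of basic cells forming a spanning tree over all supply and demand nodes.

```python
from collections import deque


def potentials(cost, basis):
    """Solve u[i] + v[j] = cost[i][j] on the basic cells.

    Assumption: `basis` is a spanning tree of the bipartite graph,
    so every node is reachable from supply node 0.
    """
    n, m = len(cost), len(cost[0])
    rows = [[] for _ in range(n)]  # demand nodes adjacent to each supply node
    cols = [[] for _ in range(m)]  # supply nodes adjacent to each demand node
    for i, j in basis:
        rows[i].append(j)
        cols[j].append(i)

    u = [None] * n
    v = [None] * m
    u[0] = 0.0  # fix one potential; the rest follow along tree edges
    queue = deque([("supply", 0)])
    while queue:
        side, k = queue.popleft()
        if side == "supply":
            for j in rows[k]:
                if v[j] is None:
                    v[j] = cost[k][j] - u[k]
                    queue.append(("demand", j))
        else:
            for i in cols[k]:
                if u[i] is None:
                    u[i] = cost[i][k] - v[k]
                    queue.append(("supply", i))
    return u, v


def entering_cell(cost, basis):
    """Return the cell with the most negative reduced cost, or None if optimal.

    This scan visits every one of the |supply| * |demand| variables.
    """
    u, v = potentials(cost, basis)
    best_value, best_cell = 0.0, None
    for i, row in enumerate(cost):
        for j, c in enumerate(row):
            reduced = c - u[i] - v[j]
            if reduced < best_value:
                best_value, best_cell = reduced, (i, j)
    return best_cell
```

For a 2-by-2 instance with basis `[(0, 0), (0, 1), (1, 1)]`, calling `entering_cell` returns a cell to pivot on when some reduced cost is negative and `None` when the basis is already optimal. Each call scans the full cost matrix, which is the per-recomputation cost the abstract contrasts with $O(|V|)$ pivots.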