Data association is at the core of many computer vision tasks, e.g., multiple object tracking (MOT), image matching, and point cloud registration. Existing methods usually solve the data association problem through network flow optimization, bipartite matching, or direct end-to-end learning. Despite their popularity, these solutions have two shortcomings. First, they mostly ignore intra-view context information. Second, they either train deep association models end-to-end and thus make little use of optimization-based assignment methods, or use only an off-the-shelf neural network to extract features. In this paper, we propose a general learnable graph matching method to address these issues. Specifically, we model the intra-view relationships as an undirected graph, so that data association becomes a general graph matching problem between two graphs. Furthermore, to make the optimization end-to-end differentiable, we relax the original graph matching problem into a continuous quadratic program and then incorporate its training into a deep graph neural network using the KKT conditions and the implicit function theorem. On the MOT task, our method achieves state-of-the-art performance on several MOT datasets. For image matching, our method outperforms state-of-the-art methods with half the training data and iterations on a popular indoor dataset, ScanNet. Code will be available at https://github.com/jiaweihe1996/GMTracker.
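The relaxation of graph matching into a continuous quadratic program can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper or its released code: the objective (a node-affinity term plus a penalty on edge disagreement), the constraint set (doubly stochastic matrices), the weight `lam`, the toy graphs, and the function names. The solver is a plain Frank-Wolfe loop, swapped in for simplicity; it does not implement the differentiable QP layer based on KKT conditions and the implicit function theorem.

```python
import itertools
import numpy as np

def best_permutation(cost):
    """Permutation matrix minimizing <cost, S>, by brute force (tiny n only)."""
    n = cost.shape[0]
    rows = np.arange(n)
    best = min(itertools.permutations(range(n)),
               key=lambda q: cost[rows, list(q)].sum())
    S = np.zeros((n, n))
    S[rows, list(best)] = 1.0
    return S

def relaxed_graph_matching(U, A1, A2, lam=0.1, iters=100):
    """Frank-Wolfe on an assumed relaxed objective over doubly stochastic X:
        f(X) = -<U, X> + (lam / 2) * ||A1 X - X A2||_F^2
    U holds node affinities; A1 and A2 are weighted adjacency matrices."""
    n = U.shape[0]
    X = np.full((n, n), 1.0 / n)             # start at the uniform soft assignment
    for k in range(iters):
        R = A1 @ X - X @ A2                  # edge disagreement under X
        grad = -U + lam * (A1.T @ R - R @ A2.T)
        S = best_permutation(grad)           # linear minimization over permutations
        gamma = 2.0 / (k + 2.0)              # standard Frank-Wolfe step size
        X = (1.0 - gamma) * X + gamma * S    # convex combination stays feasible
    return X

# Toy data (hypothetical): graph 2 is graph 1 with its nodes relabeled by `perm`.
perm = np.array([2, 0, 3, 1])                # node i in graph 1 is node perm[i] in graph 2
A1 = np.array([[0.0, 1.0, 0.5, 0.0],
               [1.0, 0.0, 0.0, 0.2],
               [0.5, 0.0, 0.0, 1.0],
               [0.0, 0.2, 1.0, 0.0]])
A2 = np.zeros_like(A1)
A2[np.ix_(perm, perm)] = A1                  # A2[perm[i], perm[j]] = A1[i, j]

F1 = np.eye(4) + 0.1                         # distinct node features
F1 /= np.linalg.norm(F1, axis=1, keepdims=True)
F2 = np.zeros_like(F1)
F2[perm] = F1                                # same features, relabeled
U = F1 @ F2.T                                # cosine similarity between views

X = relaxed_graph_matching(U, A1, A2)
matched = X.argmax(axis=1)                   # round the soft assignment row by row
print(matched)
```

On this toy instance the rounded assignment coincides with `perm`, because the relabeled graph agrees with the original on both node features and edge weights. The brute-force search over permutations only scales to very small graphs; a practical version would replace it with a linear assignment solver.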