Recent approaches to point tracking are able to recover the trajectory of any scene point through a large portion of a video despite the presence of occlusions. They are, however, too slow in practice to track every point observed in a single frame in a reasonable amount of time. This paper introduces DOT, a novel, simple and efficient method for solving this problem. It first extracts a small set of tracks from key regions at motion boundaries using an off-the-shelf point tracking algorithm. Given source and target frames, DOT then computes rough initial estimates of a dense flow field and visibility mask through nearest-neighbor interpolation, before refining them using a learnable optical flow estimator that explicitly handles occlusions and can be trained on synthetic data with ground-truth correspondences. We show that DOT is significantly more accurate than current optical flow techniques, outperforms sophisticated "universal" trackers like OmniMotion, and is on par with, or better than, the best point tracking algorithms like CoTracker while being at least two orders of magnitude faster. Quantitative and qualitative experiments with synthetic and real videos validate the promise of the proposed approach. Code, data, and videos showcasing the capabilities of our approach are available on the project webpage: https://16lemoing.github.io/dot.
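The nearest-neighbor initialization mentioned in the abstract can be illustrated with a minimal sketch. This is not the DOT implementation: the function name `nearest_neighbor_init`, its arguments, and the brute-force distance computation are assumptions made for illustration, and the learned refinement stage is not shown. Each pixel of the source frame simply copies the displacement and visibility of the closest sparse track.

```python
import numpy as np

def nearest_neighbor_init(src_pts, tgt_pts, visible, height, width):
    """Hypothetical helper: densify sparse tracks by nearest-neighbor lookup.

    src_pts, tgt_pts: (N, 2) float arrays of (x, y) track positions in the
                      source and target frames.
    visible:          (N,) bool array, True if the track is visible in the
                      target frame.
    Returns a coarse flow field (H, W, 2) and visibility mask (H, W).
    """
    ys, xs = np.mgrid[0:height, 0:width]
    grid = np.stack([xs, ys], axis=-1).reshape(-1, 2).astype(np.float64)

    # Squared distance from every pixel to every track's source position.
    # Brute force, O(H*W*N) memory; fine for a sketch, not for full frames.
    d2 = ((grid[:, None, :] - src_pts[None, :, :]) ** 2).sum(axis=-1)
    nearest = d2.argmin(axis=1)

    # Each pixel inherits the displacement and visibility of its nearest track.
    flow = (tgt_pts - src_pts)[nearest].reshape(height, width, 2)
    mask = visible[nearest].reshape(height, width)
    return flow, mask

# Two tracks on a 2x4 frame: one moves right and stays visible,
# the other moves down and becomes occluded.
src = np.array([[0.0, 0.0], [3.0, 0.0]])
tgt = np.array([[1.0, 0.0], [3.0, 2.0]])
vis = np.array([True, False])
flow, mask = nearest_neighbor_init(src, tgt, vis, height=2, width=4)
```

In this toy case the two left columns inherit the first track's motion and the two right columns inherit the second's, which gives the piecewise-constant estimate that a refinement stage would then correct. For real frame sizes a spatial index such as a k-d tree would likely replace the dense distance matrix.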