Recent approaches to point tracking are able to recover the trajectory of any scene point through a large portion of a video despite the presence of occlusions. They are, however, too slow in practice to track every point observed in a single frame within a reasonable amount of time. This paper introduces DOT, a novel, simple and efficient method for solving this problem. It first extracts a small set of tracks from key regions at motion boundaries using an off-the-shelf point tracking algorithm. Given source and target frames, DOT then computes rough initial estimates of a dense flow field and visibility mask through nearest-neighbor interpolation, before refining them using a learnable optical flow estimator that explicitly handles occlusions and can be trained on synthetic data with ground-truth correspondences. We show that DOT is significantly more accurate than current optical flow techniques, outperforms sophisticated "universal" trackers like OmniMotion, and is on par with, or better than, the best point tracking algorithms like CoTracker while being at least two orders of magnitude faster. Quantitative and qualitative experiments with synthetic and real videos validate the promise of the proposed approach. Code, data, and videos showcasing the capabilities of our approach are available on the project webpage (https://16lemoing.github.io/dot).
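
To make the initialization step concrete, the following is a minimal sketch of nearest-neighbor interpolation from sparse tracks to a dense flow field and visibility mask. It is not the DOT implementation: the function name `nearest_neighbor_init`, the list-based data layout, the Euclidean distance in the source frame, and the brute-force search are all assumptions made for illustration, and the learned refinement stage is not shown.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in pixel coordinates


def nearest_neighbor_init(
    height: int,
    width: int,
    src_pts: List[Point],
    tgt_pts: List[Point],
    visible: List[bool],
):
    """Densify sparse tracks into a rough flow field and visibility mask.

    Hypothetical helper: every pixel copies the displacement and the
    visibility flag of the track whose source-frame position is closest
    to it (brute-force search, Euclidean distance).
    """
    if not src_pts or not (len(src_pts) == len(tgt_pts) == len(visible)):
        raise ValueError("need at least one track and equal-length inputs")

    flow = [[(0.0, 0.0)] * width for _ in range(height)]
    mask = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Index of the track closest to pixel (x, y) in the source frame.
            k = min(
                range(len(src_pts)),
                key=lambda i: (src_pts[i][0] - x) ** 2 + (src_pts[i][1] - y) ** 2,
            )
            # Copy that track's displacement (target minus source) and visibility.
            flow[y][x] = (
                tgt_pts[k][0] - src_pts[k][0],
                tgt_pts[k][1] - src_pts[k][1],
            )
            mask[y][x] = visible[k]
    return flow, mask


# Toy example: two tracks on a 2x4 image; the second one is occluded in the target.
flow, mask = nearest_neighbor_init(
    height=2,
    width=4,
    src_pts=[(0.0, 0.0), (3.0, 1.0)],
    tgt_pts=[(1.0, 0.0), (3.0, 3.0)],
    visible=[True, False],
)
```

In this toy example the left half of the image inherits the motion of the first track and the right half inherits the motion and occlusion flag of the second. The result is piecewise constant, which is why the abstract describes it as a rough estimate to be refined by the optical flow estimator.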