In this work, we present MFTIQ, a novel dense long-term tracking model that advances the Multi-Flow Tracker (MFT) framework to address challenges in point-level visual tracking in video sequences. MFTIQ builds upon the flow-chaining concepts of MFT, integrating an Independent Quality (IQ) module that decouples correspondence-quality estimation from optical flow computation. This decoupling significantly enhances the accuracy and flexibility of the tracking process, allowing MFTIQ to maintain reliable trajectory predictions even under prolonged occlusions and complex dynamics. Designed to be "plug-and-play", MFTIQ can be employed with any off-the-shelf optical flow method without fine-tuning or architectural modifications. Experiments on the TAP-Vid DAVIS dataset show that MFTIQ with RoMa optical flow not only surpasses MFT but also performs comparably to state-of-the-art trackers while being substantially faster. Code and models are available at https://github.com/serycjon/MFTIQ.