Accurate estimation of large displacement optical flow remains a critical challenge. Existing methods typically rely on iterative local search and/or domain-specific fine-tuning, which severely limits their performance in large displacement and zero-shot generalization scenarios. To overcome this, we introduce MegaFlow, a simple yet powerful model for zero-shot large displacement optical flow. Rather than relying on highly complex, task-specific architectural designs, MegaFlow adapts powerful pre-trained vision priors to produce temporally consistent motion fields. In particular, we formulate flow estimation as a global matching problem over pre-trained global Vision Transformer features, which naturally capture large displacements. A few lightweight iterative refinement steps then improve sub-pixel accuracy. Extensive experiments demonstrate that MegaFlow achieves state-of-the-art zero-shot performance across multiple optical flow benchmarks. Moreover, our model delivers highly competitive zero-shot performance on long-range point tracking benchmarks, demonstrating its robust transferability and suggesting a unified paradigm for generalizable motion estimation. Our project page is at: https://kristen-z.github.io/projects/megaflow.
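The abstract does not spell out how the global matching step is computed. As a rough illustration only, the sketch below shows one common way to cast flow estimation as global matching: compare every position in the first frame against every position in the second, turn the similarities into a softmax matching distribution, and take the expected matched coordinate. The function name `global_matching_flow`, the scaling by the square root of the feature dimension, and the plain arrays standing in for Vision Transformer features are all assumptions made for this example, not details taken from MegaFlow.

```python
import numpy as np


def global_matching_flow(feat1, feat2):
    """Dense flow from frame 1 to frame 2 via a soft argmax over all positions.

    feat1, feat2: (H, W, D) feature maps; hypothetical stand-ins for
    pre-trained ViT features. Returns an (H, W, 2) array of (dx, dy)
    displacements in feature-grid units.
    """
    h, w, d = feat1.shape
    f1 = feat1.reshape(h * w, d)
    f2 = feat2.reshape(h * w, d)

    # All-pairs similarity: each frame-1 position against each frame-2 position.
    # Dividing by sqrt(d) is an assumed scaling, not taken from the paper.
    corr = f1 @ f2.T / np.sqrt(d)

    # Row-wise softmax (max subtracted for numerical stability) turns the
    # similarities into a matching distribution over frame-2 positions.
    corr -= corr.max(axis=1, keepdims=True)
    prob = np.exp(corr)
    prob /= prob.sum(axis=1, keepdims=True)

    # Coordinate grid flattened to (H*W, 2), stored as (x, y).
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    grid = np.stack([xs, ys], axis=-1).reshape(h * w, 2).astype(np.float64)

    # Expected matched position minus source position gives the flow.
    matched = prob @ grid
    return (matched - grid).reshape(h, w, 2)
```

Because each position is compared against the whole second frame rather than a local window, the size of the displacement does not restrict which matches can be found. This is the property the abstract attributes to global matching. A real system would follow this coarse estimate with refinement, which is not shown here.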