Dense image correspondence is central to many applications, such as visual odometry, 3D reconstruction, object association, and re-identification. Historically, dense correspondence has been tackled separately for wide-baseline scenarios and optical flow estimation, despite the common goal of matching content between two images. In this paper, we develop a Unified Flow & Matching model (UFM), which is trained on unified data for pixels that are co-visible in both source and target images. UFM uses a simple, generic transformer architecture that directly regresses the (u,v) flow. Compared to the coarse-to-fine cost volumes typical of prior work, it is easier to train and more accurate for large flows. UFM is 28% more accurate than state-of-the-art flow methods (Unimatch), while also achieving 62% lower error and running 6.7x faster than dense wide-baseline matchers (RoMa). UFM is the first to demonstrate that unified training can outperform specialized approaches across both domains. This result enables fast, general-purpose correspondence and opens new directions for multi-modal, long-range, and real-time correspondence tasks.
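The abstract describes a generic transformer that attends jointly over tokens from both images and directly regresses a dense (u,v) flow field, rather than building a coarse-to-fine cost volume. The sketch below illustrates that direct-regression idea only; the module names, dimensions, and patch-wise flow head are our own assumptions for illustration and are not UFM's actual architecture.

```python
import torch
import torch.nn as nn


class DirectFlowRegressor(nn.Module):
    """Illustrative sketch: joint attention over source + target patch
    tokens, followed by a linear head that regresses (u, v) per pixel.
    Hyperparameters are placeholders, not UFM's real configuration."""

    def __init__(self, patch=16, dim=128, depth=2, heads=4):
        super().__init__()
        self.patch = patch
        # Shared patch embedding for both images.
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=dim * 4,
            batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        # Regress a (u, v) pair for every pixel inside each source patch.
        self.head = nn.Linear(dim, 2 * patch * patch)

    def forward(self, src, tgt):
        B, _, H, W = src.shape
        hp, wp = H // self.patch, W // self.patch
        s = self.embed(src).flatten(2).transpose(1, 2)  # (B, N, dim)
        t = self.embed(tgt).flatten(2).transpose(1, 2)
        # Concatenating token sets lets self-attention match content
        # across the two images in a single generic transformer.
        out = self.encoder(torch.cat([s, t], dim=1))[:, : hp * wp]
        flow = self.head(out).view(B, hp, wp, 2, self.patch, self.patch)
        # Reassemble patch-wise predictions into a dense (B, 2, H, W) map.
        return flow.permute(0, 3, 1, 4, 2, 5).reshape(B, 2, H, W)


model = DirectFlowRegressor()
src = torch.randn(1, 3, 64, 64)
tgt = torch.randn(1, 3, 64, 64)
flow = model(src, tgt)  # dense (u, v) flow, shape (1, 2, 64, 64)
```

Because the head regresses flow values directly, there is no explicit correlation volume to construct or refine, which is consistent with the abstract's claim that the model is simpler to train and handles large displacements without a coarse-to-fine pyramid.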