We present a novel method for efficiently producing semi-dense matches across images. The previous detector-free matcher LoFTR has shown remarkable matching capability under large viewpoint changes and in texture-poor scenarios, but suffers from low efficiency. We revisit its design choices and derive multiple improvements for both efficiency and accuracy. One key observation is that performing the transformer over the entire feature map is redundant due to shared local information; we therefore propose an aggregated attention mechanism with adaptive token selection for efficiency. Furthermore, we find that spatial variance exists in LoFTR's fine correlation module, which is adverse to matching accuracy. We propose a novel two-stage correlation layer that achieves accurate subpixel correspondences and thereby improves accuracy. Our efficiency-optimized model is $\sim 2.5\times$ faster than LoFTR and can even surpass the state-of-the-art efficient sparse matching pipeline SuperPoint + LightGlue. Moreover, extensive experiments show that our method achieves higher accuracy than competitive semi-dense matchers, with considerable efficiency benefits. This opens up exciting prospects for large-scale or latency-sensitive applications such as image retrieval and 3D reconstruction. Project page: https://zju3dv.github.io/efficientloftr.
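To make the redundancy argument concrete, the following is a minimal sketch of attention over aggregated tokens. It is not the authors' implementation. It assumes plain average pooling over non-overlapping `s x s` windows as the aggregation step and nearest-neighbour upsampling to restore resolution, and it does not model the adaptive token selection mentioned in the abstract. The function names (`avg_pool_tokens`, `aggregated_attention`) and the stride `s` are hypothetical.

```python
import numpy as np


def avg_pool_tokens(feat, s):
    """Average-pool an (H, W, C) feature map over non-overlapping s x s windows."""
    H, W, C = feat.shape
    assert H % s == 0 and W % s == 0, "H and W must be divisible by s"
    return feat.reshape(H // s, s, W // s, s, C).mean(axis=(1, 3))


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def aggregated_attention(feat_q, feat_kv, s=4):
    """Attend from feat_q to feat_kv using pooled tokens (assumed aggregation).

    feat_q and feat_kv are (H, W, C) arrays; the result has the shape of feat_q.
    """
    H, W, C = feat_q.shape
    q = avg_pool_tokens(feat_q, s).reshape(-1, C)    # (H*W / s^2, C) query tokens
    kv = avg_pool_tokens(feat_kv, s).reshape(-1, C)  # pooled key/value tokens
    attn = softmax(q @ kv.T / np.sqrt(C), axis=-1)   # scaled dot-product attention
    msg = (attn @ kv).reshape(H // s, W // s, C)
    # Nearest-neighbour upsampling back to the input resolution.
    return msg.repeat(s, axis=0).repeat(s, axis=1)


rng = np.random.default_rng(0)
feat_a = rng.standard_normal((8, 8, 16))
feat_b = rng.standard_normal((8, 8, 16))
out = aggregated_attention(feat_a, feat_b, s=4)
```

With `s = 4`, each 8 x 8 map is reduced from 64 tokens to 4, so the attention matrix has 4 x 4 entries instead of 64 x 64. In general the number of attention scores shrinks by a factor of `s**4`, which is the kind of saving the abstract attributes to avoiding attention over the entire feature map.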