The image matching field has witnessed a continuous emergence of novel learnable feature matching techniques, with ever-improving performance on conventional benchmarks. However, our investigation shows that despite these gains, their potential for real-world applications is restricted by their limited generalization capabilities to novel image domains. In this paper, we introduce OmniGlue, the first learnable image matcher designed with generalization as a core principle. OmniGlue leverages broad knowledge from a vision foundation model to guide the feature matching process, boosting generalization to domains not seen at training time. Additionally, we propose a novel keypoint position-guided attention mechanism that disentangles spatial and appearance information, leading to enhanced matching descriptors. We perform comprehensive experiments on a suite of $7$ datasets with varied image domains, including scene-level, object-centric, and aerial images. OmniGlue's novel components lead to relative gains on unseen domains of $20.9\%$ with respect to a directly comparable reference model, while also outperforming the recent LightGlue method by $9.5\%$ relative. Code and model can be found at https://hwjiang1510.github.io/OmniGlue.
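To give intuition for the keypoint position-guided attention described above, here is a minimal NumPy sketch, not the paper's implementation: it assumes that keypoint positional encodings are used to guide the attention weights (via queries/keys) but are excluded from the aggregated values, so the refined descriptors remain appearance-only. All function and variable names below are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def position_guided_attention(desc, pos_enc):
    """Self-attention where keypoint positions guide *where* to attend
    (through queries/keys) but are kept out of the aggregated values,
    disentangling spatial and appearance information."""
    qk = desc + pos_enc                                  # position-aware queries/keys
    attn = softmax(qk @ qk.T / np.sqrt(desc.shape[-1]))  # attention weights
    return attn @ desc                                   # values exclude positions

rng = np.random.default_rng(0)
desc = rng.standard_normal((5, 64))  # 5 keypoint appearance descriptors
pos = rng.standard_normal((5, 64))   # their positional encodings
out = position_guided_attention(desc, pos)
print(out.shape)  # (5, 64)
```

Because the positional encoding only shapes the attention map and never enters the value aggregation, the output descriptors stay free of spatial bias, which is one plausible way such disentanglement could aid cross-domain generalization.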