Matcher: Segment Anything with One Shot Using All-Purpose Feature Matching

Powered by large-scale pre-training, vision foundation models exhibit significant potential in open-world image understanding. Even though individual models have limited capabilities, properly combining several of them can produce positive synergies and unleash their full potential. In this work, we present Matcher, which segments anything with one shot by integrating an all-purpose feature extraction model and a class-agnostic segmentation model. Naively connecting the models yields unsatisfactory performance: for example, the combined pipeline tends to generate matching outliers and false-positive mask fragments. To address these issues, we design a bidirectional matching strategy for accurate cross-image semantic dense matching and a robust prompt sampler for mask proposal generation. In addition, we propose a novel instance-level matching strategy for controllable mask merging. Matcher delivers impressive generalization performance across various segmentation tasks, all without training. For example, it achieves 52.7% mIoU on COCO-20$^i$ for one-shot semantic segmentation, surpassing the state-of-the-art specialist model by 1.6%. In addition, our visualization results show open-world generality and flexibility on images in the wild. The code will be released at https://github.com/aim-uofa/Matcher.
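The abstract does not spell out how bidirectional matching works, so the following is only a minimal sketch of the general idea, under stated assumptions: patch features are given as flat lists of vectors, similarity is cosine similarity, and each match is a simple nearest-neighbor lookup. The function names (`cosine`, `nearest`, `bidirectional_match`) and the toy data are hypothetical and are not taken from the Matcher codebase. A forward match from a masked reference patch to the target is kept only if the reverse match from that target patch lands back inside the reference mask, which is one way to discard matching outliers.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def nearest(query, bank):
    """Index of the feature in `bank` most similar to `query`."""
    return max(range(len(bank)), key=lambda j: cosine(query, bank[j]))


def bidirectional_match(ref_feats, ref_mask, tgt_feats):
    """Return target patch indices that survive a forward-then-reverse check.

    Assumption: nearest-neighbor matching stands in for whatever assignment
    scheme the actual method uses.
    """
    kept = []
    for i, in_mask in enumerate(ref_mask):
        if not in_mask:
            continue
        j = nearest(ref_feats[i], tgt_feats)      # forward: reference -> target
        back = nearest(tgt_feats[j], ref_feats)   # reverse: target -> reference
        if ref_mask[back] and j not in kept:      # keep only consistent matches
            kept.append(j)
    return kept


# Toy data: three reference patches (the first two inside the mask), two target patches.
ref_feats = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
ref_mask = [True, True, False]
tgt_feats = [[1.0, 0.05], [0.1, 1.0]]

points = bidirectional_match(ref_feats, ref_mask, tgt_feats)
```

In this toy case the second masked reference patch matches target patch 1 in the forward direction, but that target patch matches back to the unmasked reference patch 2, so it is dropped and only target patch 0 remains. In a full pipeline, the surviving target locations would presumably serve as point prompts for the segmentation model.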
