Natural image matting algorithms aim to predict a transparency map (alpha matte) under trimap guidance. However, producing trimaps often requires significant manual labor, which limits the large-scale application of matting algorithms. To address this issue, we propose the Matte Anything model (MatAny), an interactive natural image matting model that produces high-quality alpha mattes from various simple hints. The key insight of MatAny is to generate a pseudo trimap automatically from contour and transparency predictions. We leverage task-specific vision models to enhance the performance of natural image matting. Specifically, we use the Segment Anything Model (SAM) to predict high-quality contours with user interaction and an open-vocabulary (OV) detector to predict the transparency of any object. Subsequently, a pretrained image matting model generates alpha mattes from the pseudo trimaps. Among interactive matting algorithms to date, MatAny supports the most interaction methods and achieves the best performance. It is composed of orthogonal vision models and requires no additional training. We evaluate MatAny against several current image matting algorithms, and the results demonstrate the significant potential of our approach.
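To make the pseudo-trimap idea concrete, the following is a minimal sketch under stated assumptions, not the MatAny implementation. It takes a binary object mask (standing in for the contour predicted by SAM) and a boolean transparency flag (standing in for the OV detector's output), and builds a trimap with NumPy only. The function names, the square structuring element, the `radius` parameter, and the rule that a transparent object has its whole interior marked unknown are all illustrative assumptions.

```python
import numpy as np


def dilate(mask: np.ndarray, radius: int) -> np.ndarray:
    """Binary dilation with a (2*radius+1) square window, built from shifted ORs."""
    h, w = mask.shape
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    out = np.zeros((h, w), dtype=bool)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def erode(mask: np.ndarray, radius: int) -> np.ndarray:
    """Binary erosion as the complement of the dilated complement.

    Pixels outside the image are treated as foreground, so the mask is
    not eroded along the image border.
    """
    return ~dilate(~mask, radius)


def pseudo_trimap(mask: np.ndarray, is_transparent: bool, radius: int = 2) -> np.ndarray:
    """Build a trimap: 0 = background, 128 = unknown, 255 = foreground.

    Assumption (hypothetical rule): the band between the eroded and the
    dilated mask is unknown, and a transparent object has its whole
    interior marked unknown as well.
    """
    mask = mask.astype(bool)
    sure_fg = erode(mask, radius)
    maybe_fg = dilate(mask, radius)

    trimap = np.zeros(mask.shape, dtype=np.uint8)
    trimap[maybe_fg] = 128
    trimap[sure_fg] = 255
    if is_transparent:
        # Leave the matting model to decide alpha inside the object.
        trimap[sure_fg] = 128
    return trimap


if __name__ == "__main__":
    demo_mask = np.zeros((9, 9), dtype=bool)
    demo_mask[2:7, 2:7] = True  # toy square object
    print(pseudo_trimap(demo_mask, is_transparent=False, radius=1))
```

In this sketch a larger `radius` widens the unknown band, which gives the downstream matting model more freedom around fine structures at the cost of less guidance.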