Separating an image into meaningful underlying components is a crucial first step for both editing and understanding images. We present a method that selects the regions of a photograph exhibiting the same material as an artist-chosen area. Our approach is robust to shading, specular highlights, and cast shadows, enabling selection in real images. Because we do not rely on semantic segmentation (different woods or metals should not be selected together), we formulate the problem as similarity-based grouping driven by a user-provided image location. In particular, we leverage unsupervised DINO features coupled with a proposed Cross-Similarity module and an MLP head to extract material similarities in an image. We train our model on a new synthetic image dataset, which we release, and show that the method generalizes well to real-world images. We carefully analyze the model's behavior under varying material properties and lighting, and we evaluate it against a hand-annotated benchmark of 50 real photographs. We further demonstrate our model on a set of applications, including material editing, in-video selection, and retrieval of object photographs with similar materials.
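To make the similarity-based grouping formulation concrete, the following is a minimal sketch of selecting pixels whose features resemble those at a user-chosen location. It is not the method described above: the learned Cross-Similarity module and MLP head are replaced here by plain cosine similarity with a fixed threshold, and the features are tiny hand-made vectors rather than DINO features. The names `cosine`, `select_similar`, and `threshold` are hypothetical and chosen only for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_similar(features, query, threshold=0.9):
    """Return a boolean mask over an H x W grid of feature vectors.

    Assumption: `features[r][c]` is the feature vector at pixel (r, c),
    and `query` is the (row, col) location picked by the user. A pixel is
    selected when its cosine similarity to the query feature reaches
    `threshold` (a stand-in for a learned similarity score).
    """
    query_row, query_col = query
    query_feature = features[query_row][query_col]
    return [
        [cosine(feature, query_feature) >= threshold for feature in row]
        for row in features
    ]


# Toy 2 x 3 grid of 2-D features (made-up values, for illustration only).
features = [
    [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]],
    [[2.0, 0.0], [0.1, 0.9], [0.5, 0.5]],
]

# The user clicks on the top-left pixel.
mask = select_similar(features, (0, 0))
print(mask)
```

In this toy grid, the feature `[2.0, 0.0]` is selected together with `[1.0, 0.0]` because cosine similarity ignores vector magnitude. This is only a loose analogy to the robustness to shading mentioned above; a learned similarity would be needed to handle real highlights and cast shadows.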