Segmenting object parts, such as cup handles and animal bodies, is important in many real-world applications but requires more annotation effort than segmenting whole objects. The largest dataset to date contains only two hundred object categories, which indicates how difficult it is to scale part segmentation up to an unconstrained setting. To address this, we propose to explore a seemingly simplified but empirically useful and scalable task: class-agnostic part segmentation. In this problem, we disregard the part class labels during training and instead treat all parts as a single part class. We argue and demonstrate that models trained without part classes can better localize parts and segment them on objects unseen in training. We then present two further improvements. First, we propose to make the model object-aware, leveraging the fact that parts are "compositions": their extents are bounded by the corresponding objects, and their appearances are by nature not independent but bundled. Second, we introduce a novel approach to improve part segmentation on unseen objects, inspired by an interesting finding: for unseen objects, the pixel-wise features extracted by the model often reveal high-quality part segments. Building on this finding, we propose a novel self-supervised procedure that iterates between pixel clustering and supervised contrastive learning, which pulls pixels in the same cluster closer together and pushes pixels in different clusters apart. Extensive experiments on PartImageNet and Pascal-Part show notable and consistent gains from our approach, which we view as a critical step towards open-world part segmentation.
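The two ideas the abstract names, dropping part class labels and alternating pixel clustering with supervised contrastive learning, can be illustrated with a small dependency-free sketch. It is not the authors' implementation. The abstract does not say which clustering algorithm or loss formulation is used, so k-means with farthest-first initialization and the standard supervised contrastive loss are stand-ins. The function names (`drop_part_classes`, `cluster_pixels`, `supcon_loss`), the annotation dictionary layout, and the temperature value are all hypothetical.

```python
import math


def drop_part_classes(annotations):
    """Class-agnostic labels: keep every part mask, map all classes to one id.

    Assumes (hypothetically) that each annotation is a dict with a "mask"
    and a "part_class" entry.
    """
    return [{"mask": a["mask"], "part_class": 0} for a in annotations]


def _sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_pixels(features, k, n_iters=10):
    """k-means over per-pixel feature vectors; returns one pseudo-label per pixel.

    Centers are initialized farthest-first so the result is deterministic.
    """
    centers = [list(features[0])]
    while len(centers) < k:
        far = max(features, key=lambda f: min(_sq_dist(f, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(features)
    for _ in range(n_iters):
        # Assignment step: each pixel goes to its nearest center.
        labels = [min(range(k), key=lambda c: _sq_dist(f, centers[c]))
                  for f in features]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [f for f, lab in zip(features, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def supcon_loss(features, labels, temperature=0.1):
    """Supervised contrastive loss, using cluster pseudo-labels as supervision.

    Pixels sharing a label are positives (pulled together); all other pixels
    act as negatives (pushed apart). Assumes non-zero feature vectors.
    """
    unit = []
    for f in features:
        norm = math.sqrt(sum(x * x for x in f))
        unit.append([x / norm for x in f])
    n = len(unit)
    total, n_anchors = 0.0, 0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        positives = [j for j in others if labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        sims = {j: sum(a * b for a, b in zip(unit[i], unit[j])) / temperature
                for j in others}
        # Numerically stable log-sum-exp over all other pixels.
        top = max(sims.values())
        log_denom = top + math.log(sum(math.exp(s - top) for s in sims.values()))
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        n_anchors += 1
    return total / n_anchors if n_anchors else 0.0


# One round of the alternation on toy "pixel features" (two tight groups).
toy_features = [[1.0, 0.01 * i] for i in range(5)] + [[0.01 * i, 1.0] for i in range(5)]
pseudo_labels = cluster_pixels(toy_features, k=2)
loss = supcon_loss(toy_features, pseudo_labels)
```

In an actual training loop the features would come from the segmentation network, the loss would be minimized by gradient descent, and the clustering would be recomputed on the updated features; the sketch only shows how the pseudo-labels from one step feed the loss in the next.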