Towards a holistic understanding of 3D scenes, a general 3D segmentation method is needed that can segment diverse objects without restrictions on object quantity or category, while also reflecting the inherent hierarchical structure. To achieve this, we propose OmniSeg3D, an omniversal segmentation method that aims to segment anything in 3D all at once. The key insight is to lift multi-view inconsistent 2D segmentations into a consistent 3D feature field through a hierarchical contrastive learning framework, which proceeds in two steps. First, we design a novel hierarchical representation based on category-agnostic 2D segmentations to model the multi-level relationships among pixels. Second, image features rendered from the 3D feature field are clustered at different levels, and these clusters are drawn closer or pushed apart according to the hierarchical relationship between levels. By tackling the challenges posed by inconsistent 2D segmentations, this framework yields a globally consistent 3D feature field, which in turn enables hierarchical segmentation, multi-object selection, and global discretization. Extensive experiments demonstrate the effectiveness of our method for high-quality 3D segmentation and accurate hierarchical structure understanding. A graphical user interface further facilitates flexible interaction for omniversal 3D segmentation.
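The two-step idea in the abstract (a hierarchical representation over pixels, then pulling rendered features together or apart according to that hierarchy) can be illustrated with a small NumPy sketch. This is not the loss used by OmniSeg3D, whose exact formulation the abstract does not give. It is a generic hierarchy-weighted contrastive loss under several assumptions: each pixel carries one integer id per level, ordered coarse to fine; a pixel pair is weighted by the fraction of leading levels it shares; and the temperature value is arbitrary. The function names `hierarchy_depth` and `hierarchical_contrastive_loss` are hypothetical.

```python
import numpy as np


def hierarchy_depth(labels):
    """Count the leading levels at which each pixel pair shares an id.

    labels: (N, L) integer array, column 0 = coarsest level, column L-1 = finest.
    Returns an (N, N) integer matrix with values in [0, L].
    """
    same = labels[:, None, :] == labels[None, :, :]  # (N, N, L) booleans
    # The cumulative product zeroes every finer level once a coarser level differs.
    return np.cumprod(same, axis=-1).sum(axis=-1)


def hierarchical_contrastive_loss(features, labels, temperature=0.1):
    """Hierarchy-weighted contrastive loss over N >= 2 rendered pixel features.

    Pairs that agree on more levels get a larger weight (assumed scheme:
    shared depth divided by the number of levels), so they are pulled
    together more strongly; pairs sharing no level act only as negatives.
    """
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    n = f.shape[0]
    off_diag = ~np.eye(n, dtype=bool)

    # Cosine-similarity logits, with self-pairs masked out of the softmax.
    logits = np.where(off_diag, f @ f.T / temperature, -np.inf)
    row_max = logits.max(axis=1, keepdims=True)
    log_norm = row_max + np.log(np.exp(logits - row_max).sum(axis=1, keepdims=True))
    log_prob = np.where(off_diag, logits - log_norm, 0.0)

    weights = hierarchy_depth(labels) * off_diag / labels.shape[1]
    total = weights.sum()
    if total == 0:
        return 0.0  # no pair shares any level, so there is nothing to pull together
    return float(-(weights * log_prob).sum() / total)


# Toy input: 4 pixels, 2 levels (object id, part id).
labels = np.array([[0, 0], [0, 0], [0, 1], [1, 2]])
features = np.array([[1.0, 0.0], [1.0, 0.05], [0.8, 0.6], [-1.0, 0.0]])
print(hierarchical_contrastive_loss(features, labels))
```

In this toy input, pixels 0 and 1 share both the object and the part, pixel 2 shares only the object, and pixel 3 belongs to a different object, so the loss is lower when the features of pixels 0 and 1 are close and those of pixel 3 are far away.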