We introduce Contrastive Gaussian Clustering, a novel approach capable of providing segmentation masks from any viewpoint and of enabling 3D segmentation of the scene. Recent works in novel-view synthesis have shown how to model the appearance of a scene with a cloud of 3D Gaussians, and how to generate accurate images from a given viewpoint by projecting the Gaussians onto it and $\alpha$-blending their colors. Following this example, we train a model that also includes a segmentation feature vector for each Gaussian. These features can then be used for 3D scene segmentation, by clustering the Gaussians according to their feature vectors, and to generate 2D segmentation masks, by projecting the Gaussians onto a plane and $\alpha$-blending their segmentation features. Using a combination of contrastive learning and spatial regularization, our method can be trained on inconsistent 2D segmentation masks and still learn to generate segmentation masks that are consistent across all views. Moreover, the resulting model is highly accurate, improving the IoU of the predicted masks by $+8\%$ over the state of the art. Code and trained models will be released soon.
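The $\alpha$-blending of segmentation features can be illustrated with a minimal sketch. It assumes the standard front-to-back compositing rule used in 3D Gaussian Splatting, $F = \sum_i f_i \, \alpha_i \prod_{j<i} (1 - \alpha_j)$, applied to per-Gaussian feature vectors $f_i$ instead of colors, for a single pixel whose overlapping Gaussians are already sorted by depth. The function name `blend_features` and its inputs are hypothetical; this is an illustration of the compositing rule, not the authors' implementation.

```python
from typing import List, Sequence


def blend_features(alphas: Sequence[float],
                   features: Sequence[Sequence[float]]) -> List[float]:
    """Front-to-back alpha compositing of feature vectors for one pixel.

    Hypothetical helper. Assumes alphas[i] is the effective opacity of the
    i-th Gaussian at this pixel (learned opacity times its projected 2D
    Gaussian falloff) and that Gaussians are sorted nearest to farthest.
    """
    if len(alphas) != len(features):
        raise ValueError("alphas and features must have the same length")
    dim = len(features[0]) if features else 0
    out = [0.0] * dim
    transmittance = 1.0  # fraction of the ray not yet absorbed by nearer Gaussians
    for a, f in zip(alphas, features):
        weight = a * transmittance  # alpha_i * prod_{j<i} (1 - alpha_j)
        for k in range(dim):
            out[k] += weight * f[k]
        transmittance *= 1.0 - a
        if transmittance < 1e-4:  # early stop once the pixel is effectively opaque
            break
    return out


# Two half-transparent Gaussians with orthogonal features:
# the nearer one contributes weight 0.5, the farther one 0.25.
print(blend_features([0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]]))
```

Because the blending weights are the same ones used for color, a rendered feature map is differentiable with respect to each Gaussian's feature vector, which is what allows a 2D loss (such as a contrastive loss over mask regions) to train the 3D features.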