Existing 3D instance segmentation methods are dominated by the bottom-up design: a manually fine-tuned algorithm groups points into clusters, and a refinement network then processes those clusters. However, because they rely on the quality of the clusters, these methods produce unreliable results when (1) nearby objects of the same semantic class are packed together, or (2) large objects have loosely connected regions. To address these limitations, we introduce ISBNet, a novel cluster-free method that represents instances as kernels and decodes instance masks via dynamic convolution. To efficiently generate high-recall and discriminative kernels, we propose a simple strategy named Instance-aware Farthest Point Sampling to sample candidates, and we leverage a local aggregation layer inspired by PointNet++ to encode candidate features. Moreover, we show that predicting and leveraging 3D axis-aligned bounding boxes in the dynamic convolution further boosts performance. Our method sets new state-of-the-art results on ScanNetV2 (55.9), S3DIS (60.8), and STPLS3D (49.2) in terms of AP while retaining fast inference (237 ms per scene on ScanNetV2).
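To make the kernel-based decoding concrete, the following is a minimal NumPy sketch, not the ISBNet implementation. It assumes standard farthest point sampling (not the instance-aware variant proposed here), a single-layer dynamic convolution, and hand-built toy kernels in place of kernels predicted by a network. The bounding-box input to the dynamic convolution is omitted. The names `farthest_point_sampling` and `decode_masks` are hypothetical.

```python
import numpy as np


def farthest_point_sampling(points, k, start=0):
    """Standard FPS: greedily pick k indices, each farthest from those already picked."""
    chosen = [start]
    # Distance from every point to its nearest already-chosen point.
    min_dist = np.linalg.norm(points - points[start], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(min_dist))
        chosen.append(nxt)
        dist_to_new = np.linalg.norm(points - points[nxt], axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)
    return np.array(chosen)


def decode_masks(point_feats, kernels, biases):
    """One-layer dynamic convolution: apply each instance kernel to every point feature.

    point_feats: (N, C), kernels: (K, C), biases: (K,). Returns (N, K) mask probabilities.
    """
    logits = point_feats @ kernels.T + biases
    return 1.0 / (1.0 + np.exp(-logits))


# Toy scene: two well-separated pairs of points with one-hot features (assumed data).
points = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [10.0, 0.0, 0.0], [10.1, 0.0, 0.0]])
feats = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])

idx = farthest_point_sampling(points, k=2)  # one candidate per object
kernels = 8.0 * feats[idx]                  # toy kernels; a network would predict these
biases = np.full(len(idx), -4.0)
masks = decode_masks(feats, kernels, biases) > 0.5  # (N, K) boolean instance masks
```

Each column of `masks` is the binary mask of one candidate, so no clustering step is involved: the number of instances is set by the number of sampled kernels.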