Self-supervised Learning of Rotation-invariant 3D Point Set Features using Transformer and its Self-distillation

Invariance to rotations of 3D objects is an important property in analyzing 3D point set data. Conventional rotation-invariant DNNs for 3D point sets typically obtain accurate 3D shape features via supervised learning, using labeled 3D point sets as training samples. However, due to the rapid increase in 3D point set data and the high cost of labeling, a framework that learns rotation-invariant 3D shape features from numerous unlabeled 3D point sets is required. This paper proposes a novel self-supervised learning framework for acquiring accurate and rotation-invariant 3D point set features at the object level. Our proposed lightweight DNN architecture decomposes an input 3D point set into multiple global-scale regions, called tokens, that preserve the spatial layout of the partial shapes composing the 3D object. We employ a self-attention mechanism to refine the tokens and aggregate them into an expressive rotation-invariant feature per 3D point set. Our DNN is trained effectively using pseudo-labels generated by a self-distillation framework. To facilitate the learning of accurate features, we propose combining multi-crop and cut-mix data augmentation techniques to diversify the 3D point sets used for training. Through a comprehensive evaluation, we empirically demonstrate that (1) existing rotation-invariant DNN architectures designed for supervised learning do not necessarily learn accurate 3D shape features under a self-supervised learning scenario, and (2) our proposed algorithm learns rotation-invariant 3D point set features that are more accurate than those learned by existing algorithms. Code will be available at https://github.com/takahikof/RIPT_SDMM
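
To make the augmentation idea concrete, the following is a minimal NumPy sketch of how multi-crop and cut-mix might be applied to 3D point sets. It is an illustration under stated assumptions, not the authors' implementation: the function names `crop_point_set` and `cutmix_point_sets`, the nearest-neighbor definition of a region, and the `ratio` parameter are all hypothetical choices made for this example.

```python
import numpy as np


def crop_point_set(points, ratio, rng):
    """Return a local crop: the points nearest to a randomly chosen anchor.

    Assumption: a "crop" is defined as the k nearest neighbors of a random
    anchor point, with k = ratio * number of points.
    """
    n = points.shape[0]
    k = max(1, int(round(n * ratio)))
    anchor = points[rng.integers(n)]
    # Sort points by Euclidean distance to the anchor and keep the k closest.
    order = np.argsort(np.linalg.norm(points - anchor, axis=1))
    return points[order[:k]]


def cutmix_point_sets(a, b, ratio, rng):
    """Replace a local region of point set `a` with a local region of `b`.

    Assumption: both regions are k-nearest-neighbor patches around random
    anchors, and `b` contains at least k points.
    """
    n = a.shape[0]
    k = int(round(n * ratio))
    if k == 0:
        return a.copy()
    anchor_a = a[rng.integers(n)]
    anchor_b = b[rng.integers(b.shape[0])]
    # Indices of the region cut out of `a` and the region pasted in from `b`.
    cut = np.argsort(np.linalg.norm(a - anchor_a, axis=1))[:k]
    paste = np.argsort(np.linalg.norm(b - anchor_b, axis=1))[:k]
    mixed = a.copy()
    mixed[cut] = b[paste]
    return mixed


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    shape_a = rng.normal(size=(1024, 3))
    shape_b = rng.normal(size=(1024, 3))
    # Several crops of different sizes from one mixed shape (multi-crop).
    mixed = cutmix_point_sets(shape_a, shape_b, ratio=0.25, rng=rng)
    views = [crop_point_set(mixed, r, rng) for r in (1.0, 1.0, 0.5, 0.5)]
    print([v.shape for v in views])
```

In a self-distillation setup of the kind the abstract describes, views like these would be fed to the networks that produce and consume the pseudo-labels; how the views are actually generated and assigned is not specified in the abstract and is left open here.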
