The semantic pattern of an object point cloud is determined by the topological configuration of its local geometries. Learning discriminative representations is challenging due to large shape variations of point sets in local regions and incomplete surfaces from a global perspective, and this challenge becomes even more severe in the context of unsupervised domain adaptation (UDA). Specifically, traditional 3D networks focus mainly on local geometric details and ignore the topological structure among local geometries, which greatly limits their cross-domain generalization. Recently, transformer-based models have achieved impressive performance gains in a range of image-based tasks, benefiting from their strong generalization capability and scalability, which stem from capturing long-range correlations across local patches. Inspired by these successes of visual transformers, we propose a novel Relational Priors Distillation (RPD) method that extracts relational priors from transformers well trained on massive image data, significantly empowering cross-domain representations with consistent topological priors of objects. To this end, we establish a parameter-frozen pre-trained transformer module shared between the 2D teacher and 3D student models, complemented by an online knowledge distillation strategy that semantically regularizes the 3D student model. Furthermore, we introduce a novel self-supervised task that reconstructs masked point cloud patches from the corresponding masked multi-view image features, enabling the model to incorporate 3D geometric information. Experiments on the PointDA-10 and Sim-to-Real datasets verify that the proposed method consistently achieves state-of-the-art UDA performance for point cloud classification. The source code of this work is available at https://github.com/zou-longkun/RPD.git.
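The two training signals described in the abstract can be sketched numerically: an online distillation loss computed through a frozen module shared by the 2D teacher and 3D student, and a masked-patch reconstruction loss aligning point-cloud patches with the corresponding masked multi-view image features. The following is a minimal NumPy illustration under assumed shapes; the random projection `W_frozen`, the names `shared_transformer`, `kd_loss`, and `rec_loss`, and the use of KL divergence and MSE are illustrative stand-ins, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)


def softmax(x, temperature=1.0):
    """Numerically stable softmax over the last axis."""
    z = x / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


# Stand-in for the parameter-frozen pre-trained transformer: a fixed
# (non-trainable) projection from pooled 64-d tokens to 10 class logits.
W_frozen = rng.normal(size=(64, 10))


def shared_transformer(tokens):
    # Shared by both branches; parameters are never updated.
    return tokens.mean(axis=1) @ W_frozen  # (batch, 10) pooled logits


# Assumed shapes: batch of 4 objects, 16 patches each, 64-d features.
teacher_tokens = rng.normal(size=(4, 16, 64))  # multi-view image patch features
student_tokens = rng.normal(size=(4, 16, 64))  # point-cloud patch features

t_logits = shared_transformer(teacher_tokens)
s_logits = shared_transformer(student_tokens)

# Online distillation: KL(teacher || student) on temperature-softened
# predictions, semantically regularizing the 3D student branch.
T = 4.0
p_t = softmax(t_logits, T)
p_s = softmax(s_logits, T)
kd_loss = np.mean(np.sum(p_t * (np.log(p_t) - np.log(p_s)), axis=-1)) * T * T

# Masked-patch reconstruction: the student must recover the features of
# masked patches from the corresponding masked image features (MSE here).
mask = rng.random(size=(4, 16)) < 0.6  # randomly mask ~60% of patches
rec_loss = np.mean((student_tokens[mask] - teacher_tokens[mask]) ** 2)

total_loss = kd_loss + rec_loss
```

In practice both losses would be backpropagated into the 3D student only, with the shared transformer and the 2D teacher kept frozen; the sketch above just makes the data flow of the two objectives concrete.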