Self-supervised representation learning (SSRL) has gained increasing attention in point cloud understanding as a way to address the challenges posed by 3D data scarcity and high annotation costs. This paper presents PCExpert, a novel SSRL approach that reinterprets point clouds as "specialized images". This conceptual shift allows PCExpert to leverage knowledge derived from the large-scale image modality more directly and deeply, by extensively sharing parameters with a pre-trained image encoder in a multi-way Transformer architecture. The parameter sharing strategy, combined with a novel pretext task for pre-training, namely transformation estimation, enables PCExpert to outperform the state of the art on a variety of tasks while substantially reducing the number of trainable parameters. Notably, PCExpert's performance under linear fine-tuning (e.g., 90.02% overall accuracy on ScanObjectNN) already approaches the result obtained with full model fine-tuning (92.66%), demonstrating its effective and robust representation capability.
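To make the idea of a transformation-estimation pretext task concrete, the following is a minimal sketch of how one training sample might be generated. It is an illustration under stated assumptions, not PCExpert's actual procedure: the abstract does not specify the transformation family, so a single random rotation about the z-axis is assumed here, and the helper name `make_pretext_sample` is hypothetical.

```python
import math
import random


def make_pretext_sample(points, rng):
    """Rotate a point cloud about the z-axis by a random angle.

    Returns the transformed points and the angle (in radians), which would
    serve as the regression target for the pretext task.
    Assumption: a z-axis rotation stands in for whatever transformations
    the paper actually estimates.
    """
    angle = rng.uniform(-math.pi, math.pi)
    c, s = math.cos(angle), math.sin(angle)
    rotated = [(c * x - s * y, s * x + c * y, z) for x, y, z in points]
    return rotated, angle


# Synthetic point cloud: 1024 points drawn from a standard Gaussian.
rng = random.Random(0)
cloud = [(rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(1024)]

# The encoder would receive `transformed` and be trained to predict `target`.
transformed, target = make_pretext_sample(cloud, rng)
```

No labels are needed in this setup, because the supervision signal (the applied angle) is produced by the data pipeline itself.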