PMT-MAE: Dual-Branch Self-Supervised Learning with Distillation for Efficient Point Cloud Classification

Advances in self-supervised learning are essential for enhancing feature extraction and understanding in point cloud processing. This paper introduces PMT-MAE (Point MLP-Transformer Masked Autoencoder), a novel self-supervised learning framework for point cloud classification. PMT-MAE features a dual-branch architecture that integrates Transformer and MLP components to capture rich features. The Transformer branch uses global self-attention to model complex interactions between features, while the parallel MLP branch processes tokens through shared fully connected layers, offering a complementary feature transformation pathway. A fusion mechanism then combines the outputs of the two branches, enhancing the model's capacity to learn comprehensive 3D representations. Guided by the teacher model Point-M2AE, PMT-MAE employs a distillation strategy that combines feature distillation during pre-training with logit distillation during fine-tuning to transfer knowledge effectively. On the ModelNet40 classification task, PMT-MAE achieves an accuracy of 93.6% without a voting strategy, surpassing both the baseline Point-MAE (93.2%) and the teacher Point-M2AE (93.4%), which underscores its ability to learn discriminative 3D point cloud representations. The framework is also efficient, requiring only 40 epochs each for pre-training and fine-tuning. Its effectiveness and efficiency make PMT-MAE well-suited to scenarios with limited computational resources and a promising solution for practical point cloud analysis.
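
The abstract describes the architecture and the distillation strategy only at a high level. The following is a minimal NumPy sketch of the general technique, not the authors' implementation. It assumes details the abstract does not specify: a pre-norm residual block, single-head attention, a ReLU token-wise MLP, concatenation followed by a linear projection as the fusion mechanism, mean-squared error for feature distillation, and temperature-scaled KL divergence (Hinton-style) for logit distillation. All function names, tensor sizes, and the temperature T are hypothetical.

```python
import numpy as np

# Hypothetical sketch of a dual-branch (Transformer + shared token-wise MLP)
# block and two distillation losses; NOT the authors' PMT-MAE code.

def layer_norm(x, eps=1e-6):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=axis, keepdims=True)

def log_softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))

def global_self_attention(x, p):
    # Single-head attention: every token attends to every other token.
    q, k, v = x @ p["wq"], x @ p["wk"], x @ p["wv"]
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def shared_token_mlp(x, p):
    # The same fully connected weights are applied to each token independently.
    return np.maximum(x @ p["w1"], 0.0) @ p["w2"]  # ReLU is an assumption

def dual_branch_block(x, p):
    xn = layer_norm(x)
    t = global_self_attention(xn, p)  # Transformer branch
    m = shared_token_mlp(xn, p)       # parallel MLP branch
    # Assumed fusion: concatenate branch outputs, project back to d dimensions.
    fused = np.concatenate([t, m], axis=-1) @ p["wf"]
    return x + fused                  # residual connection

def init_params(d, hidden, rng):
    def w(fan_in, fan_out):
        return rng.normal(0.0, 1.0 / np.sqrt(fan_in), (fan_in, fan_out))
    return {"wq": w(d, d), "wk": w(d, d), "wv": w(d, d),
            "w1": w(d, hidden), "w2": w(hidden, d), "wf": w(2 * d, d)}

def feature_distill_loss(student_feat, teacher_feat):
    # Assumed pre-training loss: MSE between token features of equal shape
    # (a real teacher may need a projection head to match dimensions).
    return float(np.mean((student_feat - teacher_feat) ** 2))

def logit_distill_loss(student_logits, teacher_logits, T=4.0):
    # Assumed fine-tuning loss: KL(teacher || student) on temperature-softened
    # class distributions, scaled by T**2 as in Hinton-style distillation.
    log_p_t = log_softmax(teacher_logits / T)
    log_p_s = log_softmax(student_logits / T)
    kl = np.sum(np.exp(log_p_t) * (log_p_t - log_p_s), axis=-1)
    return float(np.mean(kl) * T * T)

rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 64, 32))  # (batch, point patches, embed dim)
params = init_params(d=32, hidden=64, rng=rng)
out = dual_branch_block(tokens, params)
print(out.shape)  # (2, 64, 32)

teacher_logits = rng.normal(size=(2, 40))  # ModelNet40 has 40 classes
student_logits = rng.normal(size=(2, 40))
print(feature_distill_loss(out, tokens))
print(logit_distill_loss(student_logits, teacher_logits))
```

Concatenation plus projection is only one reading of "fusion mechanism"; element-wise summation or a learned gate would be equally plausible, and the abstract does not say which one PMT-MAE uses.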

