Instance-aware Dynamic Prompt Tuning for Pre-trained Point Cloud Models

Pre-trained point cloud models are widely used in 3D understanding tasks such as object classification and part segmentation. However, the prevailing strategy of full fine-tuning on downstream tasks incurs a large per-task storage overhead for model parameters, which limits efficiency when applying large-scale pre-trained models. Inspired by the recent success of visual prompt tuning (VPT), this paper explores prompt tuning for pre-trained point cloud models in pursuit of a good balance between performance and parameter efficiency. We find that although instance-agnostic static prompting, e.g. VPT, shows some efficacy in downstream transfer, it is vulnerable to the distribution diversity caused by the various types of noise in real-world point cloud data. To overcome this limitation, we propose a novel Instance-aware Dynamic Prompt Tuning (IDPT) strategy for pre-trained point cloud models. The essence of IDPT is a dynamic prompt generation module that perceives the semantic prior features of each point cloud instance and generates adaptive prompt tokens to enhance the model's robustness. Notably, extensive experiments demonstrate that IDPT outperforms full fine-tuning in most tasks with a mere 7% of the trainable parameters, providing a promising solution for parameter-efficient learning with pre-trained point cloud models. Code is available at \url{https://github.com/zyh16143998882/ICCV23-IDPT}.
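
The contrast between static and instance-aware prompting can be illustrated with a minimal sketch. The abstract does not specify how the prompt generation module is built, so everything below is an assumption made for illustration: the function names `static_prompt` and `dynamic_prompt`, the max pooling over token features, the single linear projection, and the toy weights and inputs. It is not the authors' implementation, which is available in the linked repository.

```python
# Toy contrast between a static (VPT-style) prompt and an instance-aware prompt.
# All names, shapes, and values here are hypothetical, chosen only for illustration.

def static_prompt(prompt, tokens):
    """Prepend the same learned prompt token to every instance."""
    return [prompt] + tokens


def dynamic_prompt(weight, bias, tokens):
    """Derive the prompt token from this instance's own token features."""
    dim = len(tokens[0])
    # Assumed aggregation: max-pool each feature channel over all tokens.
    pooled = [max(tok[c] for tok in tokens) for c in range(dim)]
    # Assumed generator: one linear layer maps the pooled feature to a prompt token.
    prompt = [
        sum(w * x for w, x in zip(row, pooled)) + b
        for row, b in zip(weight, bias)
    ]
    return [prompt] + tokens


dim = 4
# Two toy "point cloud instances", each already encoded as three token features.
instance_a = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.1, 0.0, 0.2], [0.3, 0.3, 0.9, 0.1]]
instance_b = [[1.0, 0.0, 0.2, 0.7], [0.4, 0.8, 0.1, 0.3], [0.2, 0.6, 0.5, 0.9]]

# Stand-ins for trainable parameters (fixed values instead of learned ones).
shared_prompt = [0.0] * dim
weight = [[0.1 * (i + j + 1) for j in range(dim)] for i in range(dim)]
bias = [0.0] * dim

static_a = static_prompt(shared_prompt, instance_a)
static_b = static_prompt(shared_prompt, instance_b)
dynamic_a = dynamic_prompt(weight, bias, instance_a)
dynamic_b = dynamic_prompt(weight, bias, instance_b)

print("static prompts identical: ", static_a[0] == static_b[0])
print("dynamic prompts identical:", dynamic_a[0] == dynamic_b[0])
```

In this sketch the static prompt is the same token for both inputs, while the dynamic prompt changes with the pooled features of each instance, which is the property the abstract credits for the improved robustness to noisy inputs.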


