DVPT: Dynamic Visual Prompt Tuning of Large Pre-trained Models for Medical Image Analysis

Limited labeled data makes it hard to train models from scratch in the medical domain, so an important paradigm is pre-training followed by fine-tuning. Large pre-trained models contain rich representations that can be adapted to downstream medical tasks. However, existing methods either tune all the parameters or only the task-specific layers of the pre-trained models and ignore the input variations of medical images, so they are neither efficient nor effective. In this work, we study parameter-efficient fine-tuning (PEFT) for medical image analysis and propose a dynamic visual prompt tuning method, named DVPT. It extracts knowledge beneficial to downstream tasks from large models with only a few trainable parameters. First, the frozen features are transformed by a lightweight bottleneck layer to learn the domain-specific distribution of downstream medical tasks. Then, a few learnable visual prompts serve as dynamic queries and perform cross-attention with the transformed features, aiming to acquire sample-specific knowledge suited to each sample. Finally, the features are projected back to the original feature dimension and aggregated with the frozen features. The DVPT module can be shared across Transformer layers, further reducing the number of trainable parameters. To validate DVPT, we conduct extensive experiments with different pre-trained models on medical classification and segmentation tasks. We find that this PEFT method not only adapts pre-trained models to the medical domain efficiently but also brings data efficiency when only part of the data is labeled. For example, with 0.5% extra trainable parameters, our method not only outperforms state-of-the-art PEFT methods but even surpasses full fine-tuning by more than 2.20% Kappa score on a medical classification task. It can save up to 60% of labeled data and 99% of the storage cost of ViT-B/16.
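The data flow described above (bottleneck, prompts as queries, projection back, aggregation with frozen features, sharing across layers) can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation and rests on several assumptions: single-head attention without learned query/key/value projections, a ReLU after the down-projection, a zero-initialised up-projection (a common adapter choice, so the module starts as an identity map), and, because the abstract does not say how the prompt outputs are mapped back onto every token, a second attention step in which the tokens read from the updated prompts. The class `DVPTSketch` and all of its parameters are hypothetical.

```python
import math
import random


def matmul(a, b):
    # (n x k) @ (k x m) -> (n x m), using plain nested lists
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]


def cross_attention(q, kv):
    # Single-head scaled dot-product attention; keys and values are both `kv`
    # (no learned Q/K/V projections, a simplifying assumption).
    scale = 1.0 / math.sqrt(len(q[0]))
    scores = matmul(q, transpose(kv))
    weights = [softmax([s * scale for s in row]) for row in scores]
    return matmul(weights, kv)


class DVPTSketch:
    """Hypothetical DVPT-style module; shapes, activation and init are assumptions."""

    def __init__(self, dim, bottleneck, num_prompts, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]

        self.w_down = init(dim, bottleneck)           # bottleneck down-projection
        self.prompts = init(num_prompts, bottleneck)  # learnable visual prompts
        # Zero-initialised up-projection: the module starts as an identity map.
        self.w_up = [[0.0] * dim for _ in range(bottleneck)]

    def num_trainable(self):
        # Only these three weight matrices would be trained; the backbone stays frozen.
        return sum(len(w) * len(w[0]) for w in (self.w_down, self.prompts, self.w_up))

    def forward(self, x):
        # x: N x dim frozen token features from one Transformer layer.
        h = [[max(0.0, v) for v in row] for row in matmul(x, self.w_down)]  # N x r
        # Prompts act as queries over this sample's tokens -> sample-specific prompts.
        p = cross_attention(self.prompts, h)  # M x r
        # Assumption: each token reads back from the updated prompts.
        t = cross_attention(h, p)             # N x r
        delta = matmul(t, self.w_up)          # project back to N x dim
        # Aggregate with the frozen features via a residual sum.
        return [[a + d for a, d in zip(xr, dr)] for xr, dr in zip(x, delta)]


if __name__ == "__main__":
    # Sharing: one module instance is reused for the features of every layer.
    dvpt = DVPTSketch(dim=8, bottleneck=4, num_prompts=3)
    layers = [[[random.random() for _ in range(8)] for _ in range(5)] for _ in range(3)]
    adapted = [dvpt.forward(f) for f in layers]
    print(len(adapted), dvpt.num_trainable())  # prints: 3 76
```

Because the prompts attend over each input's own tokens, the same learned prompts yield different outputs for different samples, which is the "dynamic" part. With illustrative sizes dim=768 (ViT-B's hidden width), bottleneck=256 and 10 prompts, one shared module holds 768·256 + 10·256 + 256·768 = 395,776 weights, roughly 0.46% of ViT-B/16's ~86M parameters; the paper's own configuration may differ. Storing only such weights per task, instead of a full fine-tuned copy, is what makes the storage saving possible.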