Limited labeled data makes it hard to train models from scratch in the medical domain, so pre-training followed by fine-tuning has become an important paradigm. Large pre-trained models contain rich representations that can be adapted to downstream medical tasks. However, existing methods tune either all the parameters or only the task-specific layers of the pre-trained model and ignore the input variations of medical images, so they are neither efficient nor effective. In this work, we study parameter-efficient fine-tuning (PEFT) for medical image analysis and propose a dynamic visual prompt tuning method, named DVPT. It extracts knowledge beneficial to downstream tasks from large models with only a few trainable parameters. First, the frozen features are transformed by a lightweight bottleneck layer to learn the domain-specific distribution of downstream medical tasks. Then, a few learnable visual prompts serve as dynamic queries and conduct cross-attention with the transformed features to acquire sample-specific knowledge suited to each sample. Finally, the features are projected back to the original feature dimension and aggregated with the frozen features. The DVPT module can be shared across Transformer layers, further reducing the number of trainable parameters. To validate DVPT, we conduct extensive experiments with different pre-trained models on medical classification and segmentation tasks. We find that this PEFT method not only adapts pre-trained models to the medical domain efficiently but also brings data efficiency when only part of the data is labeled. For example, with 0.5\% extra trainable parameters, our method not only outperforms state-of-the-art PEFT methods but also surpasses full fine-tuning by more than 2.20\% Kappa score on a medical classification task. It can save up to 60\% of labeled data and 99\% of the storage cost of ViT-B/16.
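The data flow described in the abstract (bottleneck transform, prompts as cross-attention queries, projection back, aggregation with the frozen features) can be sketched roughly as below. This is a forward-pass illustration only, not the authors' implementation. The class name `DVPTSketch`, the sizes (`dim=768`, `bottleneck=64`, `num_prompts=8`), the absence of biases, the single-head scaled dot-product attention, and the mean-pooled residual aggregation are all assumptions chosen for brevity; the abstract does not specify them.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class DVPTSketch:
    """Hypothetical forward-only sketch of a dynamic visual prompt module.

    Only w_down, prompts, and w_up would be trained; the backbone
    features passed to __call__ are treated as frozen inputs.
    """

    def __init__(self, dim=768, bottleneck=64, num_prompts=8, seed=0):
        rng = np.random.default_rng(seed)
        # Lightweight bottleneck: project frozen features to a small width.
        self.w_down = rng.normal(0.0, 0.02, (dim, bottleneck))
        # A few learnable prompts, used as attention queries.
        self.prompts = rng.normal(0.0, 0.02, (num_prompts, bottleneck))
        # Projection back to the original feature dimension.
        self.w_up = rng.normal(0.0, 0.02, (bottleneck, dim))

    def num_params(self):
        """Number of parameters in this module (no biases in this sketch)."""
        return self.w_down.size + self.prompts.size + self.w_up.size

    def __call__(self, frozen):
        """frozen: array of shape (batch, tokens, dim) from a frozen layer."""
        z = frozen @ self.w_down                          # (B, N, r)
        # Prompts are shared queries; keys and values come from each sample,
        # so the attended prompts differ from sample to sample.
        scores = self.prompts @ z.transpose(0, 2, 1)      # (B, P, N)
        attn = softmax(scores / np.sqrt(z.shape[-1]), axis=-1)
        sample_prompts = attn @ z                         # (B, P, r)
        update = sample_prompts @ self.w_up               # (B, P, dim)
        # Assumed aggregation: mean over prompts, added to every token.
        out = frozen + update.mean(axis=1, keepdims=True)  # (B, N, dim)
        return out, attn


# Usage: one module instance could be reused after several frozen layers.
module = DVPTSketch()
features = np.random.default_rng(1).normal(size=(2, 197, 768))
adapted, attention = module(features)
```

Because the output keeps the input shape, the same instance can be applied after any layer of matching width, which is one way the sharing across Transformer layers mentioned in the abstract could be realized.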