Computer-aided diagnosis (CAD) models based on transformer architectures are commonly developed by fine-tuning ImageNet pre-trained weights. However, with recent advances in large-scale pre-training and the application of scaling laws, Vision Transformers (ViTs) have become much larger and less accessible to the medical imaging community. In addition, deploying multiple CAD models in real-world scenarios can be troublesome because of limited storage space and time-consuming model switching. To address these challenges, we propose a new method, MeLo (Medical image Low-rank adaptation), which enables the development of a single CAD model for multiple clinical tasks in a lightweight manner. It adopts low-rank adaptation instead of resource-demanding fine-tuning. By freezing the weights of the ViT model and adding only small low-rank plug-ins, we achieve competitive results on various diagnosis tasks across different imaging modalities with only a few trainable parameters. Specifically, our proposed method achieves performance comparable to fully fine-tuned ViT models on four distinct medical imaging datasets using only about 0.17% of the trainable parameters. Moreover, MeLo adds only about 0.5 MB of storage space and allows for extremely fast model switching in deployment and inference. Our source code and pre-trained weights are available on our website (https://absterzhu.github.io/melo.github.io/).
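The idea of a frozen backbone with small low-rank plug-ins can be illustrated with a minimal sketch in plain Python. It is not the released MeLo code: the class `LoRALinear`, the helper `lora_param_count`, the task names, and the configuration in the final lines (a ViT-Base-sized backbone with 12 layers, width 768, roughly 86M weights, rank 4, and two adapted projection matrices per layer) are all assumptions chosen for illustration. Each adapted weight matrix W is kept fixed and a task-specific update B·A of rank r is added to its output, so switching tasks only swaps the small (A, B) pair.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    """Frozen linear map W plus swappable low-rank updates B @ A, one per task."""

    def __init__(self, weight, rank, alpha=1.0):
        self.weight = weight        # frozen, shared by every task
        self.rank = rank
        self.scale = alpha / rank   # common LoRA scaling; alpha is an assumed hyperparameter
        self.adapters = {}          # task name -> (A, B)
        self.active = None

    def add_adapter(self, task, rng):
        d_out, d_in = len(self.weight), len(self.weight[0])
        # A starts small and random, B starts at zero, so a fresh adapter
        # leaves the output of the frozen layer unchanged.
        a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(self.rank)]
        b = [[0.0] * self.rank for _ in range(d_out)]
        self.adapters[task] = (a, b)

    def switch(self, task):
        # Switching tasks only changes which small (A, B) pair is used.
        self.active = task

    def forward(self, x):
        y = matvec(self.weight, x)
        if self.active is not None:
            a, b = self.adapters[self.active]
            delta = matvec(b, matvec(a, x))
            y = [yi + self.scale * di for yi, di in zip(y, delta)]
        return y


def lora_param_count(dim, rank, num_layers, matrices_per_layer):
    """Trainable parameters when each adapted dim x dim matrix gets
    A (rank x dim) and B (dim x rank)."""
    return num_layers * matrices_per_layer * 2 * dim * rank


# Hypothetical configuration (not taken from the abstract).
layer = LoRALinear([[1.0, 0.0], [0.0, 1.0]], rank=1)
layer.add_adapter("chest_xray", random.Random(0))
layer.add_adapter("mammography", random.Random(1))
layer.switch("mammography")

trainable = lora_param_count(dim=768, rank=4, num_layers=12, matrices_per_layer=2)
fraction = trainable / 86_000_000      # assumed backbone size
size_mb = trainable * 4 / 1e6          # float32 storage per task
print(trainable, f"{fraction:.2%}", f"{size_mb:.2f} MB")
```

Under these assumed settings the plug-ins amount to 147,456 trainable parameters, which is on the order of the 0.17% and 0.5 MB figures quoted above; the configuration actually used by MeLo should be taken from the paper and the released code.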