Despite its clinical utility, medical image segmentation (MIS) remains a daunting task due to images' inherent complexity and variability. Vision transformers (ViTs) have recently emerged as a promising solution to improve MIS; however, they require larger training datasets than convolutional neural networks. To overcome this obstacle, data-efficient ViTs were proposed, but they are typically trained using a single source of data, which overlooks the valuable knowledge that could be leveraged from other available datasets. Naively combining datasets from different domains can result in negative knowledge transfer (NKT), i.e., a decrease in model performance on some domains when inter-domain heterogeneity is non-negligible. In this paper, we propose MDViT, the first multi-domain ViT that includes domain adapters to mitigate data-hunger and combat NKT by adaptively exploiting knowledge in multiple small data resources (domains). Further, to enhance representation learning across domains, we integrate a mutual knowledge distillation paradigm that transfers knowledge between a universal network (spanning all the domains) and auxiliary domain-specific branches. Experiments on 4 skin lesion segmentation datasets show that MDViT outperforms state-of-the-art algorithms, achieving superior segmentation performance with a model size that stays fixed at inference time, even as more domains are added. Our code is available at https://github.com/siyi-wind/MDViT.