Despite its clinical utility, medical image segmentation (MIS) remains a daunting task due to the inherent complexity and variability of medical images. Vision transformers (ViTs) have recently emerged as a promising solution to improve MIS; however, they require larger training datasets than convolutional neural networks. To overcome this obstacle, data-efficient ViTs were proposed, but they are typically trained using a single source of data, which overlooks the valuable knowledge that could be leveraged from other available datasets. Naively combining datasets from different domains can result in negative knowledge transfer (NKT), i.e., a decrease in model performance on some domains with non-negligible inter-domain heterogeneity. In this paper, we propose MDViT, the first multi-domain ViT that includes domain adapters to mitigate data-hunger and combat NKT by adaptively exploiting knowledge in multiple small data resources (domains). Further, to enhance representation learning across domains, we integrate a mutual knowledge distillation paradigm that transfers knowledge between a universal network (spanning all the domains) and auxiliary domain-specific branches. Experiments on four skin lesion segmentation datasets show that MDViT outperforms state-of-the-art algorithms, achieving superior segmentation performance with a model size that stays fixed at inference time, even as more domains are added. Our code is available at https://github.com/siyi-wind/MDViT.
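To make the mutual knowledge distillation idea concrete, the following is a minimal sketch of a generic two-way distillation loss for a single pixel, in which the universal network and one domain-specific branch are each pulled toward the other's softened prediction. It is an illustration under stated assumptions, not the loss used in MDViT: the abstract does not specify the exact formulation, and the function names, the temperature parameter, and the example logits are all hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def mutual_distillation_losses(universal_logits, branch_logits, temperature=1.0):
    """Return (loss_for_universal, loss_for_branch) for one pixel.

    Assumption: a symmetric, KL-based formulation. In a real training loop
    the target distribution in each term would be treated as a constant
    (no gradient flows through it).
    """
    p_u = softmax(universal_logits, temperature)
    p_b = softmax(branch_logits, temperature)
    loss_universal = kl_divergence(p_b, p_u)  # universal network learns from the branch
    loss_branch = kl_divergence(p_u, p_b)     # branch learns from the universal network
    return loss_universal, loss_branch


# Hypothetical per-pixel logits for the classes (background, lesion).
loss_u, loss_b = mutual_distillation_losses([2.0, 0.5], [1.0, 1.5])
```

In such a setup, each distillation term would typically be added to the ordinary segmentation loss of the corresponding network. Because the domain-specific branches are described as auxiliary, only the universal network would be needed at inference, which is consistent with the fixed model size claimed above.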