Hybrid volumetric medical image segmentation models, which combine the advantages of local convolution and global attention, have recently received considerable attention. Most existing hybrid approaches focus on architectural modifications and still rely on conventional data-independent weight initialization schemes, which limit performance because they ignore the inherent volumetric nature of medical data. To address this issue, we propose a learnable weight initialization approach that uses the available medical training data to learn contextual and structural cues through the proposed self-supervised objectives. Our approach is easy to integrate into any hybrid model and requires no external training data. Experiments on multi-organ and lung cancer segmentation tasks demonstrate the effectiveness of our approach, leading to state-of-the-art segmentation performance. On the multi-organ segmentation task, our data-dependent initialization approach compares favorably with the Swin-UNETR model pretrained on large-scale datasets. Our source code and models are available at: https://github.com/ShahinaKK/LWI-VMS.