Modern machine-learning-based imaging methods face challenges in medical applications because dataset construction is costly and labeled training data are therefore limited. Once deployed, these methods typically process a large volume of data every day, imposing a high maintenance cost on medical facilities. In this paper, we introduce a new neural network architecture, termed LoGoNet, together with a tailored self-supervised learning (SSL) method to mitigate these challenges. LoGoNet integrates a novel feature extractor within a U-shaped architecture, leveraging Large Kernel Attention (LKA) and a dual encoding strategy to capture both long-range and short-range feature dependencies. This contrasts with existing methods that rely on increasing network capacity to enhance feature extraction. The combination is especially beneficial in medical image segmentation, given the difficulty of learning intricate and often irregular organ shapes, such as the spleen. In addition, we propose a novel SSL method tailored for 3D images to compensate for the lack of large labeled datasets. The method combines masking and contrastive learning techniques within a multi-task learning framework and is compatible with both Vision Transformer (ViT) and CNN-based models. We demonstrate the efficacy of our methods on multiple tasks across two standard datasets, BTCV and MSD. Benchmark comparisons with eight state-of-the-art models highlight LoGoNet's superior performance in both inference time and accuracy.
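To illustrate why LKA can widen the receptive field without a matching increase in network capacity, the following minimal sketch counts weights for a large 3D depth-wise kernel and for its LKA-style decomposition. It is not LoGoNet's implementation: the abstract does not state kernel sizes or layer layout, so the decomposition follows the commonly published LKA formulation (a depth-wise convolution, a depth-wise dilated convolution, and a point-wise convolution), and its extension to 3D, the function names, and the values `channels=32`, `kernel=21`, `dilation=3` are assumptions chosen only for illustration.

```python
import math


def direct_depthwise_params(channels: int, kernel: int, dims: int = 3) -> int:
    """Weights in one depth-wise convolution with a kernel**dims window (bias ignored)."""
    return channels * kernel ** dims


def lka_params(channels: int, kernel: int, dilation: int, dims: int = 3) -> int:
    """Weights in an LKA-style three-stage decomposition (bias ignored).

    Assumed decomposition, following the published LKA formulation:
      1. depth-wise conv with side length (2 * dilation - 1): short-range context
      2. depth-wise dilated conv with side length ceil(kernel / dilation): long-range context
      3. point-wise (1x1x1) conv: channel mixing
    """
    local_k = 2 * dilation - 1
    dilated_k = math.ceil(kernel / dilation)
    depthwise_local = channels * local_k ** dims
    depthwise_dilated = channels * dilated_k ** dims
    pointwise = channels * channels
    return depthwise_local + depthwise_dilated + pointwise


# Hypothetical configuration, not taken from the paper.
channels, kernel, dilation = 32, 21, 3
direct = direct_depthwise_params(channels, kernel)
decomposed = lka_params(channels, kernel, dilation)
print(direct, decomposed, round(direct / decomposed, 1))  # 296352 16000 18.5
```

Under these assumed settings, the decomposition uses roughly 18 times fewer weights than a single 21×21×21 depth-wise kernel. The gap grows with the number of spatial dimensions, which is one reason such a decomposition is attractive for volumetric data.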