Multi-Level Global Context Cross Consistency Model for Semi-Supervised Ultrasound Image Segmentation with Diffusion Model

Medical image segmentation is a critical step in computer-aided diagnosis, and convolutional neural networks (CNNs) are currently the most widely used segmentation networks. However, their inherently local operations make it difficult to capture the global contextual information of lesions that vary in position, shape, and size. Semi-supervised learning can learn from both labeled and unlabeled samples, alleviating the burden of manual labeling, but obtaining a large number of unlabeled images in medical scenarios remains challenging. To address these issues, we propose a Multi-level Global Context Cross-consistency (MGCC) framework that uses images generated by a Latent Diffusion Model (LDM) as unlabeled images for semi-supervised learning. The framework consists of two stages. In the first stage, an LDM is used to generate synthetic medical images, which reduces the workload of data annotation and addresses privacy concerns associated with collecting medical data. In the second stage, varying levels of global context noise perturbation are added to the inputs of the auxiliary decoders, and output consistency is enforced between the decoders to improve representation ability. Experiments conducted on open-source breast ultrasound and private thyroid ultrasound datasets demonstrate the effectiveness of our framework in bridging the probability distribution and the semantic representation of medical images. Our approach enables the effective transfer of probability distribution knowledge to the segmentation network, resulting in improved segmentation accuracy. The code is available at https://github.com/FengheTan9/Multi-Level-Global-Context-Cross-Consistency.
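To make the second stage more concrete, the following is a minimal, framework-free sketch of a cross-consistency loss between a main decoder and several perturbed auxiliary decoders. It is a toy illustration, not the implementation in the linked repository: the names `perturb`, `decode`, and `cross_consistency_loss`, the multiplicative uniform noise, the sigmoid "decoder", and the mean-squared-error consistency term are all assumptions chosen for simplicity.

```python
import math
import random


def perturb(features, noise_level, rng):
    """Assumed perturbation: scale each feature by (1 + u), u ~ U(-noise_level, noise_level)."""
    return [f + f * rng.uniform(-noise_level, noise_level) for f in features]


def decode(features):
    """Hypothetical stand-in for a decoder: an element-wise sigmoid giving foreground probabilities."""
    return [1.0 / (1.0 + math.exp(-f)) for f in features]


def mse(p, q):
    """Mean squared error between two equally long probability lists."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)


def cross_consistency_loss(main_probs, aux_probs_list):
    """Average the MSE between the main output and each auxiliary output."""
    return sum(mse(main_probs, aux) for aux in aux_probs_list) / len(aux_probs_list)


# Toy "encoder features" for one unlabeled (e.g. LDM-generated) image.
rng = random.Random(0)
features = [0.5, -1.0, 2.0, 0.0]

# The main decoder sees clean features; each auxiliary decoder sees a
# different (assumed) noise level applied to the same features.
main_probs = decode(features)
aux_probs = [decode(perturb(features, level, rng)) for level in (0.1, 0.2, 0.3)]

unsupervised_loss = cross_consistency_loss(main_probs, aux_probs)
print(f"consistency loss on the unlabeled sample: {unsupervised_loss:.6f}")
```

In a real training loop this unsupervised term would typically be added, with a weighting factor, to a supervised segmentation loss computed on the labeled images; the weighting scheme is not specified in the abstract and is left out here.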
