Multi-Level Global Context Cross Consistency Model for Semi-Supervised Ultrasound Image Segmentation with Diffusion Model

Medical image segmentation is a critical step in computer-aided diagnosis, and convolutional neural networks (CNNs) are currently the most widely used segmentation networks. However, their inherently local operations make it difficult to capture global contextual information for lesions that vary in position, shape, and size. Semi-supervised learning can learn from both labeled and unlabeled samples, which alleviates the burden of manual labeling, but obtaining a large number of unlabeled images in medical scenarios remains challenging. To address these issues, we propose a Multi-level Global Context Cross-consistency (MGCC) framework that uses images generated by a Latent Diffusion Model (LDM) as the unlabeled images for semi-supervised learning. The framework consists of two stages. In the first stage, an LDM is used to generate synthetic medical images, which reduces the workload of data annotation and addresses privacy concerns associated with collecting medical data. In the second stage, varying levels of global context noise perturbation are added to the inputs of the auxiliary decoders, and output consistency is maintained between the decoders to improve representation ability. Experiments conducted on an open-source breast ultrasound dataset and a private thyroid ultrasound dataset demonstrate the effectiveness of our framework in bridging the probability distribution and the semantic representation of medical images. Our approach enables the effective transfer of probability distribution knowledge to the segmentation network, resulting in improved segmentation accuracy. The code is available at https://github.com/FengheTan9/Multi-Level-Global-Context-Cross-Consistency.
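The second stage can be illustrated with a minimal sketch of a cross-consistency loss on an unlabeled sample. This is not the released MGCC implementation: the decoder is a hypothetical toy function (`toy_decoder`), plain Gaussian noise stands in for the global context noise perturbation, and mean squared error is assumed as the consistency measure. It shows only the general pattern of perturbing the auxiliary decoder inputs at several levels and penalizing disagreement with the main decoder.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def perturb(features, noise_scale, rng):
    # Assumption: zero-mean Gaussian noise stands in for one perturbation level.
    return [f + rng.gauss(0.0, noise_scale) for f in features]


def toy_decoder(features, weight=1.0):
    # Hypothetical decoder: scales each feature and maps it to a probability.
    return [sigmoid(weight * f) for f in features]


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def consistency_loss(features, noise_scales, rng):
    # The main decoder sees clean features; each auxiliary decoder sees a
    # perturbed copy at its own noise level. The loss is the mean disagreement
    # between the main prediction and every auxiliary prediction.
    main_pred = toy_decoder(features)
    losses = []
    for scale in noise_scales:
        aux_pred = toy_decoder(perturb(features, scale, rng))
        losses.append(mse(main_pred, aux_pred))
    return sum(losses) / len(losses)


if __name__ == "__main__":
    rng = random.Random(0)
    encoder_features = [0.2, -1.3, 0.7, 2.1]  # made-up encoder output
    print(consistency_loss(encoder_features, [0.1, 0.5, 1.0], rng))
```

In a semi-supervised setup of this kind, such a term would typically be added, with a weighting factor, to a supervised loss computed on the labeled images; the weighting and the exact loss used by MGCC are not specified here.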
