Mammography is crucial for breast cancer surveillance and early diagnosis. However, analyzing mammography images is a demanding task for radiologists, who often review hundreds of mammograms daily, which can lead to overdiagnosis and overtreatment. Computer-Aided Diagnosis (CAD) systems have been developed to assist in this process, but their capabilities, particularly in lesion segmentation, remain limited. Contemporary advances in deep learning offer a path to improving their performance. Recently, vision-language diffusion models have emerged, demonstrating outstanding performance in image generation and transferability to various downstream tasks. We aim to harness their capabilities for breast lesion segmentation in a panoptic setting, which encompasses both semantic and instance-level predictions. Specifically, we propose leveraging pretrained features from a Stable Diffusion model as inputs to a state-of-the-art panoptic segmentation architecture, resulting in accurate delineation of individual breast lesions. To bridge the gap between the natural and medical imaging domains, we incorporated a mammography-specific MAM-E diffusion model and BiomedCLIP image and text encoders into this framework. We evaluated our approach on two recently published mammography datasets, CDD-CESM and VinDr-Mammo. For the instance segmentation task, we obtained 40.25 AP@0.1 and 46.82 AP@0.05, as well as 25.44 PQ@0.1 and 26.92 PQ@0.05. For the semantic segmentation task, we achieved Dice scores of 38.86 and 40.92, respectively.
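The described pipeline (frozen pretrained diffusion features feeding a query-based panoptic segmentation head) can be sketched as follows. This is a minimal illustrative stand-in, not the authors' implementation: the backbone here is a toy frozen encoder in place of a real Stable Diffusion / MAM-E UNet, and the head is a simplified mask-classification design in the spirit of Mask2Former; all module names, dimensions, and shapes are assumptions.

```python
# Hedged sketch: frozen "diffusion" features as input to a panoptic
# mask-classification head. Shapes and modules are illustrative only.
import torch
import torch.nn as nn


class FrozenDiffusionBackbone(nn.Module):
    """Stand-in for a pretrained diffusion UNet encoder whose intermediate
    activations serve as a multi-scale feature pyramid (kept frozen)."""

    def __init__(self):
        super().__init__()
        self.stages = nn.ModuleList([
            nn.Conv2d(1, 32, 3, stride=2, padding=1),   # mammograms are 1-channel
            nn.Conv2d(32, 64, 3, stride=2, padding=1),
            nn.Conv2d(64, 128, 3, stride=2, padding=1),
        ])
        for p in self.parameters():
            p.requires_grad_(False)  # pretrained features stay frozen

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)
        return feats  # list of progressively coarser feature maps


class PanopticHead(nn.Module):
    """Minimal mask-classification head: each learned query predicts one
    class-logit vector and one binary mask, enabling instance, semantic,
    and panoptic outputs from the same predictions."""

    def __init__(self, in_dim=32, embed_dim=64, num_queries=10, num_classes=2):
        super().__init__()
        self.proj = nn.Conv2d(in_dim, embed_dim, 1)       # pixel embedding
        self.queries = nn.Parameter(torch.randn(num_queries, embed_dim))
        self.cls = nn.Linear(embed_dim, num_classes + 1)  # +1 = "no object"

    def forward(self, feats):
        pix = self.proj(feats[0])                          # B x C x H x W
        # mask logits: dot product between each query and each pixel embedding
        masks = torch.einsum("qc,bchw->bqhw", self.queries, pix)
        logits = self.cls(self.queries)                    # Q x (K+1)
        return logits, masks


backbone, head = FrozenDiffusionBackbone(), PanopticHead()
img = torch.randn(1, 1, 256, 256)          # one synthetic single-channel image
logits, masks = head(backbone(img))
print(logits.shape, masks.shape)
```

In the actual method, the toy backbone would be replaced by hooked activations from the pretrained (MAM-E) diffusion UNet, optionally conditioned on BiomedCLIP text embeddings, while the head would use full transformer decoder layers rather than a single projection.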