Surgical scene segmentation is essential for enhancing surgical precision, yet it is frequently compromised by the scarcity and imbalance of available data. To address these challenges, semantic image synthesis methods based on generative adversarial networks and diffusion models have been developed. However, these models often yield non-diverse images and fail to capture small, critical tissue classes, limiting their effectiveness. In response, we propose the Class-Aware Semantic Diffusion Model (CASDM), a novel approach that uses segmentation maps as conditions for image synthesis to tackle data scarcity and imbalance. We define novel class-aware mean squared error and class-aware self-perceptual loss functions that prioritize critical, less visible classes, thereby enhancing image quality and relevance. Furthermore, to our knowledge, we are the first to generate multi-class segmentation maps from text prompts that specify their contents. These maps are then used by CASDM to generate surgical scene images, enriching datasets for training and validating segmentation models. Our evaluation, which assesses both image quality and downstream segmentation performance, demonstrates the strong effectiveness and generalisability of CASDM in producing realistic image-map pairs, significantly advancing surgical scene segmentation across diverse and challenging datasets.
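The class-aware loss idea above can be sketched as a per-pixel weighted MSE, where each pixel's error is scaled by a weight attached to its semantic class. This is a minimal illustration, not the paper's exact formulation: the weighting scheme shown (inverse pixel frequency per class, supplied by the caller) is an assumption for demonstration purposes.

```python
import numpy as np

def class_aware_mse(pred, target, seg_map, class_weights):
    """Weighted MSE where each pixel's squared error is scaled by the
    weight of its semantic class, up-weighting small or rare classes.

    pred, target:  float arrays of shape (H, W) or (H, W, C)
    seg_map:       int array of shape (H, W) holding class ids
    class_weights: dict mapping class id -> float weight
                   (hypothetical scheme: e.g. inverse pixel frequency)
    """
    # Look up each pixel's class weight from the dict.
    w = np.vectorize(class_weights.get)(seg_map).astype(float)
    if pred.ndim == 3:
        w = w[..., None]  # broadcast weights across channels
    return float(np.mean(w * (pred - target) ** 2))

# Toy example: class 0 is rare, so it receives a larger weight.
pred = np.array([[1.0, 0.0], [0.0, 1.0]])
target = np.zeros((2, 2))
seg_map = np.array([[0, 1], [1, 1]])
loss = class_aware_mse(pred, target, seg_map, {0: 3.0, 1: 0.5})
```

With uniform weights this reduces to the ordinary MSE; the weighting simply shifts the optimisation pressure toward the classes the map says matter most.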