In computer-assisted surgery, automatically recognizing anatomical organs is crucial for understanding the surgical scene and providing intraoperative assistance. While machine learning models can identify such structures, their deployment is hindered by the need for labeled, diverse surgical datasets with anatomical annotations. Labeling multiple classes (i.e., organs) in a surgical scene is time-intensive and requires medical experts. Although synthetically generated images can enhance segmentation performance, maintaining both organ structure and texture during generation is challenging. We introduce a multi-stage approach that uses diffusion models to generate multi-class surgical datasets with annotations. Our framework improves anatomy awareness by training organ-specific models with an inpainting objective guided by binary segmentation masks. The organs are generated with an inference pipeline using a pre-trained ControlNet to preserve organ structure. The synthetic multi-class datasets are constructed through an image composition step, ensuring structural and textural consistency. This versatile approach allows the generation of multi-class datasets from real binary datasets and simulated surgical masks. We thoroughly evaluate the generated datasets on image quality and downstream segmentation, achieving a $15\%$ improvement in segmentation scores when combined with real images. The code is available at https://gitlab.com/nct_tso_public/muli-class-image-synthesis