In surgical computer vision applications, obtaining labeled training data is challenging due to data-privacy concerns and the need for expert annotation. Unpaired image-to-image translation techniques have been explored to automatically generate large annotated datasets by translating synthetic images into the realistic domain. However, preserving structure and semantic consistency between the input and translated images remains difficult, particularly when the semantic characteristics of the two domains differ in distribution. This study empirically investigates unpaired image translation methods for generating suitable training data in surgical applications, with an explicit focus on semantic consistency. We extensively evaluate various state-of-the-art image translation models on two challenging surgical datasets and on downstream semantic segmentation tasks. We find that a simple combination of a structural-similarity loss and contrastive learning yields the most promising results. Quantitatively, we show that data generated with this approach exhibit higher semantic consistency and can be used more effectively as training data.