In surgical computer vision applications, obtaining labeled training data is challenging because of data-privacy concerns and the need for expert annotation. Unpaired image-to-image translation techniques have been explored as a way to automatically generate large annotated datasets by translating synthetic images to the realistic domain. However, preserving structure and semantic consistency between the input and translated images is difficult, particularly when the semantic characteristics of the two domains are distributed differently. This study empirically investigates unpaired image translation methods for generating suitable data in surgical applications, with a specific focus on semantic consistency. We extensively evaluate various state-of-the-art image translation models on two challenging surgical datasets and on downstream semantic segmentation tasks. We find that a simple combination of structural-similarity loss and contrastive learning yields the most promising results. Quantitatively, we show that data generated with this approach exhibits higher semantic consistency and can be used more effectively as training data.