In surgical computer vision applications, obtaining labeled training data is challenging because of data-privacy concerns and the need for expert annotation. Unpaired image-to-image translation techniques have been explored as a way to automatically generate large annotated datasets by translating synthetic images into the realistic domain. However, preserving structure and semantic consistency between the input and translated images remains difficult, particularly when there is a distributional mismatch in the semantic characteristics of the two domains. This study empirically investigates unpaired image translation methods for generating suitable data in surgical applications, with an explicit focus on semantic consistency. We extensively evaluate various state-of-the-art image translation models on two challenging surgical datasets and on downstream semantic segmentation tasks. We find that a simple combination of a structural-similarity loss and contrastive learning yields the most promising results. Quantitatively, we show that the data generated with this approach has higher semantic consistency and can be used more effectively as training data. The code is available at https://gitlab.com/nct_tso_public/constructs.
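As a rough illustration of what combining a structural-similarity loss with a contrastive loss can look like, the sketch below adds a (1 - SSIM) term to an InfoNCE term. It is not taken from the linked repository: the function names, the single-window SSIM simplification, the temperature of 0.07, and the weight `lambda_ssim` are all assumptions made for this example, and the features are plain Python lists rather than network activations.

```python
import math

def ssim_global(x, y, data_range=1.0):
    """SSIM over one window covering the whole flattened image.

    Simplification: practical SSIM losses usually average over sliding windows.
    """
    c1 = (0.01 * data_range) ** 2  # standard SSIM stabilising constants
    c2 = (0.03 * data_range) ** 2
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx * mx + my * my + c1) * (vx + vy + c2)
    return num / den

def _normalize(v):
    """Scale a feature vector to unit L2 norm."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]

def info_nce(queries, keys, temperature=0.07):
    """InfoNCE loss: keys[i] is the positive for queries[i], all other keys are negatives."""
    q = [_normalize(v) for v in queries]
    k = [_normalize(v) for v in keys]
    total = 0.0
    for i, qi in enumerate(q):
        logits = [sum(a * b for a, b in zip(qi, kj)) / temperature for kj in k]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum - logits[i]  # negative log-probability of the positive
    return total / len(q)

def combined_loss(src_pixels, out_pixels, src_feats, out_feats, lambda_ssim=1.0):
    """Hypothetical combination: weighted structural term plus contrastive term."""
    structural = 1.0 - ssim_global(src_pixels, out_pixels)
    contrastive = info_nce(out_feats, src_feats)
    return lambda_ssim * structural + contrastive
```

In this sketch the structural term penalises changes in image layout between the input and its translation, while the contrastive term pulls features at corresponding locations together and pushes non-corresponding ones apart. How the two terms are actually weighted and where the features are extracted is a design choice that the abstract does not specify.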