Generating photorealistic images from simulated label maps is needed in several contexts, such as medical training in virtual reality. With conventional deep learning methods, this task requires images paired with semantic annotations, which are typically unavailable. We introduce a contrastive learning framework that generates photorealistic images from simulated label maps by learning from unpaired sets of both. Because scenes can differ substantially between real images and label maps, existing unpaired image translation methods introduce scene-modification artifacts into the synthesized images. We use simulated images as surrogate targets for a contrastive loss, and enforce consistency using features from a reverse translation network. Our method enables bidirectional label-image translation, which we demonstrate in a variety of scenarios and datasets, including laparoscopy, ultrasound, and driving scenes. Compared with state-of-the-art unpaired translation methods, our method generates realistic and scene-accurate translations.
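To make the idea of a contrastive loss with surrogate targets more concrete, the following is a minimal sketch of a generic patch-wise InfoNCE loss. It is not the exact formulation of this work, which the abstract does not specify. The function name `patch_nce_loss`, the temperature value, and the assumption that feature `i` of the translated image should match feature `i` of the surrogate (simulated) image are all illustrative choices.

```python
import numpy as np


def patch_nce_loss(query, target, temperature=0.07):
    """Generic InfoNCE loss over N patch features of dimension D.

    query:  (N, D) features from patches of the translated image (assumed).
    target: (N, D) features from the same patch locations in the
            surrogate target image (assumed).
    For row i, target[i] is the positive and all target[j], j != i,
    act as negatives.
    """
    # L2-normalize so the dot product is a cosine similarity.
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    k = target / np.linalg.norm(target, axis=1, keepdims=True)

    # Pairwise similarities, scaled by the temperature (hypothetical value).
    logits = q @ k.T / temperature  # shape (N, N)

    # Numerically stable log-softmax along each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Cross-entropy with the diagonal (matching locations) as the labels.
    return float(-np.mean(np.diag(log_prob)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(8, 16))
    aligned = patch_nce_loss(feats, feats)
    misaligned = patch_nce_loss(feats, np.roll(feats, 1, axis=0))
    print(f"aligned: {aligned:.4f}, misaligned: {misaligned:.4f}")
```

In this sketch the loss is small when corresponding patch locations carry similar features and grows when the scene content is shifted, which is the property a scene-consistency objective would rely on.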