This work presents a new depth- and semantics-aware conditional generative model, named TITAN-Next, for cross-domain image-to-image translation in a multi-modal setup between LiDAR and camera sensors. The proposed model leverages scene semantics as a mid-level representation and is able to translate raw LiDAR point clouds to RGB-D camera images by solely relying on semantic scene segments. We claim that this is the first framework of its kind and it has practical applications in autonomous vehicles such as providing a fail-safe mechanism and augmenting available data in the target image domain. The proposed model is evaluated on the large-scale and challenging Semantic-KITTI dataset, and experimental findings show that it considerably outperforms the original TITAN-Net and other strong baselines by 23.7$\%$ margin in terms of IoU.
翻译:本工作提出了一种名为TITAN-Net的新型深度与语义感知条件生成模型,专用于激光雷达与相机传感器间多模态设置下的跨域图像到图像转换。该模型利用场景语义作为中间层表征,能够仅依赖语义场景分割结果将原始激光雷达点云转换为RGB-D相机图像。我们声称这是首个同类框架,在自动驾驶领域具有实际应用价值,例如提供故障安全机制以及增强目标图像域可用的数据。该模型在规模庞大且具有挑战性的Semantic-KITTI数据集上进行了评估,实验结果表明,其在IoU指标上相比原始TITAN-Net及其他强基线方法取得了23.7%的显著性能提升。