Image outpainting generates visually plausible content with no guarantee of authenticity, which makes it unreliable in practice. We therefore propose a reliable image outpainting task that introduces sparse depth from LiDARs to extrapolate authentic RGB scenes. The large field of view of LiDARs makes the approach useful for data enhancement and further multimodal tasks. Concretely, we propose a Depth-Guided Outpainting Network that models the different feature representations of the two modalities and learns structure-aware cross-modal fusion. It has two components: 1) the Multimodal Learning Module, which produces distinct depth and RGB feature representations according to the characteristics of each modality; and 2) the Depth Guidance Fusion Module, which leverages the complete depth modality to guide the generation of RGB content through progressive multimodal feature fusion. Furthermore, we design an additional constraint strategy consisting of a Cross-modal Loss and an Edge Loss to sharpen ambiguous contours and expedite reliable content generation. Extensive experiments on the KITTI and Waymo datasets demonstrate that our method outperforms the state-of-the-art method, both quantitatively and qualitatively.
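The abstract names an Edge Loss but does not define it. As a rough illustration only, the sketch below shows one common way such a term can be built: compare Sobel gradient magnitudes of a predicted and a reference grayscale image and average the absolute difference. The Sobel operator, the L1 distance, the edge-replicated padding, and the names `filter3x3`, `edge_map`, and `edge_loss` are all assumptions made for this example, not the paper's formulation.

```python
import numpy as np

# Sobel kernels for horizontal and vertical gradients (assumed edge operator).
SOBEL_X = np.array([[-1.0, 0.0, 1.0],
                    [-2.0, 0.0, 2.0],
                    [-1.0, 0.0, 1.0]])
SOBEL_Y = SOBEL_X.T


def filter3x3(img, kernel):
    """Correlate a 2-D image with a 3x3 kernel, using edge-replicated padding."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    out = np.zeros((h, w), dtype=np.float64)
    # Accumulate each kernel tap as a shifted copy of the padded image.
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def edge_map(img):
    """Sobel gradient magnitude of a grayscale image of shape (H, W)."""
    img = np.asarray(img, dtype=np.float64)
    gx = filter3x3(img, SOBEL_X)
    gy = filter3x3(img, SOBEL_Y)
    return np.sqrt(gx ** 2 + gy ** 2)


def edge_loss(pred, target):
    """Mean absolute difference between two edge maps (hypothetical form)."""
    return float(np.mean(np.abs(edge_map(pred) - edge_map(target))))


if __name__ == "__main__":
    # A reference image with one vertical edge, and a flat prediction without it.
    target = np.zeros((8, 8))
    target[:, 4:] = 1.0
    flat_pred = np.full((8, 8), 0.5)
    print(edge_loss(target, target))     # identical images give zero loss
    print(edge_loss(flat_pred, target))  # the missing contour is penalised
```

A term of this kind penalises predictions whose contours are blurred or missing even when average pixel intensities are close, which is in line with the stated goal of sharpening ambiguous contours. In a training setup it would be written with a differentiable framework and added to the other losses with a weighting factor.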