Augmented Reality (AR) applications necessitate methods for inserting objects into camera-captured scenes in a way that is coherent with the surroundings. Common AR applications require the insertion of predefined 3D objects with known properties and shape. This simplifies the problem, since it reduces to extracting an illumination model for the object in that scene by understanding the surrounding light sources. However, information about an object's properties is often unavailable, especially when the object comes from a single source image. Our method renders such source fragments coherently with the target surroundings using only these two images. Our pipeline uses a Deep Image Prior (DIP) network based on a U-Net architecture as the main renderer, alongside robust feature-extraction networks that are used to apply the required losses. Our method requires neither paired labeled data nor extensive training on a dataset. We compare our method using qualitative metrics against baseline methods such as Cut and Paste, Cut and Paste Neural Rendering, and Image Harmonization.
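The core idea the abstract describes, overfitting a randomly initialised network to a single target composite rather than training on a dataset, can be sketched in miniature. The sketch below is an illustrative toy, not the paper's pipeline: it replaces the U-Net DIP with a one-hidden-layer MLP and the feature-based losses with a plain pixel MSE, and all sizes and names are assumptions.

```python
import numpy as np

# Toy Deep-Image-Prior-style loop (a sketch; the paper uses a U-Net and
# feature-based losses). A small random network with a fixed random input
# code z is overfit to a single "composite" target image, mimicking the
# per-image optimization the abstract describes.

rng = np.random.default_rng(0)

H = W = 8                              # toy image size (assumption)
z = rng.normal(size=(H * W,))          # fixed random input code, as in DIP
target = rng.uniform(size=(H * W,))    # stand-in for the composite image

# one-hidden-layer MLP standing in for the U-Net renderer
W1 = rng.normal(scale=0.1, size=(32, H * W))
b1 = np.zeros(32)
W2 = rng.normal(scale=0.1, size=(H * W, 32))
b2 = np.zeros(H * W)

def forward(code):
    h = np.tanh(W1 @ code + b1)
    return W2 @ h + b2, h

lr = 0.05
for step in range(500):
    out, h = forward(z)
    err = out - target                 # gradient of 0.5 * sum((out - target)^2)
    # manual backprop through the two layers
    gW2 = np.outer(err, h); gb2 = err
    dh = (W2.T @ err) * (1.0 - h ** 2)
    gW1 = np.outer(dh, z); gb1 = dh
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

final_loss = 0.5 * np.mean((forward(z)[0] - target) ** 2)
print(final_loss)
```

Because no dataset is involved, all "training" happens at render time on the two input images; in the paper this same per-image loop is driven by losses computed from robust feature-extraction networks rather than raw pixel error.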