Image blending aims to combine multiple images seamlessly. This remains challenging for existing 2D-based methods, especially when the input images are misaligned due to differences in 3D camera pose and object shape. To tackle these issues, we propose a 3D-aware blending method using generative Neural Radiance Fields (NeRFs), comprising two key components: 3D-aware alignment and 3D-aware blending. For 3D-aware alignment, we first estimate the camera pose of the reference image with respect to the generative NeRF and then perform 3D local alignment for each part. To further leverage the 3D information of the generative NeRF, we propose 3D-aware blending, which blends images directly in the NeRF's latent representation space rather than in raw pixel space. Collectively, our method outperforms existing 2D baselines, as validated by extensive quantitative and qualitative evaluations on FFHQ and AFHQ-Cat.
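As a rough illustration of why blending inside a radiance-field representation differs from blending rendered pixels, the following minimal sketch mixes the per-sample density and color of two fields along a single ray with a soft mask and then applies standard NeRF volume rendering. It is not the method proposed here: the function names `volume_render` and `blend_fields`, the per-sample mask, and the linear mixing rule are all assumptions made for illustration only.

```python
import numpy as np

def volume_render(sigma, rgb, deltas):
    """Standard NeRF quadrature along one ray.

    sigma:  (N,)   non-negative densities at the N ray samples
    rgb:    (N, 3) colors at the samples
    deltas: (N,)   distances between consecutive samples
    """
    alpha = 1.0 - np.exp(-sigma * deltas)
    # Transmittance: fraction of light reaching sample i unoccluded.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = trans * alpha
    return (weights[:, None] * rgb).sum(axis=0)

def blend_fields(sigma_a, rgb_a, sigma_b, rgb_b, mask):
    """Hypothetical blend: linearly mix two fields sample by sample.

    mask: (N,) values in [0, 1]; 0 keeps field A, 1 keeps field B.
    """
    m = mask[:, None]
    sigma = (1.0 - mask) * sigma_a + mask * sigma_b
    rgb = (1.0 - m) * rgb_a + m * rgb_b
    return sigma, rgb
```

Because volume rendering is nonlinear in the density, rendering the mixed field generally gives a different pixel than averaging the two rendered pixels, which is the distinction between the two blending spaces that this sketch is meant to show.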