Image composition is a complex task that requires detailed scene information, such as perspective, lighting, shadows, occlusions, and object interactions, to produce an accurate and realistic result. Previous methods have relied predominantly on 2D information for image composition, neglecting the potential of 3D spatial information. In this work, we propose DepGAN, a Generative Adversarial Network that uses depth maps and alpha channels to rectify inaccurate occlusions and enhance transparency effects in image composition. Central to our network is a novel loss function, Depth Aware Loss, which quantifies the pixel-wise depth difference to accurately delineate occlusion boundaries when compositing objects at different depth levels. Furthermore, we augment the network's learning process with opacity data, enabling it to effectively handle compositions involving transparent and semi-transparent objects. We evaluated our model against state-of-the-art image composition GANs on benchmark datasets (both real and synthetic). The results show that DepGAN significantly outperforms existing methods in the accuracy of object placement semantics and in transparency and occlusion handling, both visually and quantitatively. Our code is available at https://amrtsg.github.io/DepGAN/.
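To make the idea behind a depth-aware occlusion penalty concrete, the following is a minimal sketch, not the paper's actual Depth Aware Loss. It assumes per-pixel foreground and background depth maps (smaller depth means nearer the camera) and a predicted foreground opacity map; the function name `depth_aware_loss` and the exact weighting are hypothetical illustrations of "quantifying the pixel-wise depth difference" to enforce correct occlusion ordering.

```python
import numpy as np

def depth_aware_loss(pred_alpha, fg_depth, bg_depth):
    """Hypothetical sketch of a depth-aware occlusion penalty.

    pred_alpha: predicted foreground opacity in [0, 1], shape (H, W)
    fg_depth, bg_depth: per-pixel depth maps, shape (H, W),
        where smaller values are closer to the camera.
    """
    # Target visibility: the foreground should be visible only where
    # it is nearer than the background.
    visible = (fg_depth < bg_depth).astype(np.float32)
    # Weight the penalty by the pixel-wise depth difference, so clear
    # ordering violations near occlusion boundaries cost more.
    weight = np.abs(fg_depth - bg_depth)
    # Penalize opacity where the foreground should be occluded, and
    # missing opacity where it should be visible.
    return float(np.mean(weight * np.abs(pred_alpha - visible)))
```

A composite whose opacity agrees with the depth ordering incurs zero loss, while one that shows the foreground where it should be hidden (or vice versa) is penalized in proportion to the depth gap.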