Layer compositing is one of the most popular image editing workflows among both amateurs and professionals. Motivated by the success of diffusion models, we explore layer compositing from a layered image generation perspective. Instead of generating an image, we propose to generate background, foreground, layer mask, and the composed image simultaneously. To achieve layered image generation, we train an autoencoder that is able to reconstruct layered images and train diffusion models on the latent representation. One benefit of the proposed problem is to enable better compositing workflows in addition to the high-quality image output. Another benefit is producing higher-quality layer masks compared to masks produced by a separate step of image segmentation. Experimental results show that the proposed method is able to generate high-quality layered images and initiates a benchmark for future work.
翻译:图层合成是业余与专业用户中最流行的图像编辑工作流之一。受扩散模型成功经验的启发,我们从分层图像生成的角度探索图层合成。不同于生成单一图像,我们提出同时生成背景、前景、图层蒙版及合成图像。为实现分层图像生成,我们训练了一个能重建分层图像的自动编码器,并在潜表示上训练扩散模型。该问题的一个优势在于:除高质量图像输出外,还能提供更优的合成工作流。另一优势是能生成比独立图像分割步骤所得蒙版质量更高的图层蒙版。实验结果表明,所提方法可生成高质量分层图像,并为未来研究建立了基准。