Large-scale diffusion models have achieved remarkable success in generating high-quality images from textual descriptions and have gained popularity across a wide range of applications. However, the generation of layered content, such as transparent images with separate foreground and background layers, remains under-explored. Layered content generation is crucial for creative workflows in fields like graphic design, animation, and digital art, where layer-based approaches are fundamental to flexible editing and composition. In this paper, we propose a novel image generation pipeline based on Latent Diffusion Models (LDMs) that produces images with two layers: a foreground layer (RGBA) carrying transparency information and a background layer (RGB). Unlike existing methods that generate these layers sequentially, our approach introduces a harmonized generation mechanism that enables dynamic interactions between the layers, yielding more coherent outputs. We demonstrate the effectiveness of our method through extensive qualitative and quantitative experiments, showing significant improvements in visual coherence, image quality, and layer consistency over baseline methods.
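The two layers described above relate through standard "over" alpha compositing: the RGBA foreground is blended onto the RGB background according to its alpha channel. The sketch below shows this generic compositing math only; it is not the paper's generation method, and the array shapes and straight (non-premultiplied) alpha convention are assumptions.

```python
import numpy as np

def composite_over(fg_rgba: np.ndarray, bg_rgb: np.ndarray) -> np.ndarray:
    """Blend a (H, W, 4) foreground over a (H, W, 3) background.

    All values are floats in [0, 1]; alpha is straight (non-premultiplied).
    Implements the Porter-Duff "over" operator: out = a*fg + (1-a)*bg.
    """
    rgb, alpha = fg_rgba[..., :3], fg_rgba[..., 3:4]
    return alpha * rgb + (1.0 - alpha) * bg_rgb

# Toy example: a half-transparent red foreground over a white background.
fg = np.zeros((2, 2, 4))
fg[..., 0] = 1.0   # red channel
fg[..., 3] = 0.5   # 50% opacity
bg = np.ones((2, 2, 3))  # white
out = composite_over(fg, bg)  # every pixel -> [1.0, 0.5, 0.5] (pink)
```

Compositing in straight-alpha form keeps the foreground's color and transparency independently editable, which is precisely why layered outputs are useful in the design workflows the abstract mentions.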