We present LayerDiffuse, an approach that enables large-scale pretrained latent diffusion models to generate transparent images. The method supports generating either single transparent images or multiple transparent layers. It learns a "latent transparency" that encodes alpha channel transparency into the latent manifold of a pretrained latent diffusion model. It preserves the production-ready quality of the large diffusion model by regulating the added transparency as a latent offset with minimal changes to the original latent distribution of the pretrained model. In this way, any latent diffusion model can be converted into a transparent image generator by finetuning it with the adjusted latent space. We train the model with 1M transparent image layer pairs collected using a human-in-the-loop collection scheme. We show that latent transparency can be applied to different open-source image generators, or adapted to various conditional control systems, enabling applications such as foreground/background-conditioned layer generation, joint layer generation, and structural control of layer contents. A user study finds that in most cases (97%) users prefer our natively generated transparent content over previous ad-hoc solutions such as generating and then matting. Users also report that the quality of our generated transparent images is comparable to that of real commercial transparent assets such as those from Adobe Stock.
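To make the "latent offset" idea concrete, the following is a minimal toy sketch, not the implementation used in this work. It assumes the adjusted latent is formed by adding an offset to the original latent, and that "minimal changes" can be measured by how much the frozen decoder's output moves. The linear maps, the dimensions, and the names `transparency_offset`, `adjust_latent`, and `harmlessness_penalty` are all hypothetical stand-ins for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (assumptions): a frozen linear "decoder" and a small linear
# "transparency encoder". Real models would be neural networks.
LATENT_DIM, PIXEL_DIM = 8, 32
W_dec = rng.normal(size=(PIXEL_DIM, LATENT_DIM))            # frozen decoder weights
W_off = 0.01 * rng.normal(size=(LATENT_DIM, 2 * PIXEL_DIM))  # offset encoder weights


def decode(z):
    """Frozen pretrained decoder (toy linear map from latent to pixels)."""
    return W_dec @ z


def transparency_offset(rgb, alpha):
    """Hypothetical encoder mapping colour and alpha values to a latent offset."""
    return W_off @ np.concatenate([rgb, alpha])


def adjust_latent(z, rgb, alpha):
    """Add the transparency offset to the original latent."""
    return z + transparency_offset(rgb, alpha)


def harmlessness_penalty(z, z_adjusted):
    """Mean squared change in the frozen decoder's output caused by the offset."""
    return float(np.mean((decode(z) - decode(z_adjusted)) ** 2))


z = rng.normal(size=LATENT_DIM)       # original latent of the colour image
rgb = rng.uniform(size=PIXEL_DIM)     # flattened colour values (toy)
alpha = rng.uniform(size=PIXEL_DIM)   # flattened alpha values (toy)

z_adj = adjust_latent(z, rgb, alpha)
penalty = harmlessness_penalty(z, z_adj)
```

In a sketch like this, a training objective would keep `penalty` small so that the adjusted latents stay close to what the pretrained model already expects, while a separate decoder (not shown) would recover the alpha channel from `z_adj`.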