We present LayerDiffusion, an approach that enables large-scale pretrained latent diffusion models to generate transparent images. The method can generate either a single transparent image or multiple transparent layers. It learns a "latent transparency" that encodes alpha-channel transparency into the latent manifold of a pretrained latent diffusion model. It preserves the production-ready quality of the large diffusion model by regulating the added transparency as a latent offset that makes minimal changes to the original latent distribution of the pretrained model. In this way, any latent diffusion model can be converted into a transparent image generator by finetuning it with the adjusted latent space. We train the model with 1M transparent image layer pairs collected using a human-in-the-loop collection scheme. We show that latent transparency can be applied to different open-source image generators, or adapted to various conditional control systems to achieve applications such as foreground/background-conditioned layer generation, joint layer generation, and structural control of layer contents. A user study finds that in most cases (97%), users prefer our natively generated transparent content over previous ad-hoc solutions such as generating and then matting. Users also report that the quality of our generated transparent images is comparable to real commercial transparent assets such as Adobe Stock.
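To make the "latent offset" idea concrete, here is a toy NumPy illustration built on loudly stated assumptions. The "pretrained VAE" is a random linear encoder `E` with its pseudo-inverse as the frozen decoder `D`, and the transparency encoder `T` is a random linear map standing in for a learned network. All names, sizes, and the `identity_loss` helper are hypothetical. The sketch only shows that the adjusted latent `x_a = x + offset` departs from the original latent by a controllable amount, and that a small offset barely changes what the frozen decoder produces. It is not LayerDiffusion's actual architecture or training procedure.

```python
import numpy as np

# Toy sizes (hypothetical): a 4x4 RGB image, a 4x4 alpha map, a 16-dim latent.
PIX, ALPHA, LAT = 4 * 4 * 3, 4 * 4, 16

rng = np.random.default_rng(0)
# Hypothetical frozen "pretrained VAE": fixed linear encoder E, pseudo-inverse decoder D.
E = rng.standard_normal((LAT, PIX)) / np.sqrt(PIX)
D = np.linalg.pinv(E)  # shape (PIX, LAT)
# Hypothetical transparency encoder: a fixed linear map from [rgb; alpha] to a latent offset.
T = rng.standard_normal((LAT, PIX + ALPHA)) / np.sqrt(PIX + ALPHA)


def adjusted_latent(rgb, alpha, scale):
    """Return the original latent x and the adjusted latent x_a = x + scale * T @ [rgb; alpha]."""
    x = E @ rgb
    offset = scale * (T @ np.concatenate([rgb, alpha]))
    return x, x + offset


def identity_loss(rgb, alpha, scale):
    """Mean squared difference between the frozen decoder's outputs for x_a and for x."""
    x, x_a = adjusted_latent(rgb, alpha, scale)
    return float(np.mean((D @ x_a - D @ x) ** 2))


rgb = rng.uniform(0.0, 1.0, PIX)
alpha = rng.uniform(0.0, 1.0, ALPHA)
for s in (0.0, 0.01, 0.1, 1.0):
    print(f"offset scale {s:>5}: identity loss {identity_loss(rgb, alpha, s):.2e}")
```

Because the toy offset is linear in `scale`, the decoded change grows quadratically with it, so shrinking the offset keeps the frozen decoder's output close to the original. A full system would also need a way to recover the alpha channel from the adjusted latent; this sketch omits that part.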