We present LayerDiffusion, an approach that enables large-scale pretrained latent diffusion models to generate transparent images. The method can generate either single transparent images or multiple transparent layers. It learns a "latent transparency" that encodes alpha channel transparency into the latent manifold of a pretrained latent diffusion model. It preserves the production-ready quality of the large diffusion model by regulating the added transparency as a latent offset that changes the original latent distribution of the pretrained model as little as possible. In this way, any latent diffusion model can be converted into a transparent image generator by finetuning it with the adjusted latent space. We train the model with 1M transparent image layer pairs collected using a human-in-the-loop collection scheme. We show that latent transparency can be applied to different open-source image generators, or adapted to various conditional control systems, to support applications such as foreground/background-conditioned layer generation, joint layer generation, and structural control of layer contents. A user study finds that in most cases (97%) users prefer our natively generated transparent content over previous ad-hoc solutions such as generating and then matting. Users also report that the quality of our generated transparent images is comparable to that of real commercial transparent assets, such as those from Adobe Stock.
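The "latent offset" idea can be illustrated with a toy sketch. This is not the authors' implementation: the function names `add_latent_transparency` and `offset_penalty`, the flat list representation of a latent, and the mean-squared penalty are all assumptions made for illustration. The sketch only shows the general shape of the idea, namely that transparency is added to the pretrained model's latent as a small perturbation, and that the size of this perturbation can be measured and kept small.

```python
from typing import List

Latent = List[float]  # toy stand-in for a latent tensor (assumption)


def add_latent_transparency(base_latent: Latent, offset: Latent) -> Latent:
    """Return the adjusted latent: the original latent plus a transparency offset."""
    if len(base_latent) != len(offset):
        raise ValueError("latent and offset must have the same length")
    return [z + d for z, d in zip(base_latent, offset)]


def offset_penalty(offset: Latent) -> float:
    """Mean squared magnitude of the offset (hypothetical regularizer).

    A small value means the adjusted latent stays close to the original latent.
    """
    return sum(d * d for d in offset) / len(offset)


# Illustrative values only; a real offset would come from a learned encoder.
base = [0.5, -1.0, 0.25, 2.0]
offset = [0.01, -0.02, 0.0, 0.03]

adjusted = add_latent_transparency(base, offset)
penalty = offset_penalty(offset)
```

In this toy setting, a training objective could add `penalty` to its loss so that the offset carrying the alpha information stays small relative to the original latent values.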