Text-to-image diffusion models have attracted considerable interest due to their wide applicability across diverse fields. However, challenges persist in creating controllable models for personalized object generation. In this paper, we first identify the entanglement issues in existing personalized generative models, and then propose a straightforward and efficient data augmentation training strategy that guides the diffusion model to focus solely on object identity. By inserting the plug-and-play adapter layers from a pre-trained controllable diffusion model, our model gains the ability to control the location and size of each generated personalized object. During inference, we propose a regionally guided sampling technique to maintain the quality and fidelity of the generated images. Our method achieves comparable or superior fidelity for personalized objects, yielding a robust, versatile, and controllable text-to-image diffusion model capable of generating realistic and personalized images. Our approach demonstrates significant potential for applications such as art, entertainment, and advertising design.