Diffusion-based text-to-image personalization has achieved great success in generating user-specified subjects in various contexts. Even so, existing finetuning-based methods still suffer from model overfitting, which greatly harms generative diversity, especially when only a few subject images are given. To this end, we propose Pick-and-Draw, a training-free semantic guidance approach that boosts identity consistency and generative diversity for personalization methods. Our approach consists of two components: appearance picking guidance and layout drawing guidance. For appearance picking, we construct an appearance palette from visual features of the reference image and pick local patterns from it to generate the specified subject with a consistent identity. For layout drawing, we outline the subject's contour by referring to a generative template from the vanilla diffusion model, and inherit that model's strong image prior to synthesize diverse contexts according to different text conditions. The proposed approach can be applied to any personalized diffusion model and requires as few as a single reference image. Qualitative and quantitative experiments show that Pick-and-Draw consistently improves identity consistency and generative diversity, pushing the trade-off between subject fidelity and image-text fidelity to a new Pareto frontier.
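To make the idea of "picking" local patterns from an appearance palette more concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the palette is simply a matrix of feature vectors extracted from the reference image and that picking means nearest-neighbor retrieval by cosine similarity for each feature of the image being generated. The function names `pick_from_palette` and `appearance_loss` are hypothetical, and feature extraction from the diffusion model is not shown.

```python
import numpy as np


def _cosine_sim(query: np.ndarray, palette: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of query (n_q, d) and palette (n_p, d)."""
    eps = 1e-8
    # L2-normalize each row so that dot products become cosine similarities
    q = query / (np.linalg.norm(query, axis=1, keepdims=True) + eps)
    p = palette / (np.linalg.norm(palette, axis=1, keepdims=True) + eps)
    return q @ p.T  # shape (n_q, n_p)


def pick_from_palette(query: np.ndarray, palette: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Hypothetical picking step: for each query feature, select the most similar palette entry.

    query:   (n_q, d) features of the image being generated (assumed given)
    palette: (n_p, d) features taken from the reference image (assumed given)
    Returns the picked palette features (n_q, d) and their indices (n_q,).
    """
    sim = _cosine_sim(query, palette)
    idx = np.argmax(sim, axis=1)  # index of the best-matching palette entry per query
    return palette[idx], idx


def appearance_loss(query: np.ndarray, palette: np.ndarray) -> float:
    """Hypothetical guidance objective: mean (1 - best cosine similarity) over query features.

    It is near 0 when every generated feature closely matches some reference feature.
    In a real guidance loop this would be computed with an autodiff framework and
    differentiated with respect to the latent; NumPy is used here only for illustration.
    """
    sim = _cosine_sim(query, palette)
    return float(np.mean(1.0 - sim.max(axis=1)))


if __name__ == "__main__":
    palette = np.array([[1.0, 0.0], [0.0, 1.0]])
    query = np.array([[0.9, 0.1], [0.2, 0.8], [0.0, 3.0]])
    picked, idx = pick_from_palette(query, palette)
    print(idx.tolist())  # [0, 1, 1]
    print(appearance_loss(query, palette))
```

Using cosine similarity makes the match depend on feature direction rather than magnitude, which is why the third query `[0.0, 3.0]` still picks the second palette entry. Whether Pick-and-Draw uses this similarity, this granularity of "local patterns", or a different matching rule is not specified in the abstract.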