Resolving Multi-Condition Confusion for Finetuning-Free Personalized Image Generation

Personalized text-to-image generation, which produces customized images from reference images, has garnered wide research interest. Recent finetuning-free methods use a decoupled cross-attention mechanism to generate personalized images without any test-time finetuning. However, when multiple reference images are provided, the decoupled cross-attention mechanism suffers from object confusion: it fails to map each reference image to its corresponding object, which seriously limits its scope of application. To address object confusion, we investigate how relevant different positions of the latent image features in the diffusion model are to the target object, and accordingly propose a weighted-merge method that merges multiple reference image features into their corresponding objects. We then integrate this weighted-merge method into existing pre-trained models and continue training on a multi-object dataset constructed from the open-source SA-1B dataset. To mitigate object confusion and reduce training costs, we propose an object quality score that estimates image quality and is used to select high-quality training samples. Furthermore, our weighted-merge training framework also applies to single-object generation when a single object has multiple reference images. Experiments verify that our method outperforms the state of the art on multi-object personalized image generation on the Concept101 and DreamBooth datasets, and remarkably improves performance on single-object personalized image generation. Our code is available at https://github.com/hqhQAQ/MIP-Adapter.
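
The abstract does not give the formula for the weighted merge, so the following is only a rough sketch of the general idea, not the authors' implementation. It assumes that each reference image yields its own cross-attention output over the latent positions, and that a per-position relevance score (here, hypothetically, the largest query-key logit for that reference) is turned into weights by a softmax across references. The function name `weighted_merge` and the choice of relevance score are illustrative assumptions.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def weighted_merge(queries, ref_keys, ref_values):
    """Merge per-reference cross-attention outputs with per-position weights.

    queries:    (N, d) latent image features, one row per spatial position.
    ref_keys:   list of (M_k, d) key matrices, one per reference image.
    ref_values: list of (M_k, d) value matrices, one per reference image.
    Returns the merged (N, d) features and the (K, N) weights.
    """
    d = queries.shape[-1]
    outputs, relevance = [], []
    for keys, values in zip(ref_keys, ref_values):
        logits = queries @ keys.T / np.sqrt(d)        # (N, M_k)
        outputs.append(softmax(logits, axis=-1) @ values)  # (N, d)
        # ASSUMPTION: relevance of a position to this reference is taken
        # to be its strongest query-key match; the paper may differ.
        relevance.append(logits.max(axis=-1))         # (N,)
    # Normalize across references so each position's weights sum to 1.
    weights = softmax(np.stack(relevance, axis=0), axis=0)  # (K, N)
    merged = (weights[..., None] * np.stack(outputs, axis=0)).sum(axis=0)
    return merged, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.normal(size=(6, 4))
    ks = [rng.normal(size=(3, 4)), rng.normal(size=(5, 4))]
    vs = [rng.normal(size=(3, 4)), rng.normal(size=(5, 4))]
    merged, w = weighted_merge(q, ks, vs)
    print(merged.shape, w.shape)
```

Under these assumptions, a latent position that matches one reference strongly receives most of its features from that reference, which is the intuition behind routing each reference image to its own object instead of averaging all references uniformly.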

