Resolving Multi-Condition Confusion for Finetuning-Free Personalized Image Generation

Personalized text-to-image generation methods, which generate customized images based on reference images, have attracted wide research interest. Recent methods adopt a finetuning-free approach built on a decoupled cross-attention mechanism, so personalized images can be generated without test-time finetuning. However, when multiple reference images are provided, the current decoupled cross-attention mechanism suffers from the object confusion problem: it fails to map each reference image to its corresponding object, which seriously limits its scope of application. To address this problem, we investigate how relevant different positions of the latent image features are to the target object in the diffusion model, and accordingly propose a weighted-merge method that merges the features of multiple reference images into their corresponding objects. We then integrate this weighted-merge method into existing pre-trained models and continue training them on a multi-object dataset constructed from the open-source SA-1B dataset. To mitigate object confusion and reduce training costs, we propose an object quality score that estimates image quality and is used to select high-quality training samples. Furthermore, our weighted-merge training framework can also be applied to single-object generation when a single object has multiple reference images. Experiments show that our method outperforms state-of-the-art methods in multi-object personalized image generation on the Concept101 and DreamBooth datasets, and markedly improves performance in single-object personalized image generation. Our code is available at https://github.com/hqhQAQ/MIP-Adapter.
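The weighted merge is described only at a high level above. Below is a minimal sketch of one way decoupled cross-attention (text and image features attended separately, outputs summed) could merge several reference images with per-position weights. It is not the authors' implementation. The abstract does not say how the relevance of each latent position to each object is computed, so `relevance` is a hypothetical precomputed input. The softmax normalization across references and the image-branch weight `scale` are likewise assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[Vector]


def softmax(xs: Sequence[float]) -> Vector:
    """Numerically stable softmax over a 1-D list."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(q: Vector, keys: Matrix, values: Matrix) -> Vector:
    """Scaled dot-product attention for a single query vector."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    probs = softmax(scores)
    return [sum(p * v[j] for p, v in zip(probs, values)) for j in range(len(values[0]))]


def weighted_merge_attention(
    queries: Matrix,
    text_kv: Tuple[Matrix, Matrix],
    image_kvs: List[Tuple[Matrix, Matrix]],
    relevance: Matrix,
    scale: float = 1.0,
) -> Matrix:
    """Decoupled cross-attention with a per-position weighted merge of references.

    queries:   one query vector per latent position
    text_kv:   (keys, values) projected from the text prompt tokens
    image_kvs: one (keys, values) pair per reference image
    relevance: relevance[n][r] = hypothetical score of position n for reference r
    scale:     hypothetical weight of the image branch
    Assumes text and image values share the same dimension.
    """
    outputs = []
    for n, q in enumerate(queries):
        text_out = attend(q, *text_kv)
        # Normalize this position's relevance scores over the reference images.
        weights = softmax(relevance[n])
        merged = [0.0] * len(text_out)
        for w, (keys, values) in zip(weights, image_kvs):
            img_out = attend(q, keys, values)
            # Accumulate each reference's attention output, scaled by its weight.
            merged = [m + w * o for m, o in zip(merged, img_out)]
        outputs.append([t + scale * m for t, m in zip(text_out, merged)])
    return outputs


if __name__ == "__main__":
    queries = [[1.0, 0.0], [0.0, 1.0]]
    text_kv = ([[0.0, 0.0]], [[0.0, 0.0]])      # text branch contributes zero here
    image_kvs = [
        ([[0.0, 0.0]], [[1.0, 0.0]]),            # reference image A
        ([[0.0, 0.0]], [[0.0, 1.0]]),            # reference image B
    ]
    relevance = [[10.0, -10.0], [-10.0, 10.0]]   # position 0 -> A, position 1 -> B
    print(weighted_merge_attention(queries, text_kv, image_kvs, relevance))
```

The weights at each position sum to one across references. A position judged relevant to one object therefore receives that object's image features almost exclusively. By contrast, adding every reference's output with equal weight would blend all of them at every position.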
