Fine-tuning text-to-image diffusion models with human feedback is an effective method for aligning model behavior with human intentions. However, this alignment process often suffers from slow convergence due to the large size of, and noise present in, human feedback datasets. In this work, we propose FiFA, a novel automated data filtering algorithm designed to enhance the fine-tuning of diffusion models using human feedback datasets with direct preference optimization (DPO). Specifically, our approach selects data by solving an optimization problem that maximizes three components: preference margin, text quality, and text diversity. The preference margin, computed with a proxy reward model, identifies samples that are highly informative for addressing the noisy nature of the feedback dataset. Additionally, we incorporate text quality, assessed by large language models to prevent harmful content, and account for text diversity through a k-nearest-neighbor entropy estimator to improve generalization. Finally, we integrate all these components into a single optimization process, approximating the solution by assigning an importance score to each data pair and selecting the most important ones. As a result, our method filters data efficiently and automatically, without manual intervention, and can be applied to any large-scale dataset. Experimental results show that FiFA significantly enhances training stability and achieves better performance, with outputs preferred by humans 17% more often, while using less than 0.5% of the full data and thus about 1% of the GPU hours required by the full human feedback dataset.
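To make the selection step concrete, the following is a minimal sketch of importance-score filtering in Python. All function names, the weighting scheme, and the per-point k-NN diversity term are illustrative assumptions standing in for the paper's exact objective; `margins` is assumed to hold proxy-reward gaps between preferred and rejected images, and `quality` to hold LLM-judged prompt quality scores.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def importance_scores(margins, quality, text_emb, k=5,
                      w_margin=1.0, w_quality=1.0, w_diversity=1.0):
    """Combine the three components into one score per preference pair.

    margins  : (n,) proxy-reward gap r(winner) - r(loser) for each pair
    quality  : (n,) LLM-judged text quality score for each prompt
    text_emb : (n, d) prompt embeddings used for the k-NN diversity term
    """
    nn = NearestNeighbors(n_neighbors=k + 1).fit(text_emb)
    dists, _ = nn.kneighbors(text_emb)
    # Per-point contribution to a k-NN entropy estimate: log distance to the
    # k-th neighbor (index 0 is the point itself). Larger values mean the
    # prompt sits in a sparser, more novel region of embedding space.
    diversity = np.log(dists[:, k] + 1e-12)
    return w_margin * margins + w_quality * quality + w_diversity * diversity

def select_top(scores, budget):
    """Greedy approximation: keep the `budget` highest-scoring pairs."""
    return np.argsort(scores)[::-1][:budget]

# Example: keep roughly 0.5% of a 1M-pair dataset.
# idx = select_top(importance_scores(margins, quality, emb), budget=5000)
```

Note that scoring each pair independently and taking the top-k is only a greedy approximation: the diversity of the selected subset is not re-evaluated as pairs are chosen, which is the trade-off that makes the filter cheap enough to run over any large-scale dataset.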