Aligning Diffusion Models with Noise-Conditioned Perception

Recent advances in human preference optimization, initially developed for Language Models (LMs), have shown promise for text-to-image Diffusion Models, enhancing prompt alignment, visual appeal, and user preference. Unlike LMs, Diffusion Models typically optimize in pixel or VAE space, which does not align well with human perception, leading to slower and less efficient training during the preference alignment stage. To address these issues, we propose using a perceptual objective in the U-Net embedding space of the diffusion model. Our approach fine-tunes Stable Diffusion 1.5 and XL using Direct Preference Optimization (DPO), Contrastive Preference Optimization (CPO), and supervised fine-tuning (SFT) within this embedding space. This method significantly outperforms standard latent-space implementations across various metrics, including quality and computational cost. For SDXL, our approach achieves 60.8% general preference, 62.2% visual appeal, and 52.1% prompt following against the original open-sourced SDXL-DPO on the PartiPrompts dataset, while significantly reducing compute. Our approach not only improves the efficiency and quality of human preference alignment for diffusion models but also integrates easily with other optimization techniques. The training code and LoRA weights will be available here: https://huggingface.co/alexgambashidze/SDXL_NCP-DPO_v0.1
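As a rough illustration of the idea described above, the following toy sketch measures prediction error in an embedding space rather than directly in latent space, then plugs those errors into a DPO-style logistic loss relative to a frozen reference model. Everything here is an assumption made for illustration: `embed` is a hypothetical stand-in for a U-Net encoder (a small fixed linear map with `tanh`), the vectors are tiny lists rather than image latents, and `beta` and the function names are invented. It is not the authors' implementation.

```python
import math


def embed(x):
    """Hypothetical stand-in for a frozen U-Net encoder (assumption, not the real model)."""
    weights = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]


def perceptual_error(pred, target):
    """Squared distance between prediction and target, measured in embedding space."""
    return sum((a - b) ** 2 for a, b in zip(embed(pred), embed(target)))


def perceptual_dpo_loss(pred_w, pred_l, ref_w, ref_l, target_w, target_l, beta=1.0):
    """DPO-style loss on embedding-space errors.

    pred_* are the trained model's predictions for the preferred (w) and
    rejected (l) samples; ref_* are the frozen reference model's predictions.
    """
    model_diff = perceptual_error(pred_w, target_w) - perceptual_error(pred_l, target_l)
    ref_diff = perceptual_error(ref_w, target_w) - perceptual_error(ref_l, target_l)
    logit = -beta * (model_diff - ref_diff)
    # -log(sigmoid(logit)) written as a numerically stable softplus(-logit)
    return max(-logit, 0.0) + math.log1p(math.exp(-abs(logit)))


# Toy usage: the model matches the preferred target exactly, the reference does not.
target_w, target_l = [1.0, 0.0, 0.5], [0.0, 1.0, -0.5]
ref_pred = [0.5, 0.5, 0.0]
loss = perceptual_dpo_loss(target_w, ref_pred, ref_pred, ref_pred, target_w, target_l)
print(loss)
```

When the trained model and the reference make identical predictions, the loss equals log 2; it falls below that value as the model fits the preferred sample better than the reference does.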

