Aligning text-to-image (T2I) diffusion models with human preferences has emerged as a critical research challenge. While recent advances in this area have extended preference optimization techniques from large language models (LLMs) to the diffusion setting, they often struggle with limited exploration. In this work, we propose a novel and orthogonal approach to enhancing diffusion-based preference optimization. First, we introduce a stable reference model update strategy that relaxes the frozen reference model, encouraging exploration while maintaining a stable optimization anchor through reference model regularization. Second, we present a timestep-aware training strategy that mitigates the reward scale imbalance problem across timesteps. Our method can be integrated into various preference optimization algorithms. Experimental results show that our approach improves the performance of state-of-the-art methods on human preference evaluation benchmarks. The code is available on GitHub: https://github.com/kaist-cvml/RethinkingDPO_Diffusion_Models.
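To make the two components concrete, below is a minimal PyTorch sketch, not the authors' implementation. It assumes a Diffusion-DPO-style pairwise objective on per-sample denoising errors; the EMA-style reference update, the `ema_decay` and `beta` values, and the linear `timestep_weight` function are illustrative assumptions rather than the paper's actual formulation.

```python
# Minimal sketch of (1) a relaxed, slowly updated reference model and
# (2) timestep-aware loss weighting, under a Diffusion-DPO-style pairwise loss.
# All hyperparameters and the specific update/weighting rules are assumptions.
import torch
import torch.nn.functional as F


def update_reference_model(ref_model, policy_model, ema_decay=0.999):
    """Relax the frozen reference: let it slowly track the policy (assumed EMA rule)."""
    with torch.no_grad():
        for p_ref, p_pol in zip(ref_model.parameters(), policy_model.parameters()):
            p_ref.mul_(ema_decay).add_(p_pol, alpha=1.0 - ema_decay)


def timestep_weight(t, num_train_timesteps=1000):
    """Illustrative per-timestep weight intended to counteract reward-scale
    imbalance across timesteps (assumed linear schedule)."""
    return 1.0 - t.float() / num_train_timesteps


def pairwise_preference_loss(policy_err_w, policy_err_l,
                             ref_err_w, ref_err_l, t, beta=0.1):
    """DPO-style loss on denoising errors of preferred (w) / dispreferred (l) samples.

    `*_err_*` are per-sample MSE denoising errors at timesteps `t`; the implicit
    reward difference measures how much more the policy improves over the
    reference on the preferred sample than on the dispreferred one.
    """
    logits = (ref_err_w - policy_err_w) - (ref_err_l - policy_err_l)
    w_t = timestep_weight(t)
    return -(w_t * F.logsigmoid(beta * logits)).mean()
```

In a training loop, `update_reference_model` would be called periodically (e.g., every few optimizer steps) so the reference remains a stable anchor while still moving with the policy; the frequency and decay rate here are placeholders.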