The Diffusion Duality - Zhuanzhi Papers


Discrete · Duality · Consistency · Distillation · Curriculum Learning

December 19, 2025

The Diffusion Duality


Subham Sekhar Sahoo, Justin Deschenaux, Aaron Gokaslan, Guanghan Wang, Justin Chiu, Volodymyr Kuleshov

Source: arXiv (ICML 2025). Code: https://github.com/s-sahoo/duo. Version [v3] includes improved theory, a clearer presentation, and a new future work section.

Uniform-state discrete diffusion models hold the promise of fast text generation due to their inherent ability to self-correct. However, they are typically outperformed by autoregressive models and masked diffusion models. In this work, we narrow this performance gap by leveraging a key insight: uniform-state diffusion processes naturally emerge from an underlying Gaussian diffusion. Our method, Duo, transfers powerful techniques from Gaussian diffusion to improve both training and sampling. First, we introduce a curriculum learning strategy guided by the Gaussian process, which doubles training speed by reducing variance. Models trained with curriculum learning surpass autoregressive models in zero-shot perplexity on 3 of 7 benchmarks. Second, we present Discrete Consistency Distillation, which adapts consistency distillation from the continuous to the discrete setting. This algorithm enables few-step generation in diffusion language models, accelerating sampling by two orders of magnitude. We provide the code, model checkpoints, and video tutorials on the project page: http://s-sahoo.github.io/duo
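The key insight, that a uniform-state discrete process emerges from a Gaussian one, can be checked numerically. The sketch below is a minimal Monte Carlo illustration under stated assumptions, not the Duo implementation: it assumes a one-hot clean token x over K classes, a variance-preserving Gaussian latent w_t = α_t x + σ_t ε with σ_t = √(1 − α_t²), and takes the discrete latent to be z_t = argmax(w_t). Because the Gaussian noise treats every incorrect class identically, the argmax lands on each wrong class with equal probability, so the marginal of z_t has the uniform-state form q(z_t | x) = a·x + (1 − a)·1/K for some weight a that depends on α_t. The helpers `argmax_marginal` and `discrete_alpha` are hypothetical names introduced only for this illustration.

```python
import math
import random
from typing import List


def argmax_marginal(alpha: float, num_classes: int, true_class: int = 0,
                    num_samples: int = 20000, seed: int = 0) -> List[float]:
    """Monte Carlo estimate of P(argmax(w_t) = k | x) for a one-hot token x.

    Assumption (illustrative, not taken from the paper's code): a
    variance-preserving Gaussian latent w_t = alpha * x + sigma * eps
    with sigma = sqrt(1 - alpha**2) and eps ~ N(0, I).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    rng = random.Random(seed)
    sigma = math.sqrt(1.0 - alpha ** 2)
    counts = [0] * num_classes
    for _ in range(num_samples):
        # Noisy Gaussian latent: the signal sits only on the true coordinate.
        w = [(alpha if k == true_class else 0.0) + sigma * rng.gauss(0.0, 1.0)
             for k in range(num_classes)]
        # Discrete latent z_t = argmax(w_t).
        z = max(range(num_classes), key=w.__getitem__)
        counts[z] += 1
    return [c / num_samples for c in counts]


def discrete_alpha(probs: List[float], true_class: int = 0) -> float:
    """Solve q = a * x + (1 - a) / K for the uniform-state weight a."""
    k = len(probs)
    return (k * probs[true_class] - 1.0) / (k - 1)


if __name__ == "__main__":
    # Off-target entries should come out roughly equal for every alpha.
    for alpha in (0.0, 0.5, 0.9, 1.0):
        probs = argmax_marginal(alpha, num_classes=5)
        print(alpha, [round(p, 3) for p in probs], round(discrete_alpha(probs), 3))
```

In this sketch the estimated weight a rises from about 0 (pure uniform noise at α_t = 0) to exactly 1 (the clean token at α_t = 1). The exact mapping from the Gaussian schedule to the discrete one is derived in the paper and is not reproduced here; the Monte Carlo estimate only shows that the argmax marginal has the uniform-state shape.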

