Variational Distillation of Diffusion Policies into Mixture of Experts

This work introduces Variational Diffusion Distillation (VDD), a novel method that distills denoising diffusion policies into Mixtures of Experts (MoE) through variational inference. Diffusion Models are the current state of the art in generative modeling due to their exceptional ability to accurately learn and represent complex, multi-modal distributions. This ability allows Diffusion Models to replicate the inherent diversity in human behavior, making them the preferred models in behavior learning such as Learning from Human Demonstrations (LfD). However, Diffusion Models come with some drawbacks, including intractable likelihoods and long inference times caused by their iterative sampling process. The inference times, in particular, pose a significant challenge to real-time applications such as robot control. In contrast, MoEs effectively address these issues while retaining the ability to represent complex distributions, but they are notoriously difficult to train. VDD is the first method that distills pre-trained diffusion models into MoE models, and hence combines the expressiveness of Diffusion Models with the benefits of Mixture Models. Specifically, VDD leverages a decompositional upper bound of the variational objective that allows each expert to be trained separately, resulting in a robust optimization scheme for MoEs. Across nine complex behavior learning tasks, VDD demonstrates that it is able to: i) accurately distill complex distributions learned by the diffusion model, ii) outperform existing state-of-the-art distillation methods, and iii) surpass conventional methods for training MoEs.
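
The abstract does not spell out the decompositional bound, so the following is only a hedged illustration of how such a bound can let experts be trained separately. It assumes the variational objective is the reverse KL divergence KL(q || p) between an MoE q(a) = Σ_z q(z) q(a|z) and a target policy p(a), and uses the standard EM-style bound obtained by introducing an auxiliary distribution q̃(z|a): KL(q || p) ≤ Σ_z q(z) [ E_{q(a|z)}[ log q(a|z) − log p(a) − log q̃(z|a) ] + log q(z) ], with equality when q̃(z|a) equals the posterior q(z|a). To keep every quantity exact and runnable, the sketch drops the state conditioning, uses a small discrete action space, and replaces the diffusion policy with a hypothetical random target; all function names are hypothetical and this is not VDD's implementation.

```python
import numpy as np


def mixture_marginal(weights, experts):
    """q(a) = sum_z q(z) q(a|z).

    weights: shape (K,), gating probabilities q(z)
    experts: shape (K, A), row z is the expert distribution q(a|z) over A actions
    """
    return weights @ experts


def kl(q, p):
    """Exact reverse KL(q || p) between two discrete distributions."""
    mask = q > 0
    return float(np.sum(q[mask] * (np.log(q[mask]) - np.log(p[mask]))))


def posterior(weights, experts):
    """Responsibilities q(z|a) = q(z) q(a|z) / q(a), shape (K, A)."""
    joint = weights[:, None] * experts
    return joint / joint.sum(axis=0, keepdims=True)


def per_expert_terms(experts, target, aux):
    """term_z = E_{q(a|z)}[log q(a|z) - log p(a) - log aux(z|a)], shape (K,).

    With aux held fixed, term z depends only on expert z, so each expert
    can be optimized separately (hypothetical helper, assumed objective).
    """
    log_ratio = np.log(experts) - np.log(target)[None, :] - np.log(aux)
    return np.sum(experts * log_ratio, axis=1)


def upper_bound(weights, experts, target, aux):
    """sum_z q(z) [term_z + log q(z)] >= KL(q || p) for any valid aux(z|a)."""
    terms = per_expert_terms(experts, target, aux)
    return float(np.sum(weights * (terms + np.log(weights))))


# Hypothetical toy problem: 3 experts, 5 discrete actions, random target p(a).
rng = np.random.default_rng(0)
K, A = 3, 5
weights = rng.dirichlet(np.ones(K))
experts = rng.dirichlet(np.ones(A), size=K)
target = rng.dirichlet(np.ones(A))

exact = kl(mixture_marginal(weights, experts), target)
tight = upper_bound(weights, experts, target, posterior(weights, experts))
loose = upper_bound(weights, experts, target, np.full((K, A), 1.0 / K))
print(f"KL={exact:.4f}  bound(posterior)={tight:.4f}  bound(uniform)={loose:.4f}")
```

The design point this illustrates is an alternating scheme: setting q̃ to the current responsibilities makes the bound tight, and with q̃ then held fixed each expert can be updated against its own term independently of the others. In a continuous-action setting the expectations would be estimated from samples, and log p(a) is not tractable for a diffusion model; how VDD handles that term is not reproduced by this sketch.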
