Diffusion Model-Augmented Behavioral Cloning

Imitation learning addresses the challenge of learning by observing an expert's demonstrations without access to reward signals from environments. Most existing imitation learning methods that do not require interacting with environments either model the expert distribution as the conditional probability p(a|s) (e.g., behavioral cloning, BC) or the joint probability p(s, a). Although modeling the conditional probability with BC is simple, BC usually struggles with generalization. While modeling the joint probability can improve generalization performance, the inference procedure is often time-consuming, and the model can suffer from manifold overfitting. This work proposes an imitation learning framework that benefits from modeling both the conditional and joint probability of the expert distribution. Our proposed Diffusion Model-Augmented Behavioral Cloning (DBC) employs a diffusion model trained to model expert behaviors and learns a policy to optimize both the BC loss (conditional) and our proposed diffusion model loss (joint). DBC outperforms baselines in various continuous control tasks in navigation, robot arm manipulation, dexterous manipulation, and locomotion. We design additional experiments to verify the limitations of modeling either the conditional probability or the joint probability of the expert distribution, and to compare different generative models. Ablation studies justify the effectiveness of our design choices.
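
To make the two-term objective concrete, the following is a minimal sketch under stated assumptions, not the loss defined in the paper. The abstract says only that the policy optimizes a BC loss (conditional) together with a diffusion model loss (joint); it does not give the exact form of either term or how they are weighted. The mean-squared-error BC term, the DDPM-style noise-prediction error on concatenated state-action pairs, the linear noise schedule, the weight `lam`, and the names `policy` and `noise_predictor` are all hypothetical choices made for illustration.

```python
import numpy as np


def bc_loss(policy_actions, expert_actions):
    """Conditional term: mean squared error between policy and expert actions."""
    return float(np.mean((policy_actions - expert_actions) ** 2))


def denoising_loss(noise_predictor, states, actions, rng, n_steps=100):
    """Joint term (assumed form): noise-prediction error of a diffusion model
    evaluated on concatenated (state, action) pairs."""
    x0 = np.concatenate([states, actions], axis=1)

    # Hypothetical linear beta schedule; the abstract does not specify one.
    betas = np.linspace(1e-4, 2e-2, n_steps)
    alpha_bars = np.cumprod(1.0 - betas)

    # Sample one diffusion step and one Gaussian noise vector per pair.
    t = rng.integers(0, n_steps, size=x0.shape[0])
    eps = rng.standard_normal(x0.shape)

    # Forward-noise the pairs: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps.
    signal = np.sqrt(alpha_bars[t])[:, None]
    noise = np.sqrt(1.0 - alpha_bars[t])[:, None]
    xt = signal * x0 + noise * eps

    # A model that fits the expert's joint distribution well should predict
    # eps accurately on pairs that look like expert data.
    return float(np.mean((noise_predictor(xt, t) - eps) ** 2))


def combined_loss(policy, noise_predictor, states, expert_actions, lam, rng):
    """Assumed combination: BC term plus lam times the diffusion-model term,
    with the diffusion term evaluated on the policy's own actions."""
    predicted = policy(states)
    conditional = bc_loss(predicted, expert_actions)
    joint = denoising_loss(noise_predictor, states, predicted, rng)
    return conditional + lam * joint
```

In this sketch the diffusion model (`noise_predictor`) is treated as already trained on expert pairs and held fixed, so only the policy would receive gradients in a real training loop; setting `lam` to zero recovers plain BC.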
