Diffusion Model-Augmented Behavioral Cloning

Imitation learning addresses the challenge of learning by observing an expert's demonstrations without access to reward signals from environments. Most existing imitation learning methods that do not require interacting with environments either model the expert distribution as the conditional probability p(a|s) (e.g., behavioral cloning, BC) or the joint probability p(s, a). Despite its simplicity, modeling the conditional probability with BC usually struggles with generalization. While modeling the joint probability can lead to improved generalization performance, the inference procedure is often time-consuming and the model can suffer from manifold overfitting. This work proposes an imitation learning framework that benefits from modeling both the conditional and joint probability of the expert distribution. Our proposed diffusion model-augmented behavioral cloning (DBC) employs a diffusion model trained to model expert behaviors and learns a policy to optimize both the BC loss (conditional) and our proposed diffusion model loss (joint). DBC outperforms baselines in various continuous control tasks in navigation, robot arm manipulation, dexterous manipulation, and locomotion. We design additional experiments to verify the limitations of modeling either the conditional probability or the joint probability of the expert distribution as well as compare different generative models. Ablation studies justify the effectiveness of our design choices.
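As a rough illustration of how a policy objective could combine a conditional term and a joint term, the following is a minimal sketch and not the authors' implementation. It assumes a mean-squared-error BC loss, a fixed (already trained) noise predictor standing in for the diffusion model, a single noise level `alpha_bar`, and a weighting coefficient `lam`. All function and parameter names here are hypothetical, and the exact form of the paper's diffusion model loss is not given in the abstract.

```python
import math
import random


def bc_loss(pred_actions, expert_actions):
    """Conditional term: mean squared error between policy and expert actions."""
    total, count = 0.0, 0
    for pred, expert in zip(pred_actions, expert_actions):
        for p, e in zip(pred, expert):
            total += (p - e) ** 2
            count += 1
    return total / count


def diffusion_loss(noise_predictor, states, actions, rng, alpha_bar=0.5):
    """Joint term (assumed form): denoising error of a fixed noise predictor
    on concatenated (state, action) pairs at one noise level."""
    total, count = 0.0, 0
    for s, a in zip(states, actions):
        x0 = list(s) + list(a)  # joint sample (s, a)
        eps = [rng.gauss(0.0, 1.0) for _ in x0]
        # Forward diffusion step: mix the clean pair with Gaussian noise.
        xt = [math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * n
              for x, n in zip(x0, eps)]
        pred_eps = noise_predictor(xt)
        for p, n in zip(pred_eps, eps):
            total += (p - n) ** 2
            count += 1
    return total / count


def combined_loss(policy, noise_predictor, states, expert_actions, rng, lam=1.0):
    """Weighted sum of the conditional (BC) term and the joint (diffusion) term."""
    pred_actions = [policy(s) for s in states]
    return (bc_loss(pred_actions, expert_actions)
            + lam * diffusion_loss(noise_predictor, states, pred_actions, rng))
```

In this sketch the joint term is evaluated on state-action pairs produced by the policy, so a policy that matches the expert's actions on the demonstration states can still be penalized if its pairs look unlikely to the diffusion model. Setting `lam=0.0` recovers plain BC.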

