Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level Stability and High-Level Behavior

We propose a theoretical framework for studying behavior cloning of complex expert demonstrations using generative modeling. Our framework invokes low-level controllers, either learned or implicit in position-command control, to stabilize imitation around expert demonstrations. We show that with (a) a suitable low-level stability guarantee and (b) a powerful enough generative model as our imitation learner, pure supervised behavior cloning can generate trajectories matching the per-time-step distribution of essentially arbitrary expert trajectories under an optimal transport cost. Our analysis relies on a stochastic continuity property of the learned policy that we call "total variation continuity" (TVC). We then show that TVC can be ensured with minimal degradation of accuracy by combining a popular data-augmentation regimen with a novel algorithmic trick: adding augmentation noise at execution time. We instantiate our guarantees for policies parameterized by diffusion models and prove that if the learner accurately estimates the score of the (noise-augmented) expert policy, then the distribution of imitator trajectories is close to the demonstrator distribution in a natural optimal transport distance. Our analysis constructs intricate couplings between noise-augmented trajectories, a technique that may be of independent interest. We conclude by empirically validating our algorithmic recommendations and discussing their implications for future research on behavior cloning with generative modeling.
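The algorithmic recommendation above (noise augmentation during training, with the same noise added again at execution time) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the Gaussian noise model, the scale `sigma`, the function names, the linear toy expert, and the least-squares fit that stands in for the generative (e.g. diffusion) policy.

```python
import numpy as np


def augment_demonstrations(states, actions, sigma, rng):
    """Perturb demonstration states with Gaussian noise; keep the expert actions.

    Assumption: isotropic Gaussian noise with scale `sigma`.
    """
    noisy_states = states + sigma * rng.standard_normal(states.shape)
    return noisy_states, actions


def act_with_execution_noise(policy, state, sigma, rng):
    """Query the learned policy on a noise-perturbed state at execution time,
    mirroring the augmentation applied during training."""
    noisy_state = state + sigma * rng.standard_normal(state.shape)
    return policy(noisy_state)


rng = np.random.default_rng(0)
sigma = 0.1  # hypothetical augmentation scale

# Hypothetical expert data: a linear expert that pushes the state to the origin.
states = rng.standard_normal((256, 2))
actions = -states

# Training: fit on noise-augmented states. Least squares is only a stand-in
# for the generative model discussed in the abstract.
noisy_states, targets = augment_demonstrations(states, actions, sigma, rng)
W, *_ = np.linalg.lstsq(noisy_states, targets, rcond=None)


def policy(s):
    return s @ W


# Execution: the same noise level is injected before querying the policy.
action = act_with_execution_noise(policy, np.array([1.0, -0.5]), sigma, rng)
```

The point of the sketch is only the symmetry between the two helper functions: the policy sees perturbed states both when it is fit and when it is run.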

