Diffusion models have emerged as powerful generative models in the text-to-image domain. This paper studies their application as observation-to-action models for imitating human behaviour in sequential environments. Human behaviour is stochastic and multimodal, with structured correlations between action dimensions. Standard modelling choices in behaviour cloning, by contrast, are limited in their expressiveness and may introduce bias into the cloned policy. We begin by pointing out the limitations of these choices. We then propose that diffusion models are an excellent fit for imitating human behaviour, since they learn an expressive distribution over the joint action space. We introduce several innovations to make diffusion models suitable for sequential environments: designing suitable architectures, investigating the role of guidance, and developing reliable sampling strategies. Experimentally, diffusion models closely match human demonstrations in a simulated robotic control task and a modern 3D gaming environment.
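The claim that standard behaviour-cloning choices can bias the cloned policy can be illustrated with a toy case. The sketch below is not taken from the paper: the bimodal "steer left or right" demonstrations, the function names, and the use of empirical resampling as a stand-in for an expressive generative policy are all assumptions made for illustration. It shows that a point estimate trained with mean squared error collapses to the average of the two modes, an action no demonstrator took, whereas a sampler that respects the data distribution stays on the modes.

```python
import numpy as np


def mse_optimal_action(actions):
    """Return the constant prediction that minimises mean squared error (the sample mean)."""
    return float(np.mean(actions))


def sample_empirical_action(actions, rng):
    """Hypothetical stand-in for an expressive policy: resample a demonstrated action."""
    return float(rng.choice(actions))


rng = np.random.default_rng(0)

# Assumed toy demonstrations for a single observation:
# half the humans steer left (about -1), half steer right (about +1).
demo_actions = np.concatenate([
    rng.normal(-1.0, 0.05, size=500),
    rng.normal(+1.0, 0.05, size=500),
])

# The MSE point estimate lands between the modes, near 0.
mean_action = mse_optimal_action(demo_actions)

# Samples from the distribution-preserving policy stay near -1 or +1.
sampled = np.array([sample_empirical_action(demo_actions, rng) for _ in range(1000)])

print("MSE-optimal action:", mean_action)
print("fraction of sampled actions near a mode:", np.mean(np.abs(sampled) > 0.5))
```

A diffusion model plays the role of the sampler here in a much more general way, since it is meant to learn the joint action distribution conditioned on the observation rather than memorise demonstrations.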