Diffusion-based imitation learning improves Behavioral Cloning (BC) on multi-modal decision-making tasks, but at the cost of significantly slower inference due to the recursion in the diffusion process. This motivates the design of efficient policy generators that retain the ability to produce diverse actions. To address this challenge, we propose AdaFlow, an imitation learning framework based on flow-based generative modeling. AdaFlow represents the policy with state-conditioned ordinary differential equations (ODEs), known as probability flows. We reveal an intriguing connection between the conditional variance of the training loss and the discretization error of the ODEs. Building on this insight, we propose a variance-adaptive ODE solver that adjusts its step size at inference time, making AdaFlow an adaptive decision-maker that offers rapid inference without sacrificing diversity. Notably, it automatically reduces to a one-step generator when the action distribution is uni-modal. Our comprehensive empirical evaluation shows that AdaFlow achieves high performance with fast inference speed.
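To make the idea concrete, the variance-adaptive integration described above can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: `velocity` and `variance` stand in for learned networks, and the step rule `dt = min(1 - t, eta / sqrt(var))` is one plausible adaptive schedule chosen for illustration. The key behavior it demonstrates is that when the estimated variance is (near) zero, the solver covers the whole time horizon in a single step.

```python
import numpy as np

def variance_adaptive_solve(velocity, variance, x0, eta=0.1, t_end=1.0):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=t_end.

    The step size shrinks where the estimated conditional variance is
    large (multi-modal regions need careful integration) and collapses
    to a single step when the variance is ~0 (uni-modal, straight flow).
    `velocity` and `variance` are placeholders for learned models.
    """
    x, t, steps = np.asarray(x0, dtype=float), 0.0, 0
    while t < t_end:
        var = max(variance(x, t), 0.0)
        if var < 1e-8:
            dt = t_end - t              # deterministic flow: one big jump
        else:
            dt = min(t_end - t, eta / np.sqrt(var))  # cautious small steps
        x = x + dt * velocity(x, t)     # explicit Euler update
        t += dt
        steps += 1
    return x, steps

# Toy example: a constant velocity field pointing at a single target,
# i.e. a straight probability flow starting from the origin.
target = np.array([1.0, -2.0])
vel = lambda x, t: target

# Uni-modal case (zero variance): solved in exactly one Euler step.
x_uni, n_uni = variance_adaptive_solve(vel, lambda x, t: 0.0, np.zeros(2))

# Positive-variance case: the same endpoint, reached with smaller steps.
x_multi, n_multi = variance_adaptive_solve(vel, lambda x, t: 0.04, np.zeros(2))
```

Because the velocity field here is constant, both runs reach the same endpoint; only the number of function evaluations differs, which is exactly the compute/diversity trade-off the adaptive solver manages.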