A variety of control tasks, such as inverse kinematics (IK), trajectory optimization (TO), and model predictive control (MPC), are commonly formulated as energy minimization problems. Numerical solvers for such problems are well established, but they are often too slow to be used directly in real-time applications. The alternative is to learn solution manifolds for control problems in an offline stage. Although this distillation process can be trivially formulated as a behavioral cloning (BC) problem in an imitation learning setting, our experiments highlight a number of significant shortcomings that arise from incompatible local minima, interpolation artifacts, and insufficient coverage of the state space. In this paper, we propose an alternative to BC that is efficient and numerically robust. We formulate the learning of solution manifolds as the minimization of the energy terms of a control objective, integrated over the space of problems of interest. We minimize this energy integral with a novel method that combines Monte Carlo-inspired adaptive sampling strategies with the derivatives used to solve individual instances of the control task. We evaluate the performance of our formulation on a series of robotic control problems of increasing complexity, and we highlight its benefits through comparisons against traditional methods such as behavioral cloning and dataset aggregation (DAgger).
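To make the formulation concrete, the following is a minimal sketch of minimizing an energy integral over a family of problems, rather than cloning precomputed solutions. Everything in it is an assumption chosen for illustration and is not taken from the paper: the toy energy `energy(x, p)` (a one-joint reaching task with target height `p`), the cubic-polynomial policy, the problem range `[-0.8, 0.8]`, and the hyperparameters. It also substitutes plain uniform Monte Carlo sampling for the adaptive sampling strategy the abstract mentions. The point it shows is the chain rule: the derivative dE/dx that a per-instance solver would use is reused to update the policy parameters directly.

```python
import math
import random

# Hypothetical toy task: choose joint angle x so that sin(x) reaches height p.
def energy(x, p):
    return 0.5 * (math.sin(x) - p) ** 2

# Derivative of the energy with respect to the solution variable x.
# This is the same derivative a per-instance gradient solver would use.
def energy_grad_x(x, p):
    return (math.sin(x) - p) * math.cos(x)

# Assumed policy: a cubic polynomial mapping problem p to solution x.
def features(p):
    return (1.0, p, p ** 3)

def policy(theta, p):
    return sum(t * f for t, f in zip(theta, features(p)))

def train(steps=2000, batch=16, lr=0.5, seed=0):
    """Stochastic gradient descent on the Monte Carlo estimate of
    the integral of energy(policy(theta, p), p) over p in [-0.8, 0.8]."""
    rng = random.Random(seed)
    theta = [0.0, 0.0, 0.0]
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for _ in range(batch):
            p = rng.uniform(-0.8, 0.8)      # uniform stand-in for adaptive sampling
            x = policy(theta, p)
            dE_dx = energy_grad_x(x, p)
            # Chain rule: dE/dtheta_i = dE/dx * dx/dtheta_i, and dx/dtheta_i = feature_i.
            for i, f in enumerate(features(p)):
                grad[i] += dE_dx * f / batch
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta

def mean_energy(theta, n=200):
    """Average energy on an evenly spaced grid of problems."""
    ps = [-0.8 + 1.6 * i / (n - 1) for i in range(n)]
    return sum(energy(policy(theta, p), p) for p in ps) / n

if __name__ == "__main__":
    theta = train()
    print("mean energy before:", mean_energy([0.0, 0.0, 0.0]))
    print("mean energy after: ", mean_energy(theta))
```

No reference solutions are generated at any point in this sketch. The policy is trained only through the energy and its derivative, which is the property that distinguishes this style of training from behavioral cloning.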