Imitation learning is a well-established approach for machine-learning-based control. However, its applicability depends on having access to demonstrations, which are often expensive to collect, suboptimal for solving the task, or both. In this work, we present GCImOpt, an approach to learn efficient goal-conditioned policies by training on datasets generated by trajectory optimization. Our dataset generation is computationally efficient: it produces thousands of optimal trajectories in minutes on a laptop computer, yielding high-quality demonstrations. Further, a data augmentation scheme that treats intermediate states as goals increases the training dataset size by an order of magnitude. Using our generated datasets, we train goal-conditioned neural network policies that can control the system towards arbitrary goals. To demonstrate the generality of our approach, we generate datasets and train policies for several control tasks, namely cart-pole stabilization, planar and three-dimensional quadcopter stabilization, and point reaching using a 6-DoF robot arm. We show that our trained policies achieve high success rates and near-optimal control profiles, all while being small (fewer than 80,000 neural network parameters) and fast (in the best case more than 6,000 times faster than a trajectory optimization solver), so that they could be deployed onboard resource-constrained controllers. We provide videos, code, datasets and pre-trained policies under a free software license; see our project website https://jongoiko.github.io/gcimopt/.
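The augmentation that treats intermediate states as goals can be illustrated with a short sketch. This is not the released GCImOpt code: the function name `augment_with_intermediate_goals`, the `goals_per_step` parameter, and the choice to sample goals uniformly from later states of the same trajectory are all assumptions made here for illustration. The idea is that each state-control pair on an optimized trajectory is paired with states reached later on that trajectory, so that one trajectory yields many (state, goal, action) training examples.

```python
import numpy as np


def augment_with_intermediate_goals(states, controls, goals_per_step, rng):
    """Relabel one trajectory so that later states on it serve as goals.

    states:   array of shape (T + 1, state_dim)
    controls: array of shape (T, control_dim); controls[t] is applied at states[t]
    Returns (obs, goals, actions), each with T * goals_per_step rows.
    Hypothetical helper; the sampling scheme is an assumption.
    """
    states = np.asarray(states)
    controls = np.asarray(controls)
    T = controls.shape[0]
    assert states.shape[0] == T + 1, "expected one more state than controls"

    # Repeat every time index t so it gets goals_per_step relabeled goals.
    t_idx = np.repeat(np.arange(T), goals_per_step)
    # For each t, draw a goal index uniformly from t + 1 .. T (inclusive).
    goal_idx = rng.integers(t_idx + 1, T + 1)
    return states[t_idx], states[goal_idx], controls[t_idx]


# Toy 1-D trajectory with T = 10 steps, standing in for a solver output.
rng = np.random.default_rng(0)
states = np.linspace(0.0, 1.0, 11).reshape(-1, 1)
controls = np.full((10, 1), 0.1)
obs, goals, actions = augment_with_intermediate_goals(states, controls, 10, rng)
print(obs.shape)  # (100, 1)
```

With `goals_per_step = 10`, the 10 original state-control pairs become 100 training examples, which matches the order-of-magnitude growth mentioned in the abstract only by construction of this toy setting.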