GCImOpt: Learning efficient goal-conditioned policies by imitating optimal trajectories

Source: arXiv. Accepted for publication at the 8th Annual Conference on Learning for Dynamics and Control (L4DC 2026). 16 pages (including appendix), 1 figure. Project website: https://jongoiko.github.io/gcimopt/

Imitation learning is a well-established approach for machine-learning-based control. However, its applicability depends on having access to demonstrations, which are often expensive to collect and/or suboptimal for solving the task. In this work, we present GCImOpt, an approach to learn efficient goal-conditioned policies by training on datasets generated by trajectory optimization. Our approach for dataset generation is computationally efficient, can generate thousands of optimal trajectories in minutes on a laptop computer, and produces high-quality demonstrations. Further, by means of a data augmentation scheme that treats intermediate states as goals, we are able to increase the training dataset size by an order of magnitude. Using our generated datasets, we train goal-conditioned neural network policies that can control the system towards arbitrary goals. To demonstrate the generality of our approach, we generate datasets and then train policies for various control tasks, namely cart-pole stabilization, planar and three-dimensional quadcopter stabilization, and point reaching using a 6-DoF robot arm. We show that our trained policies can achieve high success rates and near-optimal control profiles, all while being small (less than 80,000 neural network parameters) and fast enough (up to more than 6,000 times faster than a trajectory optimization solver) that they could be deployed onboard resource-constrained controllers. We provide videos, code, datasets and pre-trained policies under a free software license; see our project website https://jongoiko.github.io/gcimopt/.
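The data augmentation idea mentioned in the abstract, treating intermediate states as goals, can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `relabel_with_intermediate_goals`, the tuple-based data layout, and the choice to use every later state as a goal are assumptions made for illustration, and the scheme in the paper may subsample or define goals differently. The sketch also assumes that a segment of an optimal trajectory is itself a valid demonstration for reaching the state at the end of that segment.

```python
from typing import List, Sequence, Tuple

State = Tuple[float, ...]
Control = Tuple[float, ...]
Sample = Tuple[State, State, Control]  # (state, goal, control)


def relabel_with_intermediate_goals(
    states: Sequence[State],
    controls: Sequence[Control],
) -> List[Sample]:
    """Build (state, goal, control) training samples from one trajectory.

    Hypothetical helper: `states` holds N + 1 states and `controls` holds
    the N controls applied between them. For every step t, each later
    state states[k] (k > t) is treated as a goal for the pair
    (states[t], controls[t]). The final state is the original goal, so
    the un-augmented samples are included as the k = N case.
    """
    if len(states) != len(controls) + 1:
        raise ValueError("expected len(states) == len(controls) + 1")

    samples: List[Sample] = []
    for t in range(len(controls)):
        for k in range(t + 1, len(states)):
            samples.append((states[t], states[k], controls[t]))
    return samples


# Toy 1-D trajectory with N = 3 steps (made-up numbers, not from the paper).
toy_states = [(0.0,), (1.0,), (2.0,), (3.0,)]
toy_controls = [(0.5,), (0.5,), (0.5,)]
toy_samples = relabel_with_intermediate_goals(toy_states, toy_controls)
print(len(toy_samples))
```

With N controls, labelling only the final goal yields N samples, while this all-pairs variant yields N(N + 1)/2, so in practice one would likely subsample the goals to keep the dataset size manageable.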

