Learning from demonstration (LfD) provides an efficient way to train robots. The learned motions should be convergent and stable, but to be truly effective in the real world, LfD-capable robots should also be able to remember multiple motion skills. Existing stable-LfD approaches cannot retain multiple skills. Although recent work on continual-LfD has shown that hypernetwork-generated neural ordinary differential equation solvers (NODE) can learn multiple LfD tasks sequentially, this approach lacks stability guarantees. We propose an approach for stable continual-LfD in which a hypernetwork generates two networks: a trajectory-learning dynamics model and a trajectory-stabilizing Lyapunov function. Introducing stability yields convergent trajectories and, more importantly, greatly improves continual learning performance, especially in size-efficient chunked hypernetworks. With our approach, a single hypernetwork learns stable trajectories of the robot's end-effector position and orientation simultaneously, and does so continually for a sequence of real-world LfD tasks without retraining on past demonstrations. We also propose stochastic hypernetwork regularization with a single randomly sampled regularization term, which reduces the cumulative training time cost for $N$ tasks from $O(N^2)$ to $O(N)$ without any loss in performance on real-world tasks. We empirically evaluate our approach on the popular LASA dataset, on high-dimensional extensions of LASA (up to 32 dimensions) to assess scalability, and on a novel extended robotic task dataset (RoboTasks9) to assess real-world performance. Across trajectory error, stability, and continual learning metrics, our approach compares favorably with the baselines. Our open-source code and datasets are available at https://github.com/sayantanauddy/clfd-snode.
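The reduction from $O(N^2)$ to $O(N)$ can be illustrated by counting regularization terms. The sketch below is not the released implementation. It assumes that full hypernetwork regularization evaluates one term per previously learned task at every training iteration, that each term has a constant cost, and that the stochastic variant samples its single term uniformly from the past tasks. The function names are hypothetical.

```python
import random


def full_reg_terms(num_tasks: int) -> int:
    """Per-iteration regularization terms, summed over all tasks, when
    every previously learned task contributes one term (assumption)."""
    # Task k (1-indexed) has k - 1 predecessors, so the total is N(N-1)/2.
    return sum(k - 1 for k in range(1, num_tasks + 1))


def stochastic_reg_terms(num_tasks: int) -> int:
    """Same count when only one sampled past task contributes a term."""
    # The first task has no predecessor; every later task uses one term.
    return sum(min(k - 1, 1) for k in range(1, num_tasks + 1))


def sample_past_task(current_task: int, rng: random.Random) -> int:
    """Pick one earlier task index (0-indexed) for the single term.

    Uniform sampling is an assumption made for this illustration.
    """
    if current_task < 1:
        raise ValueError("the first task has no past task to regularize")
    return rng.randrange(current_task)


if __name__ == "__main__":
    for n in (3, 9, 26):
        print(n, full_reg_terms(n), stochastic_reg_terms(n))
```

Under these assumptions, a sequence of 9 tasks requires 36 term evaluations per iteration in total with full regularization, against 8 with a single sampled term. The first count grows quadratically in the number of tasks and the second linearly.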