Learning from demonstration (LfD) provides an efficient way to train robots. The learned motions should be convergent and stable, but to be truly effective in the real world, LfD-capable robots should also be able to remember multiple motion skills. Multi-skill retention is a capability missing from existing stable-LfD approaches. Conversely, recent work on continual-LfD has shown that hypernetwork-generated neural ordinary differential equation solvers can learn multiple LfD tasks sequentially, but this approach lacks stability guarantees. We propose an approach for stable continual-LfD in which a hypernetwork generates two networks: a trajectory-learning dynamics model and a trajectory-stabilizing Lyapunov function. Introducing stability not only yields stable trajectories but also greatly improves continual learning performance, especially for size-efficient chunked hypernetworks. With our approach, a single model can be trained continually to predict both the position and the orientation trajectories of the robot's end-effector for multiple real-world tasks, without retraining on past demonstrations. We also propose stochastic regularization for hypernetworks, which uses a single randomly sampled regularization term and reduces the cumulative training time for $N$ tasks from $\mathcal{O}(N^2)$ to $\mathcal{O}(N)$ without any loss in performance on real-world tasks. We empirically evaluate our approach on the popular LASA dataset, on high-dimensional extensions of LASA (up to 32 dimensions) to assess scalability, and on a novel extended robotic task dataset (RoboTasks9) to assess real-world performance. Our approach performs favorably compared to the baselines on trajectory error, stability, and continual learning metrics. Code and datasets will be shared after submission.
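The abstract does not specify how the Lyapunov function stabilizes the learned dynamics or how the single sampled regularization term is formed. The NumPy sketch below illustrates one plausible reading of both ideas; it is not the authors' implementation. For stabilization it swaps in the projection of Manek and Kolter (2019), assuming a hand-fixed quadratic Lyapunov candidate $V(x) = x^\top P x$ with the goal at the origin (in the paper both networks come from a hypernetwork). For regularization it assumes a hypernetwork output regularizer averaged over past tasks, in the style of von Oswald et al. (2020), and a toy linear "hypernetwork". All function names (`stabilize`, `full_regularizer`, `stochastic_regularizer`, and so on) are hypothetical.

```python
import numpy as np

# Part 1: Lyapunov-based stabilization of a learned vector field.
# Assumptions (not from the paper): quadratic Lyapunov candidate V(x) = x^T P x
# with P symmetric positive definite, equilibrium (goal) at the origin, and the
# Manek & Kolter (2019) projection. Both "networks" are fixed by hand here.

def lyapunov(x, P):
    """Return V(x) = x^T P x and its gradient 2 P x."""
    return x @ P @ x, 2.0 * P @ x


def stabilize(f_hat, x, P, alpha=1.0):
    """Correct the nominal velocity f_hat(x) so that dV/dt <= -alpha * V(x)."""
    v, grad_v = lyapunov(x, P)
    fx = f_hat(x)
    # Amount by which the nominal field violates the decrease condition.
    violation = max(0.0, grad_v @ fx + alpha * v)
    norm_sq = grad_v @ grad_v
    if norm_sq == 0.0:  # x is the equilibrium; V is already minimal
        return fx
    # Remove just enough of the component along grad V to satisfy the condition.
    return fx - grad_v * (violation / norm_sq)


def rollout(field, x0, dt=0.01, steps=1000):
    """Integrate dx/dt = field(x) with explicit Euler and return the final state."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x = x + dt * field(x)
    return x


# Part 2: stochastic hypernetwork regularization.
# Assumptions: a toy linear hypernetwork h(e, theta) = theta @ e, one fixed
# embedding per task, and an output regularizer averaged over past tasks.

def hnet(embedding, theta):
    """Toy hypernetwork: maps a task embedding to target-network weights."""
    return theta @ embedding


def reg_term(theta, theta_star, embedding):
    """Squared drift of one past task's generated weights from snapshot theta_star."""
    diff = hnet(embedding, theta) - hnet(embedding, theta_star)
    return float(diff @ diff)


def full_regularizer(theta, theta_star, past_embeddings):
    """Average over all past tasks: cost grows linearly with the task index."""
    return float(np.mean([reg_term(theta, theta_star, e) for e in past_embeddings]))


def stochastic_regularizer(theta, theta_star, past_embeddings, rng):
    """One uniformly sampled past task: constant cost, unbiased for the average."""
    e = past_embeddings[rng.integers(len(past_embeddings))]
    return reg_term(theta, theta_star, e)


def cumulative_reg_terms(num_tasks, steps_per_task, stochastic):
    """Number of regularization terms evaluated while training tasks 0..num_tasks-1."""
    total = 0
    for t in range(1, num_tasks):  # task 0 has no predecessors to regularize against
        total += steps_per_task * (1 if stochastic else t)
    return total
```

As a usage example, rolling out the unstable spiral `f_hat = lambda x: A @ x` with `A = [[0.5, -1.0], [1.0, 0.5]]` diverges, while `rollout(lambda x: stabilize(f_hat, x, np.eye(2)), [1.0, 0.0])` converges to the origin. For 10 tasks with 100 steps each, `cumulative_reg_terms` gives 4500 terms for the full average ($100 \cdot \sum_{t=1}^{9} t$, quadratic in the number of tasks) versus 900 for the single sampled term (linear), which mirrors the $\mathcal{O}(N^2)$ to $\mathcal{O}(N)$ reduction claimed above. Because the sampled task is uniform, the single term equals the full average in expectation.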