Imitation learning offers an effective way to avoid the resource-intensive and time-consuming process of learning a policy from scratch in the solution space. Although the resulting policy can reliably mimic expert demonstrations, it often behaves unpredictably in unexplored regions of the state space, which raises significant safety concerns when the system is perturbed. To address these challenges, we introduce the Stable Neural Dynamical System (SNDS), an imitation learning regime that produces a policy with formal stability guarantees. We deploy a neural policy architecture that represents stability through Lyapunov theory, and we jointly train the policy and its corresponding Lyapunov candidate to ensure global stability. We validate our approach through extensive experiments in simulation and by successfully deploying the trained policies on a real-world manipulator arm. The experimental results show that our method overcomes the instability, inaccuracy, and high computational cost associated with previous imitation learning methods, making it a promising solution for stable policy learning in complex planning scenarios.
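The abstract does not spell out the SNDS architecture. As a generic illustration of how a policy can be tied to a Lyapunov candidate so that stability holds by construction, the sketch below uses a different, well-known construction: a quadratic candidate V(x) = x^T P x with P = L L^T + eps*I, and a projection that removes only the component of an unconstrained policy f_hat that would violate the decrease condition grad V(x) . f(x) <= -alpha * V(x). Every name here (`make_lyapunov`, `make_nominal_policy`, `stabilize`, `rollout`, `ALPHA`, `EPS`) is hypothetical, the "network" is a random, untrained tanh layer, and no training loop is shown. In a learned setting, L and the network weights would be fitted jointly to demonstrations. This is not claimed to be the authors' method.

```python
import math
import random

ALPHA = 0.5  # assumed decay rate: we require dV/dt <= -ALPHA * V(x)
EPS = 1e-3   # keeps P strictly positive definite


def make_lyapunov(L):
    """Quadratic candidate V(x) = x^T P x with P = L L^T + EPS * I (so V > 0 for x != 0)."""
    n = len(L)
    P = [[sum(L[i][k] * L[j][k] for k in range(n)) + (EPS if i == j else 0.0)
          for j in range(n)] for i in range(n)]

    def V(x):
        return sum(x[i] * P[i][j] * x[j] for i in range(n) for j in range(n))

    def grad_V(x):
        # P is symmetric, so the gradient of x^T P x is 2 P x.
        return [2.0 * sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]

    return V, grad_V


def make_nominal_policy(n, hidden, seed=0):
    """Hypothetical unconstrained policy f_hat(x) = W2 tanh(W1 x); note f_hat(0) = 0."""
    rng = random.Random(seed)
    W1 = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(hidden)]
    W2 = [[rng.gauss(0.0, 1.0) for _ in range(hidden)] for _ in range(n)]

    def f_hat(x):
        h = [math.tanh(sum(W1[k][j] * x[j] for j in range(n))) for k in range(hidden)]
        return [sum(W2[i][k] * h[k] for k in range(hidden)) for i in range(n)]

    return f_hat


def stabilize(f_hat, V, grad_V, alpha=ALPHA):
    """Subtract the part of f_hat that would violate grad_V(x) . f(x) <= -alpha * V(x)."""
    def f(x):
        fx = f_hat(x)
        g = grad_V(x)
        g_sq = sum(gi * gi for gi in g)
        if g_sq < 1e-12:  # numerically at the equilibrium: stay put
            return [0.0] * len(x)
        # Positive only where the decrease condition would be violated.
        violation = max(0.0, sum(gi * fi for gi, fi in zip(g, fx)) + alpha * V(x))
        return [fi - gi * violation / g_sq for fi, gi in zip(fx, g)]
    return f


def rollout(f, x0, dt=0.01, steps=1000):
    """Integrate dx/dt = f(x) with classical fourth-order Runge-Kutta."""
    x = list(x0)
    for _ in range(steps):
        k1 = f(x)
        k2 = f([xi + 0.5 * dt * ki for xi, ki in zip(x, k1)])
        k3 = f([xi + 0.5 * dt * ki for xi, ki in zip(x, k2)])
        k4 = f([xi + dt * ki for xi, ki in zip(x, k3)])
        x = [xi + dt / 6.0 * (a + 2 * b + 2 * c + d)
             for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
    return x


if __name__ == "__main__":
    V, grad_V = make_lyapunov([[1.0, 0.0], [0.3, 0.8]])
    f = stabilize(make_nominal_policy(2, 8), V, grad_V)
    x0 = [1.5, -2.0]
    # Continuous-time bound: V(T) <= V(x0) * exp(-ALPHA * T), here T = 0.01 * 1000 = 10.
    print(V(x0), V(rollout(f, x0)))
```

Because the correction is active only where the decrease condition would fail, the nominal policy is left untouched wherever it already satisfies it. Since V is positive definite and radially unbounded, every trajectory of the continuous-time system converges to the origin regardless of the (here random) network weights. This is the sense in which such a construction gives a global guarantee rather than one that depends on the training data.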