Imitation learning is a paradigm for addressing complex motion planning problems by learning a policy that imitates an expert's behavior. However, relying solely on the expert's data might lead to unsafe actions when the robot deviates from the demonstrated trajectories. Stability guarantees have previously been provided by using nonlinear dynamical systems as high-level motion planners in conjunction with the Lyapunov stability theorem. Yet these methods are prone to inaccurate policies, high computational cost, sample inefficiency, or quasi-stability when replicating complex and highly nonlinear trajectories. To mitigate these problems, we present an approach for learning a globally stable nonlinear dynamical system as a motion planning policy. We model the nonlinear dynamical system as a parametric polynomial and learn the polynomial's coefficients jointly with a Lyapunov candidate. To evaluate the approach, we compare our method against the state of the art in simulation and conduct real-world experiments with the Kinova Gen3 Lite manipulator arm. Our experiments demonstrate the sample efficiency and reproduction accuracy of our method for various expert trajectories, and show that it remains stable in the face of perturbations.
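To make the idea of a polynomial dynamical system paired with a Lyapunov candidate concrete, the following is a minimal sketch and not the method described above. It assumes a 2-D state, a hand-picked monomial basis, fixed coefficients `C`, and a quadratic candidate V(x) = xᵀPx; all of these names and values are hypothetical and chosen for illustration only. No learning takes place here: the sketch only checks numerically, at sampled states, that V decreases along the flow, which is evidence of stability on the sampled region rather than a proof of global stability.

```python
import numpy as np

def monomials(x):
    """Hypothetical monomial basis for a 2-D state: [x1, x2, x1^3, x2^3]."""
    x1, x2 = x
    return np.array([x1, x2, x1**3, x2**3])

# Hypothetical, hand-picked coefficients (in a learning setting these would be fitted).
# They encode x_dot = -x - x^3, applied per coordinate.
C = np.array([[-1.0, 0.0, -1.0, 0.0],
              [0.0, -1.0, 0.0, -1.0]])

# Hypothetical quadratic Lyapunov candidate V(x) = x^T P x with P positive definite.
P = np.eye(2)

def f(x):
    """Polynomial dynamical system: x_dot = C @ monomials(x)."""
    return C @ monomials(x)

def lyapunov(x):
    """Value of the Lyapunov candidate at state x."""
    return x @ P @ x

def lyapunov_rate(x):
    """Time derivative of V along the flow: V_dot = 2 x^T P f(x), for symmetric P."""
    return 2.0 * x @ P @ f(x)

def rollout(x0, dt=0.01, steps=2000):
    """Integrate the system with forward Euler starting from x0."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x = x + dt * f(x)
    return x

# Sample-based check that V decreases along the flow away from the origin.
rng = np.random.default_rng(0)
samples = rng.uniform(-2.0, 2.0, size=(1000, 2))
rates = np.array([lyapunov_rate(x) for x in samples])
all_decreasing = bool(np.all(rates < 0.0))

# A trajectory started away from the origin should converge toward it.
x_final = rollout([1.5, -1.0])
```

For this particular choice, V_dot = -2(x1² + x2² + x1⁴ + x2⁴), which is negative everywhere except the origin, so the sampled check and the rollout agree with what can be verified by hand.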