Imitation learning is a data-driven approach to learning policies from expert behavior, but it is prone to unreliable outcomes in out-of-sample (OOS) regions. While previous research relying on stable dynamical systems guarantees convergence to a desired state, it often overlooks transient behavior. We propose a framework for learning policies modeled by contractive dynamical systems, ensuring that all policy rollouts converge regardless of perturbations, which in turn enables efficient OOS recovery. By leveraging recurrent equilibrium networks and coupling layers, the policy structure guarantees contractivity for any parameter choice, which facilitates unconstrained optimization. Furthermore, we provide theoretical upper bounds for worst-case and expected loss terms, rigorously establishing the reliability of our method in deployment. Empirically, we demonstrate substantial OOS performance improvements in robotic manipulation and navigation tasks in simulation.
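To illustrate the core property the abstract relies on, the following is a minimal sketch (not the paper's actual policy architecture) of a discrete-time contractive dynamical system: because the map has Lipschitz constant strictly below 1, any two rollouts, such as a nominal one and a perturbed out-of-sample one, contract toward each other geometrically. The `rate` and `target` parameters here are illustrative assumptions, not quantities from the paper.

```python
import numpy as np

def make_contractive_step(rate=0.9, target=None):
    """Return f(x) = target + rate * (x - target).

    The map is contractive because its Lipschitz constant equals
    `rate` < 1, so ||f(x) - f(y)|| = rate * ||x - y|| for all x, y.
    """
    target = np.zeros(2) if target is None else target
    def step(x):
        return target + rate * (x - target)
    return step

step = make_contractive_step(rate=0.9, target=np.array([1.0, 1.0]))

x_nominal = np.array([0.0, 0.0])     # in-distribution start
x_perturbed = np.array([5.0, -3.0])  # out-of-sample start

# Track the gap between the two rollouts over 20 steps.
gaps = []
for _ in range(20):
    gaps.append(np.linalg.norm(x_nominal - x_perturbed))
    x_nominal = step(x_nominal)
    x_perturbed = step(x_perturbed)

# The gap shrinks by exactly the contraction rate at every step,
# so the perturbed rollout recovers toward the nominal one.
```

This geometric contraction of the rollout gap is what distinguishes contractive systems from merely stable ones: stability only constrains where trajectories end up, while contraction also bounds how perturbed (OOS) trajectories behave in transit.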