Imitation learning is a data-driven approach to learning policies from expert behavior, but it is prone to unreliable outcomes in out-of-sample (OOS) regions. While previous research relying on stable dynamical systems guarantees convergence to a desired state, it often overlooks transient behavior. We propose a framework for learning policies modeled by contractive dynamical systems, ensuring that all policy rollouts converge regardless of perturbations and, in turn, enabling efficient OOS recovery. By leveraging recurrent equilibrium networks and coupling layers, the policy structure guarantees contractivity for any parameter choice, which facilitates unconstrained optimization. We also provide theoretical upper bounds on the worst-case and expected loss to rigorously establish the reliability of our method in deployment. Empirically, we demonstrate substantial OOS performance improvements in simulated robotic manipulation and navigation tasks.
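As a rough illustration of the contraction-by-construction idea described above (not the paper's recurrent-equilibrium-network or coupling-layer parameterization), the sketch below rescales a free weight matrix by its spectral norm so that the resulting discrete-time dynamics are contractive for every value of the unconstrained parameters, which is what makes unconstrained gradient-based training possible. The class name `ContractiveLinearDynamics` and the `rate` parameter are hypothetical and chosen only for this example.

```python
import torch
import torch.nn as nn


class ContractiveLinearDynamics(nn.Module):
    """Toy dynamics x_{t+1} = A x_t + B u_t that is contractive by construction:
    A is rescaled by its spectral norm so that ||A||_2 <= rate < 1 holds for
    any value of the raw (unconstrained) parameters."""

    def __init__(self, state_dim: int, input_dim: int, rate: float = 0.95):
        super().__init__()
        self.A_raw = nn.Parameter(torch.randn(state_dim, state_dim))
        self.B = nn.Parameter(0.1 * torch.randn(state_dim, input_dim))
        self.rate = rate  # guaranteed contraction rate, strictly below 1

    def forward(self, x: torch.Tensor, u: torch.Tensor) -> torch.Tensor:
        # Spectral norm of the unconstrained weight matrix.
        sigma = torch.linalg.matrix_norm(self.A_raw, ord=2)
        # Rescaling keeps the map x -> A x + B u a contraction in x,
        # so any two rollouts driven by the same inputs converge toward
        # each other regardless of their initial states.
        A = self.rate * self.A_raw / (sigma + 1e-6)
        return x @ A.T + u @ self.B.T


# Usage: two rollouts from different initial states shrink toward each other.
dyn = ContractiveLinearDynamics(state_dim=4, input_dim=2)
x1, x2 = torch.randn(1, 4), torch.randn(1, 4)
for _ in range(50):
    u = torch.randn(1, 2)
    x1, x2 = dyn(x1, u), dyn(x2, u)
print(torch.linalg.vector_norm(x1 - x2))  # small: trajectories have contracted
```

The key design point, mirroring the abstract's claim, is that contractivity is enforced by the parameterization itself rather than by constraints on the optimizer, so standard unconstrained training loops can be used as-is.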