Learning from expert demonstrations is a powerful tool for flexibly programming autonomous systems with complex behaviors and for predicting an agent's behavior, especially in collaborative control settings. A common approach to this problem is inverse reinforcement learning (IRL), which assumes that the observed agent, e.g., a human demonstrator, acts by optimizing an intrinsic cost function that reflects its intent and informs its control actions. While this framework is expressive, it is also computationally demanding and generally lacks convergence guarantees. We therefore propose a novel, stability-certified IRL approach that reformulates the cost function inference problem as learning control Lyapunov functions (CLFs) from demonstration data. By additionally exploiting closed-form expressions for the associated control policies, we can efficiently search the space of CLFs by observing the attractor landscape of the induced dynamics. To construct the inverse optimal CLFs, we use a Sum-of-Squares (SOS) formulation, which yields a convex optimization problem. We present a theoretical analysis of the optimality properties provided by the CLF and evaluate our approach on both simulated and real-world data.
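The abstract relies on closed-form control policies induced by a CLF and on inspecting the attractor of the resulting closed-loop dynamics. It does not say which closed-form law is used. A standard closed-form CLF feedback is Sontag's universal formula, and the sketch below uses it under several assumptions that do not come from the paper: a control-affine system $\dot{x} = f(x) + g(x)u$, a hypothetical double-integrator example, and a hand-picked quadratic CLF $V(x) = x^\top P x$. The sketch does not learn $V$ or solve the SOS program. It only shows how a given CLF induces a feedback law whose forward-Euler rollout decreases $V$ and converges to the attractor at the origin.

```python
import numpy as np

# Hypothetical example system (not from the paper): a double integrator
# x1' = x2, x2' = u, written in control-affine form x' = f(x) + g(x) u.
def f(x: np.ndarray) -> np.ndarray:
    return np.array([x[1], 0.0])

def g(x: np.ndarray) -> np.ndarray:
    return np.array([[0.0], [1.0]])

# Assumed quadratic CLF candidate V(x) = x^T P x. For this P, L_f V < 0
# whenever L_g V = 0 and x != 0, so V is a valid CLF for this system.
P = np.array([[2.0, 1.0], [1.0, 2.0]])

def V(x: np.ndarray) -> float:
    return float(x @ P @ x)

def grad_V(x: np.ndarray) -> np.ndarray:
    return 2.0 * P @ x

def sontag_control(x: np.ndarray) -> np.ndarray:
    """Sontag's universal formula: a closed-form feedback induced by a CLF."""
    a = float(grad_V(x) @ f(x))  # Lie derivative L_f V (scalar)
    b = grad_V(x) @ g(x)         # Lie derivative L_g V, shape (m,)
    bb = float(b @ b)            # ||L_g V||^2
    if bb < 1e-12:               # on {L_g V = 0} the CLF condition gives a < 0
        return np.zeros(g(x).shape[1])
    return -(a + np.sqrt(a**2 + bb**2)) / bb * b

def simulate(x0, dt: float = 1e-3, T: float = 20.0) -> np.ndarray:
    """Forward-Euler rollout of the closed loop x' = f(x) + g(x) u(x)."""
    xs = [np.asarray(x0, dtype=float)]
    for _ in range(int(round(T / dt))):
        x = xs[-1]
        u = sontag_control(x)
        xs.append(x + dt * (f(x) + g(x) @ u))
    return np.array(xs)

traj = simulate([1.0, 0.0])
vs = np.array([V(x) for x in traj])  # CLF value along the trajectory
print(f"V(x0) = {vs[0]:.3f}, V(x_T) = {vs[-1]:.2e}")
```

With this formula, $\dot{V} = -\sqrt{(L_fV)^2 + \lVert L_gV \rVert^4}$ wherever $L_gV \neq 0$, so any valid CLF directly yields a stabilizing policy without solving an optimal control problem. That property is what makes it cheap to evaluate a candidate CLF by rolling out the induced dynamics.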