A key challenge in solving the deterministic inverse reinforcement learning (IRL) problem online and in real time is the existence of multiple solutions. This nonuniqueness necessitates the study of equivalent solutions, i.e., solutions that yield different cost functionals but the same feedback matrix, and of convergence to such solutions. While offline algorithms that converge to equivalent solutions have been developed in the literature, online, real-time techniques that address nonuniqueness are not available. In this paper, a regularized history stack observer that converges to approximately equivalent solutions of the IRL problem is developed. Novel data-richness conditions are introduced to facilitate the analysis, and simulation results are provided to demonstrate the effectiveness of the developed technique.
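As a hedged illustration of what "equivalent solutions" means, consider the simplest case, which is an assumption not stated in the abstract: a scalar linear system $\dot{x} = a x + b u$ with quadratic cost $\int (q x^2 + r u^2)\,dt$ and optimal feedback $u = -k x$. Solving the scalar algebraic Riccati equation shows that $k = \left(a + \sqrt{a^2 + b^2 q / r}\right)/b$ depends on $(q, r)$ only through the ratio $q/r$. Every pair with the same ratio therefore produces the same feedback gain, and observed closed-loop behavior can identify the cost at best up to this equivalence. The function name `scalar_lqr_gain` is hypothetical; this sketch is not the observer developed in the paper.

```python
import math


def scalar_lqr_gain(a: float, b: float, q: float, r: float) -> float:
    """Hypothetical helper: optimal gain k for xdot = a*x + b*u with
    cost = integral of (q*x^2 + r*u^2) dt and control u = -k*x.

    Solves the scalar algebraic Riccati equation
        2*a*p - (b**2 / r) * p**2 + q = 0
    for its stabilizing (positive) root p, then returns k = b * p / r.
    """
    if b == 0 or q <= 0 or r <= 0:
        raise ValueError("requires b != 0, q > 0, r > 0")
    # Positive Riccati root; sqrt(...) > |a| guarantees p > 0.
    p = r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)
    # Closed loop a - b*k = -sqrt(a^2 + b^2*q/r) < 0, so the gain is stabilizing.
    return b * p / r


# Two different cost functionals with the same q/r ratio...
k_base = scalar_lqr_gain(a=1.0, b=1.0, q=1.0, r=1.0)
k_scaled = scalar_lqr_gain(a=1.0, b=1.0, q=5.0, r=5.0)
# ...and one with a different ratio.
k_other = scalar_lqr_gain(a=1.0, b=1.0, q=2.0, r=1.0)

print(math.isclose(k_base, k_scaled))  # equal up to floating-point rounding
print(math.isclose(k_base, k_other))   # a genuinely different gain
```

In the multivariable case the set of cost matrices that share a feedback matrix is generally richer than a single scale factor, which is why a regularized formulation is needed to single out a representative solution.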