This paper is concerned with a finite-horizon inverse control problem, whose goal is to infer, from observations, the possibly non-convex and non-stationary cost driving the actions of an agent. In this context, we present a result that enables cost estimation by solving an optimization problem that is convex even when the agent cost is not and when the underlying dynamics are nonlinear, non-stationary and stochastic. To obtain this result, we also study a finite-horizon forward control problem that has randomized policies as decision variables, and we give an explicit expression for its optimal solution. Moreover, we turn our findings into algorithmic procedures and validate them both in silico and in experiments with real hardware. All the experiments confirm the effectiveness of our approach.