Inverse optimal control methods can be used to characterize behavior in sequential decision-making tasks. Most existing work, however, requires the control signals to be known, or is limited to fully observable or linear systems. This paper introduces a probabilistic approach to inverse optimal control for stochastic non-linear systems with missing control signals and partial observability, which unifies existing approaches. By combining an explicit model of the noise characteristics of the agent's sensory and control systems with local linearization techniques, we derive an approximate likelihood for the model parameters that can be computed within a single forward pass. We evaluate the proposed method on stochastic and partially observable versions of classic control tasks, a navigation task, and a manual reaching task. The proposed method has broad applicability, ranging from imitation learning to sensorimotor neuroscience.
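To make the idea of an approximate likelihood obtained by local linearization in a single forward pass more concrete, the following is a minimal sketch of a simplified analogue, not the method proposed in this paper. It uses a standard extended Kalman filter to accumulate the log-likelihood of a partially observed trajectory under a parameterized non-linear stochastic system. It does not model the agent's policy, its internal belief, or the missing control signals. All names (`numerical_jacobian`, `approx_log_likelihood`, the damped-pendulum dynamics and its damping parameter `theta`) are hypothetical and chosen only for illustration.

```python
import numpy as np

def numerical_jacobian(func, x, eps=1e-6):
    """Central-difference Jacobian of func at x (the local linearization)."""
    x = np.asarray(x, dtype=float)
    fx = np.asarray(func(x), dtype=float)
    J = np.zeros((fx.size, x.size))
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        J[:, i] = (np.asarray(func(x + dx)) - np.asarray(func(x - dx))) / (2 * eps)
    return J

def approx_log_likelihood(theta, ys, f, h, Q, R, m0, P0):
    """Approximate log p(y_1..y_T | theta) with an extended Kalman filter.

    Assumed model (illustrative only):
        x_t = f(x_{t-1}, theta) + w_t,  w_t ~ N(0, Q)
        y_t = h(x_t) + v_t,             v_t ~ N(0, R)
    with prior x_0 ~ N(m0, P0). One forward pass over the data.
    """
    m = np.asarray(m0, dtype=float)
    P = np.asarray(P0, dtype=float)
    ll = 0.0
    for y in ys:
        # Predict: linearize the dynamics around the current mean.
        F = numerical_jacobian(lambda x: f(x, theta), m)
        m = np.asarray(f(m, theta), dtype=float)
        P = F @ P @ F.T + Q
        # Update: linearize the observation model around the predicted mean.
        H = numerical_jacobian(h, m)
        r = np.asarray(y, dtype=float) - np.asarray(h(m), dtype=float)
        S = H @ P @ H.T + R
        # Gaussian log-density of the innovation r under covariance S.
        ll += -0.5 * (r @ np.linalg.solve(S, r)
                      + np.linalg.slogdet(S)[1]
                      + r.size * np.log(2.0 * np.pi))
        K = P @ H.T @ np.linalg.inv(S)
        m = m + K @ r
        P = (np.eye(m.size) - K @ H) @ P
    return ll

# Hypothetical example: damped pendulum, only the angle is observed.
DT = 0.05

def pendulum_step(x, theta):
    angle, velocity = x
    return np.array([angle + DT * velocity,
                     velocity + DT * (-9.81 * np.sin(angle) - theta * velocity)])

def observe_angle(x):
    return x[:1]

rng = np.random.default_rng(0)
Q = np.diag([1e-4, 1e-2])       # process noise covariance (assumed)
R = np.array([[0.05 ** 2]])     # observation noise covariance (assumed)
true_theta = 0.5
x = np.array([1.0, 0.0])
observations = []
for _ in range(200):
    x = pendulum_step(x, true_theta) + rng.normal(size=2) * np.sqrt(np.diag(Q))
    observations.append(observe_angle(x) + rng.normal(size=1) * 0.05)

for candidate in (0.1, 0.5, 2.0):
    value = approx_log_likelihood(candidate, observations, pendulum_step,
                                  observe_angle, Q, R,
                                  m0=np.array([1.0, 0.0]), P0=0.1 * np.eye(2))
    print(f"theta={candidate}: approximate log-likelihood = {value:.2f}")
```

In a sketch like this, the returned value could be passed to a generic optimizer or sampler to estimate `theta`. The approach described in the abstract additionally accounts for the agent's own sensory and control noise and for unobserved controls, which this example leaves out.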