Inverse reinforcement learning~(IRL) is a powerful framework for inferring an agent's reward function by observing its behavior, but IRL algorithms that learn point estimates of the reward function can be misleading because several reward functions may describe an agent's behavior equally well. A Bayesian approach to IRL models a distribution over candidate reward functions, alleviating the shortcomings of learning a point estimate. However, several Bayesian IRL algorithms use a $Q$-value function in place of the likelihood function. The resulting posterior is computationally intensive to calculate and has few theoretical guarantees, and the $Q$-value function is often a poor approximation for the likelihood. We introduce kernel density Bayesian IRL (KD-BIRL), which uses conditional kernel density estimation to directly approximate the likelihood, providing an efficient framework that, with a modified reward function parameterization, is applicable to environments with complex and infinite state spaces. We demonstrate KD-BIRL's benefits through a series of experiments in Gridworld environments and a simulated sepsis treatment task.
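To make the idea of approximating a likelihood with conditional kernel density estimation concrete, the following is a minimal sketch and not the KD-BIRL implementation. It assumes Gaussian kernels, Euclidean distances, fixed bandwidths, and a training set of state-action feature vectors paired with the reward parameters that generated them. The names `conditional_kde`, `x_train`, `r_train`, `h_x`, and `h_r` are hypothetical and chosen for illustration only.

```python
import numpy as np


def conditional_kde(x_query, r_query, x_train, r_train, h_x=0.5, h_r=0.5):
    """Estimate the conditional density p(x | r) with Gaussian kernels.

    x_train: (n, d_x) array of state-action features (illustrative assumption).
    r_train: (n, d_r) array of reward parameters paired with each row of x_train.
    x_query, r_query: 1-D arrays of length d_x and d_r.
    """
    x_train = np.asarray(x_train, dtype=float)
    r_train = np.asarray(r_train, dtype=float)
    x_query = np.asarray(x_query, dtype=float)
    r_query = np.asarray(r_query, dtype=float)

    d_x = x_train.shape[1]

    # Squared Euclidean distances from the query to every training point.
    dist_x = np.sum((x_train - x_query) ** 2, axis=1)
    dist_r = np.sum((r_train - r_query) ** 2, axis=1)

    # Normalised Gaussian kernel over x, so the result is a density in x.
    norm_x = (2.0 * np.pi * h_x ** 2) ** (d_x / 2.0)
    k_x = np.exp(-dist_x / (2.0 * h_x ** 2)) / norm_x

    # Kernel weights over r; their normalising constant cancels in the ratio.
    k_r = np.exp(-dist_r / (2.0 * h_r ** 2))

    denom = k_r.sum()
    if denom == 0.0:
        # No training reward is close enough to r_query to carry any weight.
        return 0.0
    return float(np.sum(k_x * k_r) / denom)


# Toy data: rewards near 0 produce features near 0, rewards near 5 produce features near 3.
x_train = np.array([[0.0], [0.1], [3.0], [3.1]])
r_train = np.array([[0.0], [0.0], [5.0], [5.0]])

p_match = conditional_kde([0.0], [0.0], x_train, r_train)
p_mismatch = conditional_kde([0.0], [5.0], x_train, r_train)
```

In this toy setting, `p_match` is much larger than `p_mismatch`, because the feature value 0 was only observed under rewards near 0. In a Bayesian IRL pipeline, an estimate of this kind could stand in for the likelihood of each observed demonstration and be combined with a prior over reward parameters. The choice of kernels, distance metrics, and bandwidths here is an assumption of the sketch.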