Partially observable Markov decision processes (POMDPs) are a principled planning model for sequential decision-making under uncertainty. Yet real-world problems with high-dimensional observations, such as camera images, remain intractable for traditional belief- and filtering-based solvers. To tackle this problem, we introduce the Perception-based Beliefs for POMDPs framework (PBP), which complements such solvers with a perception model. This model takes the form of an image classifier that maps visual observations to probability distributions over states. PBP incorporates these distributions directly into belief updates, so the underlying solver does not need to reason explicitly over high-dimensional observation spaces. We show that the belief update of PBP coincides with the standard belief update if the image classifier is exact. Moreover, to handle classifier imprecision, we incorporate uncertainty quantification and introduce two methods to adjust the belief update accordingly. We implement PBP using two traditional POMDP solvers and empirically show that (1) it outperforms existing end-to-end deep RL methods and (2) uncertainty quantification improves the robustness of PBP against visual corruption.
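The abstract does not spell out how PBP folds classifier outputs into the belief. The sketch below is therefore only a hedged illustration of one standard way to do it, and not necessarily the authors' method. It uses the Bayes-rule substitution known from hybrid neural/HMM models ("scaled likelihood"). Because P(o | s) = P(s | o) P(o) / P(s), a classifier posterior divided by the state prior it was trained under can replace the observation likelihood up to a constant, and that constant cancels during normalization. Under this assumption, an exact classifier reproduces the standard belief update, which mirrors the claim above. The tabular model, the `state_prior` argument, and all function names are hypothetical.

```python
def predict(belief, T, action):
    """Prediction step: b'(s2) = sum_s T[action][s][s2] * b(s)."""
    n = len(belief)
    return [sum(belief[s] * T[action][s][s2] for s in range(n)) for s2 in range(n)]


def normalize(weights):
    z = sum(weights)
    if z <= 0:
        raise ValueError("observation has zero probability under the predicted belief")
    return [w / z for w in weights]


def standard_update(belief, T, O, action, obs):
    """Textbook POMDP belief update using an explicit observation model O[s2][obs] = P(obs | s2)."""
    pred = predict(belief, T, action)
    return normalize([pred[s2] * O[s2][obs] for s2 in range(len(pred))])


def perception_update(belief, T, action, class_probs, state_prior):
    """Hypothetical perception-based update (scaled-likelihood assumption).

    class_probs[s2] is the classifier's P(s2 | image); state_prior[s2] is the
    state distribution the classifier was trained under (an assumption).
    Dividing by the prior turns the posterior into a likelihood up to a
    constant factor, which normalization removes.
    """
    pred = predict(belief, T, action)
    return normalize([pred[s2] * class_probs[s2] / state_prior[s2] for s2 in range(len(pred))])


# Toy two-state, one-action, two-observation model (made-up numbers).
T = [[[0.9, 0.1], [0.2, 0.8]]]   # T[action][s][s2]
O = [[0.7, 0.3], [0.1, 0.9]]     # O[s2][obs]
prior = [0.5, 0.5]
belief = [0.6, 0.4]
obs = 0

# An "exact" classifier outputs the true posterior P(s | obs) under the prior.
p_obs = sum(O[s][obs] * prior[s] for s in range(2))
exact_class_probs = [O[s][obs] * prior[s] / p_obs for s in range(2)]

b_standard = standard_update(belief, T, O, action=0, obs=obs)
b_perception = perception_update(belief, T, 0, exact_class_probs, prior)
print(b_standard, b_perception)  # the two beliefs agree up to float rounding
```

The toy example only checks the exact-classifier case. With an imprecise classifier the two updates diverge, which is the situation the abstract's uncertainty-quantification adjustments are meant to address; those adjustments are not modeled here.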