Solving partially observable Markov decision processes (POMDPs) with high-dimensional, continuous observations, such as camera images, is required for many real-life robotics and planning problems. Recent research has proposed machine-learned probabilistic models as observation models, but their use is currently too computationally expensive for online deployment. We examine the implications of planning with simplified observation models while retaining formal guarantees on the quality of the solution. Our main contribution is a novel probabilistic bound based on a statistical total variation distance of the simplified model. By generalizing recent results on particle-belief MDP concentration bounds, we show that it bounds the theoretical POMDP value with respect to the original model in terms of the empirical planned value obtained with the simplified model. Our calculations can be separated into offline and online parts, and we arrive at formal guarantees without having to access the costly model at all during planning, which is also a novel result. Finally, we demonstrate in simulation how to integrate the bound into the routine of an existing continuous online POMDP solver.
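As a rough illustration of the quantity the bound is built on, the sketch below estimates the total variation (TV) distance between an "original" observation model p(z | x) and a "simplified" one q(z | x) by sampling from p. It is not the estimator used in this work: the one-dimensional Gaussian models, their parameters, and the function names are all hypothetical, chosen only so that the example is self-contained. Such a computation would plausibly belong to the offline part, since it is the only step that evaluates the costly model.

```python
import math
import random


def gaussian_pdf(z, mean, std):
    """Density of a 1-D Gaussian N(mean, std^2) evaluated at z."""
    return math.exp(-0.5 * ((z - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def estimate_tv_distance(p_mean, p_std, q_mean, q_std, n_samples=20000, seed=0):
    """Monte Carlo estimate of the TV distance between two Gaussian models.

    Uses the identity TV(p, q) = 0.5 * E_{z ~ p}[ |1 - q(z) / p(z)| ],
    which is valid here because a Gaussian p is positive everywhere.
    Both models are illustrative stand-ins (assumptions), not the
    observation models from the paper.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        # Draw an observation from the "original" (costly) model p.
        z = rng.gauss(p_mean, p_std)
        # Likelihood ratio of the simplified model q to the original p.
        ratio = gaussian_pdf(z, q_mean, q_std) / gaussian_pdf(z, p_mean, p_std)
        total += abs(1.0 - ratio)
    return 0.5 * total / n_samples


if __name__ == "__main__":
    # Hypothetical state x = 0: p has unit noise, q is slightly biased and wider.
    tv = estimate_tv_distance(p_mean=0.0, p_std=1.0, q_mean=0.2, q_std=1.2)
    print(f"estimated TV distance: {tv:.3f}")
```

A TV distance of 0 means the simplified model is indistinguishable from the original at that state, and values closer to 1 indicate a larger mismatch. How such a distance enters the value bound is specific to the paper and is not reproduced here.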