Partially Observable Markov Decision Processes (POMDPs) are a general and principled framework for motion planning under uncertainty. Despite tremendous improvements in the scalability of POMDP solvers, long-horizon POMDPs (e.g., $\geq 15$ steps) remain difficult to solve. This paper proposes a new approximate online POMDP solver, called Reference-Based Online POMDP Planning via Rapid State Space Sampling (ROP-RaS3). ROP-RaS3 uses novel, extremely fast sampling-based motion planning techniques to sample the state space and generate a diverse set of macro-actions online. These macro-actions are then used to bias belief-space sampling and infer high-quality policies without requiring exhaustive enumeration of the action space -- a fundamental constraint for modern online POMDP solvers. ROP-RaS3 is evaluated on various long-horizon POMDPs, including a problem with a planning horizon of more than 100 steps and a problem with a 15-dimensional state space that requires more than 20 look-ahead steps. In all of these problems, ROP-RaS3 substantially outperforms other state-of-the-art methods, by up to several fold.