Decision-making under uncertainty is a critical aspect of many practical autonomous systems that must act on incomplete information. Partially Observable Markov Decision Processes (POMDPs) offer a mathematically principled framework for formulating decision-making problems under such conditions. However, finding an optimal solution to a POMDP is generally intractable. In recent years, there has been significant progress in scaling approximate solvers from small to moderately sized problems using online tree-search methods. Such approximate solvers, however, are often limited to probabilistic or asymptotic guarantees with respect to the optimal solution. In this paper, we derive a deterministic relationship for discrete POMDPs between an approximate solution and the optimal one. We show that, at any time, we can derive bounds that relate the existing solution to the optimal one. We further show that our derivations open an avenue for a new family of algorithms and can be attached to existing algorithms with a suitable structure, providing them with deterministic guarantees at marginal computational overhead. As a result, not only do we certify the solution quality, but we also demonstrate that making decisions based on the deterministic guarantee may yield performance superior to that of the original algorithm without the deterministic certification.