Autonomous agents operating in real-world scenarios frequently face uncertainty and must make decisions based on incomplete information. Planning under uncertainty can be formalized mathematically as a partially observable Markov decision process (POMDP). However, computing an optimal POMDP policy is computationally expensive and feasible only for small tasks. In recent years, approximate algorithms, such as tree search and sampling-based methods, have emerged as state-of-the-art POMDP solvers for larger problems. Despite their effectiveness, these algorithms offer only probabilistic, and often only asymptotic, guarantees of convergence to the optimal solution because they rely on sampling. To address this limitation, we derive a deterministic relationship between a simplified solution, which is easier to obtain, and the theoretically optimal one. First, we derive bounds for branching on only a subset of the observations while computing a complete belief at each posterior node. Then, since a complete belief update may itself be computationally demanding, we extend the bounds to support reducing both the state and the observation spaces. We show how these guarantees can be integrated with existing state-of-the-art solvers that sample a subset of states and observations, so that the returned solution satisfies deterministic bounds relative to the optimal policy. Finally, we support our findings with experimental results.
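The first step above, branching on only a subset of observations while keeping an exact posterior belief, can be illustrated with a generic sketch. This is not the bound derived in this work. It shows only the standard exact Bayes belief update and a simple fact under stated assumptions: if every per-step reward lies in `[r_min, r_max]` and the horizon is finite and undiscounted, the unexpanded observation branches contribute to Q(b, a) a value within their total probability mass times `steps_left * [r_min, r_max]`. The function names (`predict`, `observation_probs`, `belief_update`, `omitted_branch_interval`) and the top-k selection rule are hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def predict(b: Sequence[float], T: Matrix) -> List[float]:
    """Prior over next states for a fixed action: sum_s T[s][s'] * b[s]."""
    n = len(b)
    return [sum(b[s] * T[s][sp] for s in range(n)) for sp in range(n)]


def observation_probs(b: Sequence[float], T: Matrix, Z: Matrix) -> List[float]:
    """P(o | b, a) = sum_{s'} Z[s'][o] * predicted[s'] for every observation o."""
    pred = predict(b, T)
    return [sum(pred[sp] * Z[sp][o] for sp in range(len(pred))) for o in range(len(Z[0]))]


def belief_update(b: Sequence[float], T: Matrix, Z: Matrix, o: int) -> Tuple[List[float], float]:
    """Exact (complete) posterior belief b'(s') proportional to Z[s'][o] * predicted[s']."""
    pred = predict(b, T)
    unnorm = [pred[sp] * Z[sp][o] for sp in range(len(pred))]
    norm = sum(unnorm)  # equals P(o | b, a)
    return [u / norm for u in unnorm], norm


def omitted_branch_interval(b, T, Z, k, r_min, r_max, steps_left):
    """Hypothetical rule: expand the k most likely observations only.

    Returns the kept observations, the omitted probability mass, and the
    interval that the omitted branches could add to Q(b, a), assuming each
    per-step reward lies in [r_min, r_max] for the remaining steps_left steps
    (finite, undiscounted horizon).
    """
    p_obs = observation_probs(b, T, Z)
    kept = sorted(range(len(p_obs)), key=lambda o: p_obs[o], reverse=True)[:k]
    omitted = 1.0 - sum(p_obs[o] for o in kept)
    return kept, omitted, (omitted * steps_left * r_min, omitted * steps_left * r_max)


if __name__ == "__main__":
    T = [[0.9, 0.1], [0.2, 0.8]]            # T[s][s'] for the chosen action
    Z = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]  # Z[s'][o] for the chosen action
    kept, omitted, (lo, hi) = omitted_branch_interval(
        [0.5, 0.5], T, Z, k=2, r_min=0.0, r_max=1.0, steps_left=3)
    print(kept, round(omitted, 3), round(hi, 3))  # [0, 2] 0.245 0.735
```

The interval width shrinks linearly with the omitted probability mass, which is why expanding the most likely observations first is a natural (though here purely illustrative) selection rule.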