Partially observable Markov decision processes (POMDPs) are a standard model for agents making decisions in uncertain environments. Most work on POMDPs focuses on synthesizing strategies based on the agent's available observation capabilities. However, system designers can often control an agent's observation capabilities, for example by placing or selecting sensors. This raises the question of how to select an agent's sensors cost-effectively such that it achieves the desired goals. In this paper, we study the novel optimal observability problem (OOP): given a POMDP M, how should one change M's observation capabilities within a fixed budget such that its (minimal) expected reward remains below a given threshold? We show that the problem is undecidable in general and decidable when only positional strategies are considered. We present two algorithms for a decidable fragment of the OOP: one based on optimal strategies of M's underlying Markov decision process and one based on parameter synthesis with SMT. We report promising results for variants of typical examples from the POMDP literature.
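Restricting the OOP to positional strategies makes the search space finite: a finite POMDP admits only finitely many observation functions within a budget and only finitely many positional strategies per observation function. Each candidate can then be checked exactly. The Python sketch below illustrates this with a naive brute-force procedure. It is not either of the two algorithms described above (MDP-based or SMT-based). Its encoding is a hypothetical assumption made for illustration. The budget bounds the number of distinct observations. A positional strategy maps each observation to one action, and every action is enabled in every state. The objective is the expected total reward until a goal state is reached, taken to be infinite when the goal is not reached almost surely.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy encoding (an assumption, not the paper's format): states 0..n-1,
# trans[s][a] = list of (successor, probability), reward[s][a] >= 0,
# goal states stop the run and collect no further reward.

def solve(matrix, rhs):
    """Exact Gauss-Jordan elimination over Fractions; matrix must be nonsingular."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col and m[r][col] != 0:
                f = m[r][col] / m[col][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def expected_reward(trans, reward, goal, init, choice):
    """Expected total reward until `goal` in the Markov chain induced by the
    per-state action `choice`; infinite unless goal is reached almost surely."""
    n = len(trans)
    succ = [[] if s in goal else [t for t, p in trans[s][choice[s]] if p > 0]
            for s in range(n)]

    def backward(targets):
        # All states that can reach `targets` in the induced graph.
        seen, stack = set(targets), list(targets)
        while stack:
            t = stack.pop()
            for s in range(n):
                if s not in seen and t in succ[s]:
                    seen.add(s)
                    stack.append(s)
        return seen

    bad = set(range(n)) - backward(goal)       # cannot reach goal at all
    good = set(range(n)) - backward(bad)       # reach goal with probability 1
    if init not in good:
        return float("inf")
    idx = {s: i for i, s in enumerate(sorted(good - set(goal)))}
    if init not in idx:                        # init is itself a goal state
        return Fraction(0)
    size = len(idx)
    # Linear system x_s = r(s) + sum_t P(s, t) x_t over non-goal good states.
    matrix = [[Fraction(int(i == j)) for j in range(size)] for i in range(size)]
    rhs = [Fraction(0)] * size
    for s, i in idx.items():
        a = choice[s]
        rhs[i] = Fraction(reward[s][a])
        for t, p in trans[s][a]:
            if t in idx:
                matrix[i][idx[t]] -= Fraction(p)
    return solve(matrix, rhs)[idx[init]]

def optimal_observability(trans, reward, goal, init, actions, budget, threshold):
    """Brute force over observation functions with at most `budget` observations
    and positional strategies; returns (obs, strategy, value) or None."""
    n = len(trans)
    for obs in product(range(budget), repeat=n):
        for strategy in product(actions, repeat=budget):
            choice = [strategy[obs[s]] for s in range(n)]
            value = expected_reward(trans, reward, goal, init, choice)
            if value <= threshold:
                return obs, strategy, value
    return None

# Toy example: a coin flip leads to state 1 or 2, which need different actions.
half = Fraction(1, 2)
trans = [
    [[(1, half), (2, half)], [(1, half), (2, half)]],  # state 0: coin flip
    [[(3, 1)], [(1, 1)]],                              # state 1: action 0 wins
    [[(2, 1)], [(3, 1)]],                              # state 2: action 1 wins
    [[(3, 1)], [(3, 1)]],                              # state 3: goal
]
reward = [[1, 1], [1, 1], [1, 1], [0, 0]]
print(optimal_observability(trans, reward, [3], 0, [0, 1], 1, 2))
print(optimal_observability(trans, reward, [3], 0, [0, 1], 2, 2))
```

With a budget of one observation, every positional strategy takes the same action in states 1 and 2. One of those states then loops forever, so no strategy meets the threshold. With two observations, states 1 and 2 can be told apart, and the expected reward drops to 2. Exhaustive enumeration is exponential in the number of states, so this sketch only illustrates the decidability argument. It is not a practical synthesis method.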