Partially observable Markov decision processes (POMDPs) rely on the key assumption that probability distributions are precisely known. Robust POMDPs (RPOMDPs) relax this assumption by allowing imprecise probabilities, specified as uncertainty sets. While robust MDPs have been studied extensively, work on RPOMDPs is limited and focuses primarily on algorithmic solution methods. We expand the theoretical understanding of RPOMDPs by showing that (1) different assumptions on the uncertainty sets affect optimal policies and values; (2) RPOMDPs have a partially observable stochastic game (POSG) semantics; and (3) the same RPOMDP under different assumptions leads to semantically different POSGs and, thus, to different policies and values. These novel semantics for RPOMDPs give access to results for POSGs studied in game theory; concretely, we show the existence of a Nash equilibrium. Finally, we classify the existing RPOMDP literature using our semantics, clarifying under which uncertainty assumptions these works operate.