Robust Markov decision processes (MDPs) have attracted significant interest because they guard against poor out-of-sample performance in the presence of ambiguity. Classical MDPs account for stochasticity by modeling the dynamics as a stochastic process with a known transition kernel. A robust MDP additionally accounts for ambiguity by optimizing against the most adverse transition kernel from an ambiguity set constructed from historical data. In this paper, we develop a unified solution framework for a broad class of robust MDPs with $s$-rectangular ambiguity sets, where the most adverse transition probabilities are considered independently for each state. Using our algorithms, we show that $s$-rectangular robust MDPs with $1$-norm, $2$-norm, and $\phi$-divergence ambiguity sets can be solved several orders of magnitude faster than with state-of-the-art commercial solvers, and often only a logarithmic factor slower than classical MDPs. We demonstrate the favorable scaling properties of our algorithms on a range of synthetically generated and standard benchmark instances.
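For orientation, the $s$-rectangular robust problem described above is commonly written as a max-min Bellman equation. The sketch below uses assumed notation that does not appear in the abstract: states $s \in S$, actions $a \in A$, a reward vector $r_{s,a} \in \mathbb{R}^S$ indexed by the next state, a discount factor $\gamma \in (0,1)$, a nominal kernel $\bar{p}_{s,a}$ estimated from data, and a budget $\kappa \ge 0$. The $1$-norm set is one illustrative choice of ambiguity set, not necessarily the exact form used in the paper.

```latex
% Assumed notation (hypothetical, not taken from the abstract):
% S states, A actions, r_{s,a} in R^S rewards, gamma discount,
% \bar{p}_{s,a} nominal transition probabilities, kappa budget.

% Robust Bellman equation: the decision maker picks a (possibly randomized)
% action distribution \pi_s, and nature picks the worst kernel in P_s.
\[
  v^\star(s) \;=\; \max_{\pi_s \in \Delta(A)} \; \min_{p_s \in \mathcal{P}_s}
  \; \sum_{a \in A} \pi_{s,a} \, p_{s,a}^\top \bigl( r_{s,a} + \gamma v^\star \bigr),
  \qquad
  \mathcal{P} \;=\; \times_{s \in S} \mathcal{P}_s .
\]

% Illustrative 1-norm ambiguity set for state s: the deviations from the
% nominal kernel share one budget kappa across all actions of that state.
\[
  \mathcal{P}_s \;=\; \Bigl\{ p_s \in (\Delta(S))^{A} \;:\;
  \sum_{a \in A} \bigl\lVert p_{s,a} - \bar{p}_{s,a} \bigr\rVert_1 \le \kappa \Bigr\}.
\]
```

The product structure $\mathcal{P} = \times_{s \in S} \mathcal{P}_s$ is what "considered independently for each state" refers to: nature's choice in one state does not constrain its choice in another, while the actions within a state share a single budget. Setting $\kappa = 0$ collapses $\mathcal{P}_s$ to the nominal kernel and recovers the classical Bellman equation.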