Markov decision processes (MDPs) are a standard model for sequential decision-making problems and are widely used across many scientific areas, including formal methods and artificial intelligence (AI). MDPs, however, rest on the restrictive assumption that the transition probabilities are precisely known. Robust MDPs (RMDPs) lift this assumption by requiring only that the transition probabilities belong to some uncertainty set. We present a gentle survey of RMDPs, providing a tutorial that covers their fundamentals. In particular, we discuss RMDP semantics and how to solve RMDPs by extending standard MDP methods such as value iteration and policy iteration. We also discuss how RMDPs relate to other models and how they are used in several contexts, including reinforcement learning and abstraction techniques. We conclude with some challenges for future work on RMDPs.
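To make the idea of extending value iteration to RMDPs concrete, the following is a minimal sketch rather than the survey's own algorithm. It assumes one particular kind of uncertainty set (independent probability intervals per state-action pair), a discounted-reward objective, and a pessimistic reading in which nature picks the worst distribution from the set. All names (`worst_case_expectation`, `robust_value_iteration`, the example model) are hypothetical and chosen for illustration only.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]
# intervals[s][a][t] = (lower, upper) bound on P(t | s, a)  (assumed encoding)
IntervalModel = List[Dict[str, Dict[int, Interval]]]


def worst_case_expectation(bounds: Dict[int, Interval], values: List[float]) -> float:
    """Minimise sum_t p[t] * values[t] subject to lower <= p <= upper, sum(p) = 1.

    Assumes the intervals admit at least one valid distribution.
    """
    # Start from the lower bounds, then hand out the remaining probability
    # mass to successors in order of increasing value.
    probs = {t: lo for t, (lo, _) in bounds.items()}
    budget = 1.0 - sum(probs.values())
    for t in sorted(bounds, key=lambda t: values[t]):
        lo, hi = bounds[t]
        extra = min(hi - lo, budget)
        probs[t] += extra
        budget -= extra
    return sum(p * values[t] for t, p in probs.items())


def robust_value_iteration(
    intervals: IntervalModel,
    rewards: List[Dict[str, float]],
    gamma: float = 0.9,
    tol: float = 1e-10,
    max_iter: int = 10_000,
) -> List[float]:
    """Iterate V(s) = max_a [ r(s, a) + gamma * min_{P in U(s, a)} E_P[V] ]."""
    values = [0.0] * len(intervals)
    for _ in range(max_iter):
        new_values = [
            max(
                rewards[s][a] + gamma * worst_case_expectation(bounds, values)
                for a, bounds in actions.items()
            )
            for s, actions in enumerate(intervals)
        ]
        if max(abs(x - y) for x, y in zip(new_values, values)) < tol:
            return new_values
        values = new_values
    return values


# Hypothetical two-state model: state 1 is absorbing with zero reward,
# state 0 has one action whose successor probabilities lie in [0.3, 0.7].
example_intervals: IntervalModel = [
    {"risky": {0: (0.3, 0.7), 1: (0.3, 0.7)}},
    {"stay": {1: (1.0, 1.0)}},
]
example_rewards = [{"risky": 1.0}, {"stay": 0.0}]
example_values = robust_value_iteration(example_intervals, example_rewards)
```

In this example the worst case for state 0 puts as much probability as allowed (0.7) on the zero-value absorbing state, so the robust value satisfies V(0) = 1 + 0.9 · 0.3 · V(0), that is, V(0) = 1/0.73, roughly 1.37. The only change relative to standard value iteration is the inner minimisation over the uncertainty set; other uncertainty sets would need a different inner solver.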