The fixed-horizon constrained Markov Decision Process (C-MDP) is a well-known model for planning in stochastic environments under operating constraints. The Chance-Constrained MDP (CC-MDP) is a variant that allows bounding the probability of constraint violation, which is desirable in many safety-critical applications. CC-MDP can also model a class of MDPs, called Stochastic Shortest Path (SSP), with dead-ends, where there is a trade-off between the probability-to-goal and the cost-to-goal. This work studies the structure of (C)C-MDP, particularly an important variant that involves local transition. In this variant, state reachability exhibits a certain degree of locality and independence from the remaining states. More precisely, the number of states, at a given time, that share some reachable future states is always constant. (C)C-MDP under local transition is NP-hard even for a planning horizon of two. In this work, we propose a fully polynomial-time approximation scheme (FPTAS) for (C)C-MDP that computes (near-)optimal deterministic policies. Such an algorithm is among the best approximation algorithms attainable in theory and gives insight into the approximability of constrained MDPs and their variants.