We provide faster deterministic and randomized algorithms for exactly solving discounted Markov Decision Processes (DMDPs). We obtain these results by efficiently reducing the computation of optimal values and policies in DMDPs to two easier tasks: policy evaluation and computing approximately optimal values. We give both a straightforward deterministic reduction and a more efficient randomized variant; combined with recent advances in approximately solving DMDPs, these reductions yield our results.
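The abstract does not spell out the reductions. To show concretely what it means to build an exact solver from the two easier primitives, here is a deliberately naive sketch. It is not the deterministic or randomized reduction described above. The sketch makes three assumptions. First, floating-point value iteration stands in for the approximate solver. Second, exact policy evaluation is done by rational Gauss-Jordan elimination. Third, a greedy policy is accepted only when an exact one-step check certifies it optimal; otherwise the accuracy ε is halved and the loop repeats. The MDP encoding and all function names (`q_value`, `approx_values`, `evaluate_policy`, `solve_exactly`) are hypothetical.

```python
from fractions import Fraction

# Hypothetical DMDP encoding: mdp[s] is a list of actions, each a pair
# (reward, {next_state: probability}). Exact inputs (ints/Fractions) and a
# Fraction discount let the policy-evaluation step use rational arithmetic.


def q_value(mdp, v, gamma, s, a):
    """One-step lookahead value r(s, a) + gamma * sum_t P(t | s, a) * v[t]."""
    reward, trans = mdp[s][a]
    return reward + gamma * sum(p * v[t] for t, p in trans.items())


def approx_values(mdp, gamma, eps):
    """Stand-in approximate solver: floating-point value iteration from v = 0,
    run until the standard bound gamma^k * r_max / (1 - gamma) is at most eps."""
    g = float(gamma)
    r_max = max(abs(float(r)) for actions in mdp for r, _ in actions)
    v = [0.0] * len(mdp)
    bound = r_max / (1 - g)  # bound on ||v - v*||_inf before any iteration
    while bound > eps:
        v = [max(q_value(mdp, v, g, s, a) for a in range(len(mdp[s])))
             for s in range(len(mdp))]
        bound *= g
    return v


def evaluate_policy(mdp, policy, gamma):
    """Exact policy evaluation: solve (I - gamma * P_pi) v = r_pi over the
    rationals by Gauss-Jordan elimination. For gamma < 1 the matrix is strictly
    diagonally dominant, hence nonsingular, so a nonzero pivot always exists."""
    n = len(mdp)
    rows = []
    for s in range(n):
        reward, trans = mdp[s][policy[s]]
        row = [Fraction(0)] * n + [Fraction(reward)]  # augmented row [I - gamma P | r]
        row[s] += 1
        for t, p in trans.items():
            row[t] -= gamma * p
        rows.append(row)
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        lead = rows[col][col]
        rows[col] = [x / lead for x in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return [rows[s][n] for s in range(n)]


def solve_exactly(mdp, gamma, eps=1.0):
    """Refine the approximation until the greedy policy is certified optimal."""
    while True:
        v = approx_values(mdp, gamma, eps)
        policy = [max(range(len(mdp[s])), key=lambda a: q_value(mdp, v, gamma, s, a))
                  for s in range(len(mdp))]
        v_pi = evaluate_policy(mdp, policy, gamma)
        # Exact certificate: pi is optimal iff no action improves on v_pi in one step.
        if all(q_value(mdp, v_pi, gamma, s, a) <= v_pi[s]
               for s in range(len(mdp)) for a in range(len(mdp[s]))):
            return policy, v_pi
        eps /= 2  # in a finite DMDP the greedy policy is optimal once eps is small enough


# Tiny hypothetical example: in state 0, action 0 earns 1 and stays, action 1
# earns 0 and moves to state 1; state 1 has one action that earns 3 and stays.
example = [
    [(1, {0: 1}), (0, {1: 1})],
    [(3, {1: 1})],
]
policy, values = solve_exactly(example, Fraction(1, 2))
print(policy, values)  # [1, 0] [Fraction(3, 1), Fraction(6, 1)]
```

The outer loop terminates because ε-accurate values make every greedy action within 2γε of optimal in Q-value, and a finite DMDP has a positive gap between optimal and suboptimal actions. The paper's reductions are designed to avoid this kind of blind refinement, so this sketch only illustrates the interface between the primitives, not their efficiency.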