We present an efficient robust value iteration algorithm for \texttt{s}-rectangular robust Markov Decision Processes (MDPs) whose time complexity is comparable to that of standard (non-robust) MDPs, making it significantly faster than any existing method. We achieve this by deriving the optimal robust Bellman operator in concrete form using our $L_p$ water-filling lemma. We also unveil the exact form of the optimal policies, which turn out to be novel threshold policies in which the probability of playing an action is proportional to its advantage.