This paper investigates Markov decision processes (MDPs) with intermittent state information. We consider a scenario in which the controller receives the state of the process over an unreliable communication channel, and the transmissions of state information over the whole time horizon are modeled as a Bernoulli loss process. The problem is therefore to find an optimal policy for selecting actions in the presence of state information losses. We first formulate the problem as a belief MDP to establish structural results, and systematically study the effect of state information losses on the expected total discounted reward. We then reformulate the problem as a tree MDP, whose state space is organized in a tree structure, and develop two finite-state approximations to the tree MDP to find near-optimal policies efficiently. Finally, we propose a nested value iteration algorithm for the finite-state approximations, which is proved to be faster than standard value iteration. Numerical results demonstrate the effectiveness of our methods.
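To make the belief MDP formulation concrete, the following minimal sketch shows one common way a belief over states can be tracked when state transmissions are sometimes lost. It is an illustration under stated assumptions, not the paper's algorithm: the function names `propagate` and `update_belief`, the two-state transition matrix `P_a`, and the convention that a lost transmission is passed as `None` are all hypothetical. In the setting described above, whether a transmission arrives would be an independent Bernoulli draw.

```python
def propagate(belief, P_a):
    """One-step prediction: b'(s2) = sum over s of b(s) * P_a[s][s2]."""
    n = len(belief)
    return [sum(belief[s] * P_a[s][s2] for s in range(n)) for s2 in range(n)]


def update_belief(belief, P_a, observed_state=None):
    """Belief after one step under the action with transition matrix P_a.

    observed_state is the index of the received state, or None if the
    transmission was lost (an assumed convention for this sketch).
    """
    if observed_state is not None:
        # State received: the belief collapses to a point mass.
        return [1.0 if s == observed_state else 0.0 for s in range(len(belief))]
    # State lost: the belief can only be predicted through the dynamics.
    return propagate(belief, P_a)


# Hypothetical two-state example with a single action.
P_a = [[0.9, 0.1],
       [0.2, 0.8]]

b0 = [1.0, 0.0]                                 # state 0 known at the start
b1 = update_belief(b0, P_a)                     # first transmission lost
b2 = update_belief(b1, P_a)                     # second transmission lost
b3 = update_belief(b2, P_a, observed_state=1)   # state 1 received
```

In this sketch, every belief reachable after a loss is determined by the last received state and the actions taken since then, which is one way to read the tree-structured state space mentioned in the abstract.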