Real-world decision-making problems are often partially observable, and many can be formulated as a Partially Observable Markov Decision Process (POMDP). When we apply reinforcement learning (RL) algorithms to the POMDP, reasonable estimation of the hidden states can help solve the problems. Furthermore, explainable decision-making is preferable, considering their application to real-world tasks such as autonomous driving cars. We proposed an RL algorithm that estimates the hidden states by end-to-end training, and visualize the estimation as a state-transition graph. Experimental results demonstrated that the proposed algorithm can solve simple POMDP problems and that the visualization makes the agent's behavior interpretable to humans.
翻译:现实世界的决策问题通常具有部分可观测性,其中许多问题可被形式化为部分可观测马尔可夫决策过程(POMDP)。当我们对POMDP应用强化学习算法时,对隐藏状态的合理估计有助于解决问题。此外,考虑到其在自动驾驶汽车等现实任务中的应用,可解释的决策过程更为可取。我们提出了一种强化学习算法,通过端到端训练估计隐藏状态,并将估计结果可视化为状态转移图。实验结果表明,该算法能解决简单的POMDP问题,且可视化结果使智能体的行为对人类而言具有可解释性。