In most real-world reinforcement learning applications, state information is only partially observable, which breaks the Markov decision process assumption and degrades the performance of algorithms that conflate observations with state. Partially Observable Markov Decision Processes (POMDPs), on the other hand, provide a general framework that accounts for partial observability in learning, exploration, and planning, but they present significant computational and statistical challenges. To address these difficulties, we develop a representation-based perspective that leads to a coherent framework and a tractable algorithmic approach for practical reinforcement learning from partial observations. We provide a theoretical analysis justifying the statistical efficiency of the proposed algorithm, and empirically demonstrate that it surpasses state-of-the-art performance under partial observability across various benchmarks, advancing reliable reinforcement learning toward more practical applications.