State abstraction improves decision-making in reinforcement learning with rich observations by discarding irrelevant environmental information. However, recent approaches concentrate on providing adequate representational capacity, which leads to the loss of essential information and degrades their performance on challenging tasks. In this article, we propose SISA, a novel mathematical Structural Information principles-based State Abstraction framework, from an information-theoretic perspective. Specifically, we present an unsupervised, adaptive hierarchical state clustering method that requires no manual assistance and, at the same time, generates an optimal encoding tree. On each non-root tree node, a new aggregation function and conditional structural entropy are designed to achieve hierarchical state abstraction and to compensate for the sampling-induced loss of essential information. Empirical evaluations on a visual gridworld domain and six continuous control benchmarks demonstrate that, compared with five state-of-the-art (SOTA) state abstraction approaches, SISA significantly improves mean episode reward and sample efficiency by up to 18.98% and 44.44%, respectively. In addition, we show experimentally that SISA is a general framework that can be flexibly integrated with different representation-learning objectives to further improve their performance.