Although Reinforcement Learning (RL) algorithms acquire sequential behavioral patterns through interaction with the environment, their effectiveness in noisy, high-dimensional scenarios typically relies on specific structural priors. In this paper, we propose SIDM, a novel and general framework for effective decision-making based on structural information principles and approached from an information-theoretic perspective. We present an unsupervised partitioning method that forms vertex communities in the state and action spaces based on feature similarity. Within each community, an aggregation function that uses structural entropy as the vertex weight computes the community's embedding, thereby enabling hierarchical state and action abstractions. By extracting abstract elements from historical trajectories, we construct a directed, weighted, homogeneous transition graph; minimizing this graph's high-dimensional entropy yields an optimal encoding tree. An innovative two-layer skill-based learning mechanism computes the common path entropy of each state transition and uses it as that transition's identified probability, thereby removing the need for expert knowledge. Moreover, SIDM can be flexibly incorporated into various single-agent and multi-agent RL algorithms to enhance their performance. Finally, extensive evaluations on challenging benchmarks demonstrate that, compared with state-of-the-art (SOTA) baselines, our framework significantly and consistently improves policy quality, stability, and efficiency by up to 32.70%, 88.26%, and 64.86%, respectively.
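To make the notion of structural entropy used above more concrete, the following is a minimal sketch of the standard two-level structural entropy of a graph under a given vertex partition. It is not the SIDM implementation: it assumes an undirected graph with positive edge weights (the transition graph in this paper is directed), and the function name `structural_entropy`, the edge-list format, and the example graph are hypothetical choices made for illustration only.

```python
import math
from collections import defaultdict


def structural_entropy(edges, partition):
    """Two-level structural entropy of an undirected weighted graph.

    edges:     iterable of (u, v, weight) with weight > 0 (illustrative format)
    partition: dict mapping each vertex to a community id
    """
    # Weighted degree of every vertex.
    degree = defaultdict(float)
    for u, v, w in edges:
        degree[u] += w
        degree[v] += w
    total = sum(degree.values())  # vol(G), i.e. twice the total edge weight

    # Volume (sum of degrees) and cut weight (edges leaving) per community.
    volume = defaultdict(float)
    cut = defaultdict(float)
    for v, d in degree.items():
        volume[partition[v]] += d
    for u, v, w in edges:
        if partition[u] != partition[v]:
            cut[partition[u]] += w
            cut[partition[v]] += w

    entropy = 0.0
    # Cost of locating a vertex inside its community.
    for v, d in degree.items():
        entropy -= (d / total) * math.log2(d / volume[partition[v]])
    # Cost of locating a community inside the graph, paid on cut edges.
    for c, vol in volume.items():
        entropy -= (cut[c] / total) * math.log2(vol / total)
    return entropy


# Hypothetical example: two triangles joined by a single bridge edge.
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
         (2, 3, 1.0)]
by_triangle = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_block = {v: "all" for v in range(6)}

se_partitioned = structural_entropy(edges, by_triangle)
se_trivial = structural_entropy(edges, one_block)
```

In this example, grouping the vertices by triangle gives a lower entropy than placing all vertices in one community, which is the sense in which minimizing structural entropy favors partitions that match the graph's community structure. With a single community the expression reduces to the one-dimensional structural entropy, i.e. the Shannon entropy of the degree distribution.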