Identifying the most suitable variables to represent the state is a fundamental challenge in Reinforcement Learning (RL). These variables must efficiently capture the information necessary for making optimal decisions. To address this problem, we introduce the Transfer Entropy Redundancy Criterion (TERC), an information-theoretic criterion that determines whether there is \textit{entropy transferred} from state variables to actions during training. We define an algorithm based on TERC that provably excludes variables from the state that have no effect on the final performance of the agent, resulting in more sample-efficient learning. Experimental results show that this speed-up is present across three different algorithm classes (represented by tabular Q-learning, Actor-Critic, and Proximal Policy Optimization (PPO)) in a variety of environments. Furthermore, to highlight the differences between the proposed methodology and current state-of-the-art feature selection approaches, we present a series of controlled experiments on synthetic data before generalizing to real-world decision-making tasks. We also introduce a representation of the problem that compactly captures the transfer of information from state variables to actions in the form of Bayesian networks.